nd-ball / py-irt

Bayesian IRT models in Python
MIT License

Fix randomness caused by set() #44

Closed: EntilZha closed this 1 year ago

EntilZha commented 1 year ago

Stack created with Sapling.

Fix randomness caused by set()

Summary: In addition to seed-controlled randomness, the data loader uses set(), which does not guarantee iteration order (unlike dict() in Python 3.7 and later). As a result, the order of subject/item ids is not deterministic, which makes it difficult to get exactly replicable results. Fortunately, the fix is easy. To fix this and make reproducibility easier, this PR:

Test Plan: Run this several times with a fixed seed and confirm that the loss and parameter values stay the same:

py-irt train --seed 42 1pl examples/minitest.jsonlines --epochs 3 --log-every 1 ~/data/py-irt/minitest
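One way to compare repeated runs without eyeballing the logs is to hash the output directory after each run. The sketch below is only an illustration: `dir_digest` is a hypothetical helper, not part of py-irt, and it assumes that the last argument of the command above is the output directory and that the files written there contain nothing run-specific such as timestamps.

```python
import hashlib
from pathlib import Path


def dir_digest(directory: str) -> str:
    """Return one SHA-256 hex digest covering every file under `directory`.

    Hypothetical helper for illustration; not part of py-irt.
    """
    root = Path(directory).expanduser()
    digest = hashlib.sha256()
    # Sort the paths so the digest does not depend on filesystem listing order.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Include the relative file name so renamed files change the digest.
            digest.update(path.relative_to(root).as_posix().encode("utf-8"))
            digest.update(path.read_bytes())
    return digest.hexdigest()
```

Calling `dir_digest("~/data/py-irt/minitest")` after each seeded run and comparing the returned strings should give identical values if the runs are reproducible under these assumptions.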

Run this several times without --seed to make sure different seeds result in different output:

py-irt train 1pl examples/minitest.jsonlines --epochs 3 --log-every 1 ~/data/py-irt/minitest
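The ordering issue described in the summary can be sketched in a few lines. This is a minimal illustration, not the PR's actual diff: `build_index` and `build_index_first_seen` are hypothetical names, and the ids are made up. For string ids, iterating over a set() can give a different order in each Python process because string hashes are randomized per process unless PYTHONHASHSEED is fixed, so an index built directly from the set is not stable across runs.

```python
from typing import Dict, Iterable


def build_index(ids: Iterable[str]) -> Dict[str, int]:
    """Map each distinct id to an integer index in sorted order.

    Hypothetical helper: sorting removes any dependence on set iteration order.
    """
    return {key: i for i, key in enumerate(sorted(set(ids)))}


def build_index_first_seen(ids: Iterable[str]) -> Dict[str, int]:
    """Map each distinct id to an index in first-seen order.

    Hypothetical helper: dict.fromkeys() deduplicates while keeping insertion
    order, which dict guarantees in Python 3.7 and later.
    """
    return {key: i for i, key in enumerate(dict.fromkeys(ids))}


subject_ids = ["pedro", "maria", "pedro", "alex"]  # made-up example ids
print(build_index(subject_ids))             # {'alex': 0, 'maria': 1, 'pedro': 2}
print(build_index_first_seen(subject_ids))  # {'pedro': 0, 'maria': 1, 'alex': 2}
```

Either approach gives the same id-to-index mapping on every run for the same input; sorting additionally makes the mapping independent of the order of rows in the input file.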