Summary:
In addition to seed-controlled randomness, the data loader uses set(), which does not guarantee iteration order (unlike dict(), which preserves insertion order as of Python 3.7).
As a result, the order of subject/item ids is not deterministic, which makes it difficult to replicate results exactly.
To fix this and make reproducibility easier, this PR:
Changes to an ordered set implementation
Adds CLI args to control seeds and other determinism settings
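For illustration, here is a minimal sketch of one common way to get an "ordered set" in plain Python: deduplicating through dict keys, which keep first-seen order. This is not necessarily the implementation this PR uses; the helper name `ordered_unique` and the sample ids are hypothetical. For string ids in particular, set() iteration order can also change between interpreter runs because of hash randomization, which a dict-based approach avoids.

```python
def ordered_unique(ids):
    """Deduplicate ids while keeping first-seen order.

    Relies on dict preserving insertion order (guaranteed in Python 3.7+).
    Hypothetical helper, not the PR's actual code.
    """
    return list(dict.fromkeys(ids))


# Hypothetical subject ids as they might arrive from a data file.
subject_ids = ["s3", "s1", "s3", "s2", "s1"]

unique_ids = ordered_unique(subject_ids)

# The id -> index mapping now depends only on input order,
# not on how the ids happen to hash.
id_to_index = {sid: i for i, sid in enumerate(unique_ids)}
```

With a mapping built this way, the same input file should produce the same index assignment on every run.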
Test Plan:
Run this several times and ensure the loss/parameter values stay the same
Fix randomness caused by set()
Summary: In addition to seed-controlled randomness, the data loader uses set(), which does not guarantee iteration order (unlike dict(), which preserves insertion order as of Python 3.7). As a result, the order of subject/item ids is not deterministic, which makes it difficult to replicate results exactly. To fix this and make reproducibility easier, this PR changes to an ordered set implementation and adds CLI args to control seeds and other determinism settings.
Test Plan: Run this several times and ensure the loss/parameter values stay the same
Run this several times to make sure different seeds result in different output
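The two checks in the test plan can be sketched roughly as follows. This is only an illustration under assumptions: the `--seed` flag name, `parse_args`, and `run` are hypothetical stand-ins, and `run` draws a few random numbers in place of the real training loop.

```python
import argparse
import random


def parse_args(argv=None):
    # "--seed" is a hypothetical flag name; the PR's actual CLI args may differ.
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=0)
    return parser.parse_args(argv)


def run(seed):
    # Stand-in for training: a private RNG seeded from the CLI value.
    rng = random.Random(seed)
    return [rng.random() for _ in range(5)]


same_a = run(parse_args(["--seed", "42"]).seed)
same_b = run(parse_args(["--seed", "42"]).seed)
other = run(parse_args(["--seed", "7"]).seed)

# Same seed should reproduce the output; a different seed should change it.
assert same_a == same_b
assert same_a != other
```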