There's a small logical error that has been causing consecutive identical sampling calls with multiple chains and fixed seeds to produce different LLCs (diverging by about 1 part in 500). We currently compute `init_loss` by sampling batches from the dataloader *before* setting the seed, so with `shuffle=True` the sampled batches differ between runs. We probably overlooked this in the past because setting the seed afterward means every *subsequent* run computes its `init_loss` from an already-seeded dataloader, so only the first run in a process diverges.
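The ordering bug can be reproduced in miniature. This is a toy sketch, not the actual library code: Python's global `random` module stands in for the RNG behind a `shuffle=True` dataloader, and the function names (`sample_batch`, `estimate_llc_buggy`, `estimate_llc_fixed`) are illustrative only.

```python
import random

data = list(range(8))

def sample_batch():
    """Toy stand-in for drawing one batch from a shuffle=True dataloader."""
    order = list(range(len(data)))
    random.shuffle(order)          # consumes the *global* RNG state
    return data[order[0]]

def estimate_llc_buggy():
    init_batch = sample_batch()    # BUG: init_loss batch drawn before seeding
    random.seed(42)                # seed set too late to cover init_loss
    return init_batch

def estimate_llc_fixed():
    random.seed(42)                # seed set first...
    init_batch = sample_batch()    # ...so the init_loss batch is reproducible
    return init_batch

# Buggy ordering: the first call samples from whatever ambient RNG state the
# process is in, but its trailing seed() call seeds every *later* call, so
# calls 2, 3, ... agree with each other while call 1 generically differs.
first, second, third = (estimate_llc_buggy() for _ in range(3))
assert second == third

# Fixed ordering: every call agrees.
assert estimate_llc_fixed() == estimate_llc_fixed()
```

This mirrors the symptom above: the trailing seed masks the bug on all but the first call, which is why consecutive "identical" runs disagreed only slightly.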