timaeus-research / devinterp

Tools for studying developmental interpretability in neural networks.

Seeding init_loss calculations #94

Closed wz-ml closed 1 month ago

wz-ml commented 2 months ago

There's a small logical error that causes consecutive, identical sampling calls with multiple chains and fixed seeds to produce different LLCs (diverging by about 1 part in 500). We currently calculate init_loss by sampling batches from the dataloader before setting the seed, so with shuffle=True the sampled batches differ between calls. We probably overlooked this in the past because setting the seed afterward means that any later run in the same session computes its init_loss from an already-seeded state, which masks the discrepancy.
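For illustration, here's a minimal sketch of the ordering bug and its fix, assuming a standard PyTorch `DataLoader` with `shuffle=True`. The function `estimate_init_loss` and the surrounding setup are hypothetical stand-ins, not devinterp's actual API; the point is only that the seed must be set before any batches are drawn.

```python
# Hypothetical sketch of the init_loss seeding bug (not devinterp's code).
# With shuffle=True and no explicit generator, the DataLoader's shuffle
# order is drawn from torch's global RNG when the iterator is created.
import torch
from torch.utils.data import DataLoader, TensorDataset

def estimate_init_loss(model, loader, loss_fn, num_batches=8, seed=42):
    # Buggy ordering: draw batches FIRST, seed afterward. Each call then
    # sees a different shuffle, so init_loss drifts between runs:
    #
    #   batches = list(iter(loader))[:num_batches]  # unseeded shuffle
    #   torch.manual_seed(seed)                     # too late
    #
    # Fixed ordering: seed BEFORE creating the iterator, so repeated
    # calls with the same seed draw identical batches.
    torch.manual_seed(seed)
    losses = []
    it = iter(loader)
    for _ in range(num_batches):
        xs, ys = next(it)
        with torch.no_grad():
            losses.append(loss_fn(model(xs), ys).item())
    return sum(losses) / len(losses)

if __name__ == "__main__":
    xs, ys = torch.randn(256, 10), torch.randn(256, 1)
    loader = DataLoader(TensorDataset(xs, ys), batch_size=32, shuffle=True)
    model = torch.nn.Linear(10, 1)
    loss_fn = torch.nn.MSELoss()
    # With the seed set first, consecutive identical calls agree exactly.
    assert estimate_init_loss(model, loader, loss_fn) == \
           estimate_init_loss(model, loader, loss_fn)
```

In the buggy ordering, the assertion above would generally fail, since each call would average the loss over a different random subset of batches.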

wz-ml commented 1 month ago

Fixed in the new DevInterp release.