georgeyw closed 2 months ago
Leaving a note as a reminder: this might also require changing the automatic temperature setting in LLC estimation.
Added a basic grad accum LLC sanity check: we run LLC estimation on a 2d and a 4d normal crossing for 1, 4, 16, and 64 grad accum steps (for 64, 16, 4, and 1 steps respectively). The resulting estimates are within 1e-6 of each other, which I think is good enough.
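For anyone who wants the intuition behind this check without running the full estimator, here is a minimal sketch in plain Python. It is not the test added in this PR and does not use devinterp. It only checks that gradients accumulated over 1, 4, 16, and 64 micro-batches match the full-batch gradient on a toy 2d normal-crossing loss. The loss `(w1 * w2 * x)^2`, the fixed total of 64 samples, the mean-reduced loss, and all function names are assumptions made for illustration.

```python
import random


def grad_mean_loss(w, xs):
    """Gradient of mean((w1 * w2 * x)^2) over xs (toy 2d normal crossing)."""
    w1, w2 = w
    mean_x2 = sum(x * x for x in xs) / len(xs)
    # The mean loss is w1^2 * w2^2 * mean(x^2); differentiate w.r.t. w1 and w2.
    return (2 * w1 * w2 ** 2 * mean_x2, 2 * w1 ** 2 * w2 * mean_x2)


def accumulated_grad(w, xs, accum_steps):
    """Accumulate micro-batch gradients, scaling each by 1 / accum_steps."""
    assert len(xs) % accum_steps == 0, "samples must split evenly"
    micro = len(xs) // accum_steps
    total = [0.0, 0.0]
    for i in range(accum_steps):
        g = grad_mean_loss(w, xs[i * micro:(i + 1) * micro])
        total[0] += g[0] / accum_steps
        total[1] += g[1] / accum_steps
    return tuple(total)


def max_deviation(w, xs, steps_list=(1, 4, 16, 64)):
    """Largest absolute gap between any accumulated and full-batch gradient."""
    ref = grad_mean_loss(w, xs)
    return max(
        abs(a - b)
        for k in steps_list
        for a, b in zip(accumulated_grad(w, xs, k), ref)
    )


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(64)]  # assumed total of 64 samples
w = (0.3, -0.7)  # arbitrary point away from the singular set w1 * w2 = 0
dev = max_deviation(w, xs)
print(dev < 1e-6)
```

The `1 / accum_steps` scaling is what makes the accumulated gradient equal to the full-batch mean gradient. If accumulation summed unscaled micro-batch gradients instead, the effective step size would change with the number of accumulation steps.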
One comment, looks good otherwise. Holding off on merging until https://github.com/timaeus-research/devinterp/pull/74 is merged (as this PR includes those changes).
Tested here on an earlier branch: https://wandb.ai/devinterp/llc-scaling
Copied from my notes when I tested this earlier:
(I should still re-do the testing on current main to make sure that I did the rebase correctly.)