timaeus-research / devinterp

Tools for studying developmental interpretability in neural networks.

[DRAFT] Add gradient accumulation for LLC estimation #76

Closed · georgeyw closed this 2 months ago

georgeyw commented 6 months ago

Tested here on an earlier branch: https://wandb.ai/devinterp/llc-scaling

Copied from my notes when I tested this earlier:

Too lazy to copy and paste every value, but tl;dr: this is the mean and std of LLC estimates over 5 seeds on the 3M-param, 2-layer LM.

If you compare grad accum to regular LLC estimation, they're almost identical on the same seed, with some minor differences that are probably just floating-point rounding errors (~0.03 difference at most, with LLCs around 150). The stds are similarly close. Pretty good for something that involves errors accumulating over the sample draws.

(Still need to re-do testing on current main to make sure I did the rebase correctly.)
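
For illustration, here is a minimal sketch of how a gradient-accumulated SGLD step for LLC estimation might look (this is not the actual devinterp API; names and hyperparameters are placeholders). Averaging micro-batch gradients reproduces the full-batch gradient up to floating-point error, which is consistent with the ~0.03 differences above:

```python
# Hypothetical sketch (not the devinterp API): one SGLD step for LLC
# estimation, with the minibatch gradient accumulated over micro-batches.
import torch

def sgld_step_with_accumulation(model, loss_fn, micro_batches, w_init,
                                lr=1e-4, nbeta=100.0, localization=1.0):
    # Accumulate the average loss gradient over the micro-batches.
    for p in model.parameters():
        p.grad = None
    for xb, yb in micro_batches:
        loss = loss_fn(model(xb), yb) / len(micro_batches)
        loss.backward()

    with torch.no_grad():
        for p, w0 in zip(model.parameters(), w_init):
            drift = (nbeta / 2) * p.grad              # tempered loss gradient
            restore = (localization / 2) * (p - w0)   # pull back toward w_0
            noise = torch.randn_like(p) * (lr ** 0.5)  # SGLD noise, std sqrt(lr)
            p.add_(-lr * (drift + restore) + noise)
```

If the micro-batches cover exactly the samples of a single full batch, this update should match the non-accumulated one up to rounding.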

georgeyw commented 6 months ago

Leaving a note as a reminder: this might also require changing the automatic temperature setting in LLC estimation.
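
To spell out the concern (as I understand it, with illustrative names rather than the actual devinterp API): if the inverse temperature or gradient scaling is derived automatically from a batch size, then under gradient accumulation the relevant quantity is the effective batch size, not the micro-batch size.

```python
# Hypothetical sketch; function names are illustrative, not devinterp's API.
import math

def default_nbeta(num_samples: int) -> float:
    # Watanabe-style default: n * beta with beta = 1 / log(n).
    return num_samples / math.log(num_samples)

def effective_batch_size(micro_batch_size: int, num_accum_steps: int) -> int:
    # The quantity that should enter any batch-size-dependent temperature
    # or gradient-scaling logic when accumulation is enabled.
    return micro_batch_size * num_accum_steps
```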

svwingerden commented 3 months ago

Added a basic grad accum LLC sanity check: we run LLC estimation on a 2d and a 4d normal crossing for 1, 4, 16, and 64 grad accum steps (with 64, 16, 4, and 1 steps respectively, so the total number of gradient evaluations stays the same). The estimates are within 1e-6 of each other, which I think is good enough.
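
For reference, a minimal sketch of the shape of that check, assuming "normal crossing" refers to a toy potential like L(w) = prod_i w_i**2 and using a placeholder estimate_llc(...) in place of the library's estimator (names and signature are illustrative, not the actual devinterp API):

```python
# Hypothetical sketch of the sanity check described above.
import torch

def normal_crossing_loss(w: torch.Tensor) -> torch.Tensor:
    # 2d or 4d normal crossing: product of squared coordinates.
    return torch.prod(w ** 2)

def check_grad_accum_invariance(estimate_llc, dim: int, tol: float = 1e-6):
    w0 = torch.zeros(dim)
    # (grad accum steps, sampling steps) pairs chosen so the total number
    # of gradient evaluations is identical across configurations.
    configs = [(1, 64), (4, 16), (16, 4), (64, 1)]
    estimates = [
        float(estimate_llc(normal_crossing_loss, w0,
                           grad_accum_steps=a, num_steps=s))
        for a, s in configs
    ]
    assert max(estimates) - min(estimates) < tol, estimates
```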

svwingerden commented 3 months ago

One comment; looks good otherwise. Holding off on merging until https://github.com/timaeus-research/devinterp/pull/74 is merged, since this PR includes those changes.