jettjaniak opened this issue 4 months ago
What I think we should actually do:
object/step 1: tokenized dataset; parameters: source dataset name, dataset split, tokenizer, sequence length
object/step 2: residual stream activations (resid acts), logits, and loss
object/step 3: active feature locations, meaning: for each feature, a list of (batch, seq) positions where it was active, plus its max activation (a sketch of all three objects follows below)
we can take it for a test run by using just a small subset of the dataset, as in the snippet below
This (object/step 3) is the only element of the experiment that gains much from executing at all layers at once. If we precompute it and save it to persistent storage, and then load it back into memory one layer at a time, we will be able to store clean_logits in it as well.