jettjaniak / teren

Linking activation space features to model behavior
Apache License 2.0

save SAEFeatureExamples++ to persistent storage #23

Open jettjaniak opened 2 months ago

jettjaniak commented 2 months ago

This is the only part of the experiment that benefits much from running on all layers at once. If we precompute it, save it to persistent storage, and then load it into memory one at a time, we will be able to store clean_logits in it as well.
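A minimal sketch of the precompute, save, then load-one-at-a-time idea. It assumes one file per layer and uses `pickle` as the storage format; `save_layer`, `load_layer`, `iter_layers`, and the `layer_{n}.pkl` naming are hypothetical and not taken from the repo.

```python
import pickle
from pathlib import Path
from typing import Any, Iterator


def save_layer(storage_dir: Path, layer: int, examples: Any) -> Path:
    """Write the precomputed object for one layer to its own file."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    path = storage_dir / f"layer_{layer}.pkl"
    with path.open("wb") as f:
        pickle.dump(examples, f)
    return path


def load_layer(storage_dir: Path, layer: int) -> Any:
    """Read back the object that was saved for one layer."""
    with (storage_dir / f"layer_{layer}.pkl").open("rb") as f:
        return pickle.load(f)


def iter_layers(storage_dir: Path, layers: list[int]) -> Iterator[tuple[int, Any]]:
    """Yield (layer, object) pairs, loading a single layer per iteration."""
    for layer in layers:
        yield layer, load_layer(storage_dir, layer)
```

Because only one layer's object is loaded per iteration, the memory that would otherwise hold every layer at once is free for extra fields such as clean_logits.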

jettjaniak commented 2 months ago

What I think we should actually do:

  1. tokenize dataset {seq len}
  2. run a model on a sizable dataset, save all resid activations, logits and clean loss; upload to HF {model, tokenized dataset, seq len}
  3. decompose activations with SAE, for each feature save (batch, pos) pairs where the feature was active + max feature activation {SAE}

object/step 1: tokenized dataset. Parameters: source dataset name, dataset split, tokenizer, seq. len
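The step 1 parameters could be captured in a small record, sketched below. The class name `TokenizedDatasetParams`, its field names, and the `cache_key` helper are hypothetical. The hash is just one way to name the stored artifact so that different parameter combinations do not collide.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TokenizedDatasetParams:
    source_dataset: str  # source dataset name
    split: str           # dataset split
    tokenizer: str       # tokenizer name
    seq_len: int         # sequence length

    def cache_key(self) -> str:
        """Stable short hash of the parameters, usable as a file or repo suffix."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Placeholder values, for illustration only.
params = TokenizedDatasetParams(
    source_dataset="example/dataset",
    split="train",
    tokenizer="example-tokenizer",
    seq_len=128,
)
key = params.cache_key()
```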

object/step 2: resid acts, logits and loss

object/step 3: active feature locations, meaning: for each feature, a list of (batch, pos) pairs where it was active, plus its max activation
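A rough sketch of the step 3 object. It assumes "active" means an activation strictly above a threshold (0 by default) and uses nested Python lists indexed as `[batch][pos][feature]` purely for illustration; real code would more likely operate on tensors. `FeatureRecord` and `active_feature_locations` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureRecord:
    max_activation: float
    locations: list[tuple[int, int]] = field(default_factory=list)


def active_feature_locations(
    feature_acts, threshold: float = 0.0
) -> dict[int, FeatureRecord]:
    """Map feature index -> (batch, pos) pairs where it was active + max activation.

    feature_acts is indexed as feature_acts[batch][pos][feature].
    Features that are never active do not appear in the result.
    """
    records: dict[int, FeatureRecord] = {}
    for b, seq in enumerate(feature_acts):
        for p, feats in enumerate(seq):
            for f, act in enumerate(feats):
                if act <= threshold:
                    continue  # feature not active at this position
                rec = records.get(f)
                if rec is None:
                    rec = records[f] = FeatureRecord(max_activation=act)
                rec.locations.append((b, p))
                rec.max_activation = max(rec.max_activation, act)
    return records


# Toy input: 2 sequences, 2 positions, 2 features.
acts = [
    [[0.0, 1.5], [2.0, 0.0]],
    [[0.5, 0.0], [0.0, 3.0]],
]
records = active_feature_locations(acts)
```

Storing only the active locations keeps the object sparse, which matters because most SAE features are inactive at most positions.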

we can take it for a test run by just using a small subset of the dataset