jbloomAus / SAELens

Training Sparse Autoencoders on Language Models
https://jbloomaus.github.io/SAELens/
MIT License

How to reproduce jbloom/Gemma-2b-Residual-Stream-SAEs #191

Open zdaiot opened 1 week ago

zdaiot commented 1 week ago

Thank you for your excellent work. I'd like to know how to reproduce jbloom/Gemma-2b-Residual-Stream-SAEs.

Could you open-source the configuration file used to train it?

ianand commented 1 week ago

Is https://huggingface.co/jbloom/Gemma-2b-Residual-Stream-SAEs/blob/main/gemma_2b_blocks.12.hook_resid_post_16384/cfg.json what you're looking for?
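If it helps, you can pull that config down and inspect it locally. A minimal sketch using `huggingface_hub` (the repo id and file path are taken from the link above):

```python
import json

from huggingface_hub import hf_hub_download

# Download the training config for the layer-12 residual stream SAE
# (repo_id and filename come from the Hugging Face link above).
cfg_path = hf_hub_download(
    repo_id="jbloom/Gemma-2b-Residual-Stream-SAEs",
    filename="gemma_2b_blocks.12.hook_resid_post_16384/cfg.json",
)

with open(cfg_path) as f:
    cfg = json.load(f)

# Print the training hyperparameters stored in the config.
print(json.dumps(cfg, indent=2))
```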

zdaiot commented 1 week ago

@ianand Hello, I would like to ask about the datasets used for training SAEs and those used for analyzing SAEs. Are there any particular considerations? Do they need to be the same?

For instance, why is HuggingFaceFW/fineweb used to train jbloom/Gemma-2b-Residual-Stream-SAEs, while apollo-research/roneneldan-TinyStories-tokenizer-gpt2 is used to train tiny-stories-1L-21M? And why is Skylion007/openwebtext used to train gpt2-small-res-jb, but NeelNanda/pile-10k used to analyze it?

jbloomAus commented 6 days ago

@zdaiot Usually we like to train the SAE on the same data the underlying model was trained on, but we often don't know what that is. Pre-tokenized datasets are better, and we now have a utility for generating them. It gets more complicated if you want to work with instruction fine-tuned datasets, and I have no special insights there.
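For illustration, "pre-tokenized" just means the text has been tokenized and chunked into fixed-length sequences up front, so the training loop can stream token ids directly. A rough sketch of the idea using plain `datasets`/`transformers` (this is not the SAELens utility itself; the tokenizer, dataset, and context size below are example choices):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Illustrative choices only: swap in the tokenizer / dataset / context size you need.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
context_size = 128

dataset = load_dataset("NeelNanda/pile-10k", split="train")

def tokenize_and_chunk(examples):
    # Tokenize a batch of documents and concatenate them into fixed-length chunks.
    ids = tokenizer(examples["text"])["input_ids"]
    flat = [tok for doc in ids for tok in doc + [tokenizer.eos_token_id]]
    n = (len(flat) // context_size) * context_size
    chunks = [flat[i : i + context_size] for i in range(0, n, context_size)]
    return {"input_ids": chunks}

tokenized = dataset.map(
    tokenize_and_chunk,
    batched=True,
    remove_columns=dataset.column_names,
)

# Save the pre-tokenized dataset so SAE training can read token ids directly.
tokenized.save_to_disk("pile-10k-tokenized-gpt2")
```

The utility shipped with SAELens covers this workflow; check the docs for its current interface and options.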

jbloomAus commented 6 days ago

@zdaiot let me know if I can close this :)

zdaiot commented 5 days ago

@jbloomAus May I ask whether the same dataset needs to be used for training the SAE and for analyzing it?