jbloomAus / SAELens

Training Sparse Autoencoders on Language Models
https://jbloomaus.github.io/SAELens/
MIT License
454 stars 120 forks

[Question] Clarification on Feature Annotation in SAEs #248

Closed by danjuan-77 3 months ago

danjuan-77 commented 3 months ago

Questions

Thank you for your insightful work on SAEs. Your innovative approach to leveraging the residual stream for feature extraction has been both thought-provoking and inspiring.

As someone currently engaged in related research, I am particularly interested in the practical aspects of your methodology, specifically the process you used for feature annotation. Given the scale involved (for example, the 24,576 features extracted at each layer of a model like GPT-2), could you clarify how these features are annotated? Are they labeled manually, or is an automated or semi-automated method used to manage annotation at this scale?

Your insights on this matter would greatly aid my understanding and approach to similar challenges in my research.

Thank you for your time and contributions to the field.

jbloomAus commented 3 months ago

Good luck!