Questions

Thank you for your insightful work on SAEs. Your innovative approach to leveraging the residual stream for feature extraction has been both thought-provoking and inspiring.

As someone currently engaged in related research, I am particularly interested in understanding the practical aspects of your methodology. Specifically, I am curious about the process you employed for feature annotation within your models. Given the scale of features—such as the 24,576 features you extracted at each layer in a model like GPT-2—could you clarify how these features are annotated? Are these features manually labeled, or is there an automated or semi-automated method used to manage this extensive annotation process?

Your insights on this matter would greatly aid my understanding and approach to similar challenges in my research.

Thank you for your time and contributions to the field.

jbloomAus / SAELens

[Question] Clarification on Feature Annotation in SAEs #248
