openai / automated-interpretability


Getting Top Activating Text Excerpts Per Neuron #20

Closed: baselmousi closed this issue 1 year ago

baselmousi commented 1 year ago

Hello,

Could you please clarify how you get the top-activating text excerpts per neuron? Do you average the activation values over all tokens in the text excerpt, or do you sum them?

williamrs-openai commented 1 year ago

We take the maximum over all activations in each text excerpt, then select the text excerpts with the highest maximum activation values.
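A minimal Python sketch of the ranking described in the reply. It assumes the activations for a single neuron are already available as one list of per-token values per excerpt. The function name `top_activating_excerpts`, the input layout, and the parameter `k` are hypothetical illustrations, not code from the repository.

```python
from typing import List, Sequence, Tuple


def top_activating_excerpts(
    activations: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Rank excerpts by their maximum per-token activation for one neuron.

    Hypothetical helper: activations[i] holds the neuron's activation on each
    token of excerpt i. Returns (excerpt_index, max_activation) pairs for the
    k excerpts with the highest maxima, highest first.
    """
    # Summarize each excerpt by its single strongest token (not a mean or sum).
    # Empty excerpts are skipped because max() of an empty list would fail.
    maxima = [(i, max(acts)) for i, acts in enumerate(activations) if acts]
    # Sort by the per-excerpt maximum, highest first. Python's sort is stable,
    # so excerpts with tied maxima keep their original order.
    maxima.sort(key=lambda pair: pair[1], reverse=True)
    return maxima[:k]


if __name__ == "__main__":
    acts = [
        [0.5, 0.5, 0.5, 0.5],  # steady moderate activation
        [0.0, 0.0, 1.5, 0.0],  # one strong token
        [0.2, 0.9, 0.1],
    ]
    print(top_activating_excerpts(acts, 2))
```

In this example, max-based ranking puts excerpt 1 first (maximum 1.5), followed by excerpt 2 (0.9). Summing or averaging would instead favor excerpt 0 (sum 2.0, mean 0.5), so the choice of aggregation changes which excerpts count as "top activating".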