Closed jbloomAus closed 1 month ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 56.35%. Comparing base (0550ae3) to head (3d02279). Report is 1 commit behind head on main.
Description
Implements a simple SAE method that folds the W_dec norms of Anthropic-style SAEs into the encoder, so that the W_dec feature directions have unit norm. See the Anthropic update here: https://transformer-circuits.pub/2024/april-update/index.html#training-saes
I have tested that the feature activations and the SAE output are as expected. It's possible we should make this the default when loading from pretrained, so that feature activations are conceptually what you expect them to be. I may submit a PR soon that makes this the case.
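For readers who want to see the idea concretely, here is a minimal NumPy sketch of the fold described above. It is not the code from this PR: the function names `fold_w_dec_norm` and `sae_forward` are hypothetical, and it assumes `W_enc` has shape `(d_in, d_sae)`, `W_dec` has shape `(d_sae, d_in)`, the encoder is a plain ReLU layer, and no decoder row has zero norm.

```python
import numpy as np


def fold_w_dec_norm(W_enc, b_enc, W_dec):
    """Rescale so each decoder row has unit norm (hypothetical helper).

    Assumes W_enc: (d_in, d_sae), b_enc: (d_sae,), W_dec: (d_sae, d_in),
    and that every decoder row has a non-zero norm.
    """
    norms = np.linalg.norm(W_dec, axis=1)   # one norm per feature, shape (d_sae,)
    W_dec_folded = W_dec / norms[:, None]   # unit-norm decoder rows
    W_enc_folded = W_enc * norms[None, :]   # scale each encoder column
    b_enc_folded = b_enc * norms            # scale the encoder bias to match
    return W_enc_folded, b_enc_folded, W_dec_folded, norms


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Plain ReLU SAE forward pass (an assumed architecture for this sketch)."""
    acts = np.maximum(x @ W_enc + b_enc, 0.0)
    return acts, acts @ W_dec + b_dec


# Small random SAE, used only to illustrate the check.
rng = np.random.default_rng(0)
d_in, d_sae = 8, 32
W_enc = rng.normal(size=(d_in, d_sae))
b_enc = rng.normal(size=d_sae)
W_dec = rng.normal(size=(d_sae, d_in))
b_dec = rng.normal(size=d_in)
x = rng.normal(size=(4, d_in))

acts, out = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
W_enc_f, b_enc_f, W_dec_f, norms = fold_w_dec_norm(W_enc, b_enc, W_dec)
acts_f, out_f = sae_forward(x, W_enc_f, b_enc_f, W_dec_f, b_dec)
```

Because the norms are positive, scaling the pre-activations commutes with the ReLU. Under these assumptions the reconstruction `out_f` should match `out` up to floating-point error, while each folded activation equals the original activation multiplied by that feature's decoder norm.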