jbloomAus / SAELens

Training Sparse Autoencoders on Language Models
https://jbloomaus.github.io/SAELens/
MIT License
490 stars 127 forks source link

fix: normalize decoder bias in fold_norm_scaling_factor #355

Closed chanind closed 3 weeks ago

chanind commented 3 weeks ago

Description

This PR implements the fix specified in #354 and updates tests.

Fixes #354

Type of change

Please delete options that are not relevant.

Checklist:

You have tested formatting, typing and unit tests (acceptance tests not currently in use)

curt-tigges commented 3 weeks ago

@chanind Do we know what the scale of the impact of this bug might have been?

chanind commented 3 weeks ago

Seems like it would only be an issue for SAEs trained with normalize_activations='expected_average_only_in', which isn't the default. I'm struggling to get tests to pass with this fix though, the final assertion fails with this fix

chanind commented 3 weeks ago

Looks like the issue was just with the way tests were scaling the output, should be fixed now.

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 65.85%. Comparing base (4c32de0) to head (7604d74). Report is 2 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #355 +/- ## ========================================== + Coverage 65.84% 65.85% +0.01% ========================================== Files 25 25 Lines 3288 3289 +1 Branches 421 421 ========================================== + Hits 2165 2166 +1 Misses 1009 1009 Partials 114 114 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

chanind commented 3 weeks ago

Merging as it's currently broken in prod, and seems pretty had to argue with. Thanks @tuomaso!