jbloomAus / SAELens

Training Sparse Autoencoders on Language Models
https://jbloomaus.github.io/SAELens/
MIT License

fix: hotfix scale decoder norm is not passed to training sae #377

Closed · chanind closed this 1 week ago

chanind commented 1 week ago

Description

In #365, we fixed a bug where scale_sparsity_penalty_by_decoder_norm was being ignored and the SAE always scaled by decoder norm regardless of the setting. However, that fix revealed a second bug: we were not passing the scale_sparsity_penalty_by_decoder_norm parameter through to the training SAE at all. This sort of bug is easy to introduce because we create the TrainingSAEConfig from the runner config by building a dictionary without type checking, so a forgotten key silently falls back to its default.
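
For illustration, here is a minimal sketch of how this class of bug arises. The toy classes below are simplified stand-ins whose field names mirror SAELens, not the library's actual code:

```python
from dataclasses import dataclass

# Toy stand-ins for the real configs; field names mirror SAELens, but
# these classes are simplified illustrations, not SAELens's actual code.
@dataclass
class RunnerConfig:
    l1_coefficient: float = 1.0
    scale_sparsity_penalty_by_decoder_norm: bool = False

@dataclass
class TrainingSAEConfig:
    l1_coefficient: float = 1.0
    scale_sparsity_penalty_by_decoder_norm: bool = False

    @classmethod
    def from_runner_config(cls, cfg: RunnerConfig) -> "TrainingSAEConfig":
        # Hand-built kwargs: if a key is forgotten here, the dataclass
        # silently falls back to its default, and neither the type
        # checker nor the runtime raises an error.
        return cls(
            l1_coefficient=cfg.l1_coefficient,
            # scale_sparsity_penalty_by_decoder_norm is missing, so the
            # training SAE silently gets the default value (False).
        )
```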

This PR adds a test verifying that scale_sparsity_penalty_by_decoder_norm is now passed through correctly, so we can get this fix out ASAP. I'll make a follow-up PR with a more robust fix, in the form of better tests or type checking, after this is merged.
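
Below is a hedged sketch of the kind of regression test this PR describes, plus one possible shape the "more robust fix" could take; the actual test and helper names in SAELens may differ. It reuses the toy RunnerConfig / TrainingSAEConfig classes from the sketch above:

```python
from dataclasses import fields

# A regression test of the kind this PR adds: set the flag on the runner
# config and assert it survives the conversion to the training config.
def test_scale_sparsity_penalty_by_decoder_norm_is_passed_through():
    runner_cfg = RunnerConfig(scale_sparsity_penalty_by_decoder_norm=True)
    sae_cfg = TrainingSAEConfig.from_runner_config(runner_cfg)
    assert sae_cfg.scale_sparsity_penalty_by_decoder_norm is True

# One possible "more robust" construction for the follow-up: derive the
# kwargs from the target dataclass's declared fields so a newly added
# field cannot be silently dropped (assumes the field names match
# between the two configs).
def training_sae_config_from_runner(cfg: RunnerConfig) -> TrainingSAEConfig:
    kwargs = {f.name: getattr(cfg, f.name) for f in fields(TrainingSAEConfig)}
    return TrainingSAEConfig(**kwargs)
```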

Checklist:

Formatting, typing, and unit tests have been run (acceptance tests are not currently in use)

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.74%. Comparing base (17506ac) to head (4f28b21). Report is 2 commits behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #377   +/-   ##
=======================================
  Coverage   72.74%   72.74%
=======================================
  Files          22       22
  Lines        3266     3266
  Branches      431      431
=======================================
  Hits         2376     2376
  Misses        762      762
  Partials      128      128
```
