jbloomAus / SAELens

Training Sparse Autoencoders on Language Models
https://jbloomaus.github.io/SAELens/
MIT License
506 stars 129 forks source link

fix: load the same config from_pretrained and get_sae_config #361

Closed chanind closed 3 weeks ago

chanind commented 4 weeks ago

Description

This PR unifies the code path for from_pretrained and get_sae_config so the config dict should match, and adds a test asserting this for a jbloom/gpt2 SAE.

Fixes #351

Type of change

Please delete options that are not relevant.

Checklist:

You have tested formatting, typing and unit tests (acceptance tests not currently in use)

codecov[bot] commented 4 weeks ago

Codecov Report

Attention: Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 66.14%. Comparing base (b8703fe) to head (18143b7). Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
sae_lens/toolkit/pretrained_sae_loaders.py 80.00% 1 Missing and 2 partials :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #361 +/- ## ========================================== - Coverage 66.17% 66.14% -0.03% ========================================== Files 25 25 Lines 3367 3373 +6 Branches 429 432 +3 ========================================== + Hits 2228 2231 +3 - Misses 1020 1021 +1 - Partials 119 121 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

chanind commented 3 weeks ago

I noticed that as well - it looks like the neuronpedia_id property is only set on the config in the SAE, but not in the config_dict that gets returned from SAE.from_pretrained(). I'm not sure what the reason for that is, or if it's simply an oversight. I can add it to the the config_dict so it's set every if that makes more sense. I'm also not sure what the idea with the neuronpedia property is

chanind commented 3 weeks ago

I updated the PR to include neuronpedia_id in the config in both places. I think the neuronpedia properly was likely just a typo, since there's no equivalent property on the SAEConfig class.