## Description

This pull request includes the following updates:

- Add entries for the Llama Scope SAEs to `pretrained_saes.yaml`.
- Implement the Llama Scope loaders in `pretrained_sae_loaders.py`.
- Add an option in `eval.py` to skip replacing BOS activations when computing `ce_loss_score`. This is necessary for SAEs whose training contexts start with a BOS token but which were not trained on the BOS activations themselves.
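A minimal sketch of what the BOS-skipping option described above could look like. The function name, signature, and flag (`exclude_bos`) are hypothetical illustrations, not the actual `eval.py` API: the idea is simply that when the flag is set, the BOS position keeps the model's original activation instead of the SAE reconstruction.

```python
import torch


def substitute_sae_activations(
    original_acts: torch.Tensor,  # (batch, seq, d_model) activations from the model
    sae_acts: torch.Tensor,       # SAE reconstructions of the same activations
    exclude_bos: bool = False,    # hypothetical flag: leave the BOS position untouched
) -> torch.Tensor:
    """Splice SAE reconstructions into the activations before re-running the
    forward pass for the CE-loss score. With ``exclude_bos=True`` the BOS
    position (index 0) keeps its original activation, since an SAE that never
    saw BOS activations during training reconstructs them unreliably."""
    spliced = sae_acts.clone()
    if exclude_bos:
        spliced[:, 0, :] = original_acts[:, 0, :]
    return spliced
```

With `exclude_bos=False` this reduces to the usual full substitution, so the option is backward compatible for SAEs that were trained with BOS activations.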
Fixes # (issue)
## Type of change

Please delete options that are not relevant.

- [ ] New feature (non-breaking change which adds functionality)
- [ ] This change requires a documentation update
## Checklist:

- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] I have not rewritten tests relating to key interfaces which would affect backward compatibility

### You have tested formatting, typing and unit tests (acceptance tests not currently in use)

- [x] I have run `make check-ci` to check format and linting. (You can run `make format` to format code if needed.)
## Performance Check

If you have implemented a training change, please indicate precisely how performance changes with respect to the following metrics:

- [ ] L0
- [ ] CE Loss
- [ ] MSE Loss
- [ ] Feature Dashboard Interpretability

Please link to wandb dashboards with a control and test group.