## Description

This pull request includes the following updates:

- Add entries for the Llama Scope SAEs to `pretrained_saes.yaml`.
- Implement the Llama Scope loaders in `pretrained_sae_loaders.py`.
- Add an option in `eval.py` to skip replacing BOS activations when computing `ce_loss_score`. This is necessary for SAEs whose training contexts start with a BOS token but which were not trained on the BOS activations themselves.
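A minimal sketch of what the BOS-skipping option described above could look like. The function name, signature, and flag (`exclude_bos`) are hypothetical illustrations, not the actual `eval.py` API: the idea is simply that when the flag is set, the BOS position keeps the model's original activation instead of the SAE reconstruction.

```python
import torch


def substitute_sae_activations(
    original_acts: torch.Tensor,  # (batch, seq, d_model) activations from the model
    sae_acts: torch.Tensor,       # SAE reconstructions of the same activations
    exclude_bos: bool = False,    # hypothetical flag: leave the BOS position untouched
) -> torch.Tensor:
    """Splice SAE reconstructions into the activations before re-running the
    forward pass for the CE-loss score. With ``exclude_bos=True`` the BOS
    position (index 0) keeps its original activation, since an SAE that never
    saw BOS activations during training reconstructs them unreliably."""
    spliced = sae_acts.clone()
    if exclude_bos:
        spliced[:, 0, :] = original_acts[:, 0, :]
    return spliced
```

With `exclude_bos=False` this reduces to the usual full substitution, so the option is backward compatible for SAEs that were trained with BOS activations.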
Fixes # (issue)
## Type of change

Please delete options that are not relevant.

- [ ] New feature (non-breaking change which adds functionality)
- [ ] This change requires a documentation update
## Checklist:

- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] I have not rewritten tests relating to key interfaces which would affect backward compatibility

### You have tested formatting, typing and unit tests (acceptance tests not currently in use)

- [x] I have run `make check-ci` to check format and linting. (You can run `make format` to format code if needed.)
## Performance Check

If you have implemented a training change, please indicate precisely how performance changes with respect to the following metrics:

- [ ] L0
- [ ] CE Loss
- [ ] MSE Loss
- [ ] Feature Dashboard Interpretability

Please link to wandb dashboards with a control and test group.