Closed hijohnnylin closed 1 month ago
looks like #305 is related, but not exactly the same. possibly can eliminate two problems with one fix?
@hijohnnylin I pushed changes that implement a get_sae_config()
according to https://github.com/jbloomAus/SAELens/issues/305#issuecomment-2384123586 and was about to open a PR to fix https://github.com/jbloomAus/SAELens/issues/305. Do you want me to open a PR so you can review it (to see if we can solve both issues at once, or at least not make it difficult to fix this issue), or just discuss here?
@anthonyduong9
This looks great! thank you!
I think it's highly likely a slight modification to your change will satisfy both issues. I've skimmed your changes (not a full review since I'm not aware of all the edge cases in play here). Opening a PR would be great, I'll leave my suggestion/comment here so it's not blocked on creating a PR.
Feedback below is based on my needs - it's possible that the other PR has different requirements! Also, I defer to @chanind and @jbloomAus for final review as I'm not super familiar with conventions in this repo.
get_sae_config
takes a release
and a sae_id
, instead of a model_info
dict. Reasons for this:
release
is the top level key of the pretrained_saes yaml file - it's more likely that a user will know the release id, than the user will know the repo_id
. and release
together with sae_id
form a unique identifier, so you don't need the additional argument folder_name
.model_info
is generic and people won't know what the format is without looking into the function. a dict of str, Any
leaves room for potential error/typos for specifying the keys repo_id
and conversion_func
.i hope that's helpful, apologies for missing your branch when i created this issue. thank you!
@hijohnnylin thanks! I looked at what I'd need to do to implement your proposal, and it looks like I'd need to make additional changes about as large as my original changes. Basically, I'd need to change the interface
and in all of the implementations, use release
and sae_id
to perform lookups on an instance of
which we can just call get_pretrained_saes_directory()
to get.
@chanind the proposal seems a little different from what you described in https://github.com/jbloomAus/SAELens/issues/305#issuecomment-2384123586, but I think we should implement it, as it abstracts as much detail away from the user as possible, and also simplifies the code. Please let me know if this sounds good to you, and if so, if you'd prefer I open one PR per issue or one PR to fix both issues.
Seems reasonable! We should be able to look up the full sae info from a release
and a repo_id
anyway, which can give use the folder name. These issues can both be fixed in the same PR IMO, they seem like they're both fixable by the same thing.
Proposal
I would like a simple
get_sae_config
that takes therelease
andid
(undersaes
), and returns the config for the SAE. This should not require downloading the whole SAE weights.Motivation
Use case: If someone is looking to get the config to answer a quick question (eg what was the width of that one SAE?), they shouldn't need to have any knowledge of these conversion functions, or be forced to load the weights into memory.
Currently,
PretrainedSaeLoader
loads four different "types" of SAEs (each with a different conversion function): default (no conversion func), connor_rob_hook_z, gemma_2, and dictionary_learning_1.To get just the config for each of them, I have to special case each of them, like so:
There are some problems with this: 1) Difficult to maintain - Requires updating my function every time there is a new conversion function. 2) Unnecessary loading weights - The
connor_rob_hook_z
anddictionary_learning_1
conversion types require downloading (and sometimes loading) the weights of the SAEs. I don't want to do this when all I want is the config.Pitch
A simple
get_sae_config
method that takes two arguments:release_id
andsae_id
(and optionally, force_download if we suspect the config has changed). It returns the config without downloading weights or loading them.Alternatives
The current alternative is to write the special cases as above.
Checklist