Closed aryamanarora closed 1 day ago
I think @explanare has this set up locally.
What functionality should this PR have for best integration with existing pyvene tools?
I have an implementation of SAE training working with TransformerLens models (largely inspired by Neel Nanda’s code, but adjusted so that experiments are easier to modify). Currently the Buffer class collects the activations from a layer with the model.run_with_cache function; I presume there is an equivalent pyvene function that should be used instead? Other than that, I think the rest of the code should already be interoperable with pyvene.
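To make the integration point concrete, here is a minimal sketch of a buffer that does not care where activations come from. It is not the Buffer class mentioned above: `ActivationBuffer`, `collect_fn`, `prompts_per_refill`, and `fake_collect` are all hypothetical names, and activations are plain lists of floats so the example runs without any model. The idea is that the collector is passed in as a callable, so a wrapper around model.run_with_cache could be swapped for a wrapper around pyvene's activation collection without touching the rest of the training code.

```python
import random
from typing import Callable, Iterator, List, Sequence

Activation = List[float]
# Hypothetical interface: takes a batch of prompts and returns one
# activation vector per token, flattened across prompts.
CollectFn = Callable[[Sequence[str]], List[Activation]]


class ActivationBuffer:
    """Stores activations from a pluggable collector and yields shuffled batches."""

    def __init__(self, collect_fn: CollectFn, prompts: Sequence[str],
                 batch_size: int, prompts_per_refill: int = 8, seed: int = 0):
        self.collect_fn = collect_fn
        self.prompts = list(prompts)
        self.batch_size = batch_size
        self.prompts_per_refill = prompts_per_refill
        self._rng = random.Random(seed)
        self._cursor = 0          # index of the next unread prompt
        self._store: List[Activation] = []

    def _refill(self) -> bool:
        """Collect activations for the next chunk of prompts; False when exhausted."""
        if self._cursor >= len(self.prompts):
            return False
        chunk = self.prompts[self._cursor:self._cursor + self.prompts_per_refill]
        self._cursor += len(chunk)
        self._store.extend(self.collect_fn(chunk))
        self._rng.shuffle(self._store)  # mix tokens from different prompts
        return True

    def __iter__(self) -> Iterator[List[Activation]]:
        while True:
            # Top up until a full batch is available or the prompts run out.
            while len(self._store) < self.batch_size and self._refill():
                pass
            if not self._store:
                return
            batch = self._store[:self.batch_size]
            self._store = self._store[self.batch_size:]
            yield batch  # the final batch may be smaller than batch_size


def fake_collect(prompts: Sequence[str]) -> List[Activation]:
    """Stand-in collector: one 1-d 'activation' per whitespace token."""
    return [[float(len(tok))] for p in prompts for tok in p.split()]


if __name__ == "__main__":
    buf = ActivationBuffer(fake_collect, ["a bb ccc", "dd e"],
                           batch_size=2, prompts_per_refill=1)
    for batch in buf:
        print(batch)
```

In a real setup, `fake_collect` would be replaced by a function that runs the model and returns the chosen layer's activations; the exact pyvene call to use there is left open here.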
@smejak currently, pyvene supports activation collection as in the first two examples in https://github.com/stanfordnlp/pyvene/blob/main/pyvene_101.ipynb. Will this help?
We should add support for training sparse autoencoders (Bricken et al., 2023; Cunningham et al., 2023). Could be cool as a way of obtaining a feature basis for interventions.
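For reference, one common formulation of the sparse autoencoder objective, roughly following Bricken et al. (2023), is sketched below. The symbols are assumptions for illustration: $x$ is a collected activation, $W_e, b_e$ and $W_d, b_d$ are the encoder and decoder parameters, $f$ is the vector of feature activations, and $\lambda$ is the sparsity coefficient. Other setups (e.g. tied weights) differ in the details.

$$
\begin{aligned}
f &= \mathrm{ReLU}\big(W_e (x - b_d) + b_e\big) \\
\hat{x} &= W_d f + b_d \\
\mathcal{L} &= \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert f \rVert_1
\end{aligned}
$$

Under this formulation, the columns of $W_d$ would be the candidate feature basis for interventions.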