[Proposal] Add MLP transcoders #182

jbloomAus / SAELens

Training Sparse Autoencoders on Language Models

https://jbloomaus.github.io/SAELens/

Open dtch1997 opened 2 weeks ago

dtch1997 commented 2 weeks ago

Proposal

Support training, loading, and inference of MLP transcoders.

Motivation

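Jacob Dunefsky and Philippe Chlenski have trained MLP transcoders, which learn to predict an MLP layer's output from its input through a wide, sparsely activating hidden layer. These have been shown to be useful for interpretability work such as circuit analysis.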

Pitch

Implement a HookedTranscoder class analogous to HookedSAE and using similar functionality.
Implement a transcoder training runner.
Support loading pre-trained transcoder checkpoints.
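
To make the pitch concrete, here is a minimal sketch of what a transcoder's forward pass and training loss could look like in plain PyTorch. It is only an illustration: `TranscoderSketch`, `transcoder_loss`, the parameter names, the L1 coefficient, and the toy MLP are all hypothetical and are not part of SAELens or of the proposed `HookedTranscoder`. The point it shows is the one difference from a standard SAE: the reconstruction target is the MLP's output, while the encoder reads the MLP's input.

```python
import torch
import torch.nn as nn


class TranscoderSketch(nn.Module):
    """Hypothetical minimal transcoder; NOT the SAELens API."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        # Encoder reads the MLP input; decoder writes a prediction of the MLP output.
        self.W_enc = nn.Parameter(torch.randn(d_in, d_hidden) * 0.02)
        self.b_enc = nn.Parameter(torch.zeros(d_hidden))
        self.W_dec = nn.Parameter(torch.randn(d_hidden, d_out) * 0.02)
        self.b_dec = nn.Parameter(torch.zeros(d_out))

    def forward(self, mlp_in: torch.Tensor):
        # Non-negative feature activations, encouraged to be sparse by the L1 term below.
        feature_acts = torch.relu(mlp_in @ self.W_enc + self.b_enc)
        mlp_out_hat = feature_acts @ self.W_dec + self.b_dec
        return mlp_out_hat, feature_acts


def transcoder_loss(model, mlp_in, mlp_out, l1_coeff: float = 1e-3):
    """MSE against the true MLP output plus an L1 sparsity penalty (assumed form)."""
    mlp_out_hat, feature_acts = model(mlp_in)
    mse = (mlp_out_hat - mlp_out).pow(2).mean()
    l1 = feature_acts.abs().sum(dim=-1).mean()
    return mse + l1_coeff * l1


# Toy usage: a small stand-in MLP plays the role of the layer being approximated.
torch.manual_seed(0)
d_model = 16
mlp = nn.Sequential(nn.Linear(d_model, 64), nn.GELU(), nn.Linear(64, d_model))
transcoder = TranscoderSketch(d_in=d_model, d_hidden=128, d_out=d_model)

x = torch.randn(8, d_model)      # stand-in for activations at the MLP input
with torch.no_grad():
    target = mlp(x)              # the MLP output the transcoder should predict

loss = transcoder_loss(transcoder, x, target)
loss.backward()
```

In a real implementation the inputs and targets would come from hooks on the model's MLP input and output rather than from a toy `nn.Sequential`, and the training runner would handle activation buffering, normalization, and checkpointing.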

Checklist

[X] I have checked that there is no similar issue in the repo (required)

dtch1997 commented 2 weeks ago

I'm likely going to be rather hacky with the first pass of the implementation, and possibly duplicate a bunch of code in order to have maximum freedom to make changes. Refactoring can be done later.