jbloomAus / SAELens

Training Sparse Autoencoders on Language Models
https://jbloomaus.github.io/SAELens/
MIT License
191 stars, 67 forks

[Proposal] Add MLP transcoders #182

Open dtch1997 opened 2 weeks ago

dtch1997 commented 2 weeks ago

Proposal

Support training, loading, and inference of MLP transcoders.
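
For context, here is a minimal sketch of what a transcoder computes, as I understand it: it has the same encoder/decoder shape as a sparse autoencoder, but it reads the MLP input and is trained to predict the MLP output rather than reconstruct its own input. Everything below (`ToyTranscoder`, `init_transcoder`, `matvec`, the `l1_coeff` default) is a hypothetical illustration in plain Python, not SAELens API, and the MSE-plus-L1 loss is an assumed objective; a real implementation would use PyTorch tensors.

```python
import random
from dataclasses import dataclass


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


@dataclass
class ToyTranscoder:
    # Hypothetical toy class for illustration only; not part of SAELens.
    W_enc: list  # d_hidden rows, each of length d_in
    b_enc: list  # length d_hidden
    W_dec: list  # d_out rows, each of length d_hidden
    b_dec: list  # length d_out

    def encode(self, mlp_in):
        # ReLU(W_enc @ mlp_in + b_enc): non-negative feature activations.
        pre = matvec(self.W_enc, mlp_in)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, feats):
        # W_dec @ feats + b_dec: a prediction of the MLP *output*.
        out = matvec(self.W_dec, feats)
        return [o + b for o, b in zip(out, self.b_dec)]

    def forward(self, mlp_in):
        return self.decode(self.encode(mlp_in))

    def loss(self, mlp_in, mlp_out, l1_coeff=1e-3):
        # Assumed objective: MSE against the true MLP output plus an
        # L1 sparsity penalty on the feature activations.
        feats = self.encode(mlp_in)
        pred = self.decode(feats)
        mse = sum((p - t) ** 2 for p, t in zip(pred, mlp_out)) / len(mlp_out)
        l1 = sum(feats)  # activations are already non-negative after ReLU
        return mse + l1_coeff * l1


def init_transcoder(d_in, d_hidden, d_out, seed=0):
    """Build a randomly initialised toy transcoder with zero biases."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return ToyTranscoder(
        W_enc=rand_matrix(d_hidden, d_in),
        b_enc=[0.0] * d_hidden,
        W_dec=rand_matrix(d_out, d_hidden),
        b_dec=[0.0] * d_out,
    )
```

For an MLP transcoder, `d_in` and `d_out` would both equal the model's residual width, and `mlp_in` / `mlp_out` would be activations captured just before and just after the MLP block. The only structural difference from an SAE in this sketch is that the reconstruction target is `mlp_out` instead of the input itself.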

Motivation

Jacob Dunefsky and Philippe Chlenski have trained MLP transcoders and shown them to be useful for interpretability, notably for circuit analysis through MLP layers.

Pitch

Checklist

dtch1997 commented 2 weeks ago

I'm likely going to be rather hacky with the first pass of the implementation, and may duplicate a fair amount of code so that I have maximum freedom to make changes. Refactoring can come later.