[Feature Request / Suggestion]: Is it possible to extend this to text embeddings?

stanfordnlp / pyvene

Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions

http://pyvene.ai

Apache License 2.0


Open fblissjr opened 1 month ago

fblissjr commented 1 month ago

I've been curious about this for a while, and more so since reading Disentangling Dense Embeddings with Sparse Autoencoders (https://arxiv.org/html/2408.00657v2).

It looks like most of the ingredients needed to do this with text embeddings are already in pyvene?

frankaging commented 1 month ago

@fblissjr Yes! We recently released an SAE tutorial that covers hidden layers, not the embedding layer. But if you specify the component to be the embedding layer output, you should essentially be able to replicate the results in this paper, IIUC: https://github.com/stanfordnlp/pyvene/blob/main/tutorials/basic_tutorials/Sparse_Autoencoder.ipynb
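For readers following along, here is a rough sketch of the idea in plain PyTorch. It does not use pyvene's API and is not taken from the tutorial. It captures the output of an embedding layer with a forward hook and fits a small ReLU sparse autoencoder with an L1 penalty on it. The toy `nn.Embedding`, the sizes, the `SparseAutoencoder` class, and the `l1_coeff` value are all assumptions made for illustration; the linked paper's exact SAE variant and training setup may differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class SparseAutoencoder(nn.Module):
    """ReLU autoencoder: dense vector -> wider non-negative code -> reconstruction."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # non-negative feature activations
        return self.decoder(features), features


# Stand-in for a real model's token embedding layer (hypothetical sizes).
vocab_size, d_model, d_hidden = 100, 16, 64
embedding = nn.Embedding(vocab_size, d_model)

# Capture the embedding layer's output with a standard forward hook.
captured = {}


def save_output(module, inputs, output):
    captured["embeddings"] = output.detach()


handle = embedding.register_forward_hook(save_output)
token_ids = torch.randint(0, vocab_size, (4, 8))  # (batch, sequence)
embedding(token_ids)
handle.remove()

# Flatten to one row per token and train the SAE on those rows.
x = captured["embeddings"].reshape(-1, d_model)
sae = SparseAutoencoder(d_model, d_hidden)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-2)
l1_coeff = 1e-3  # assumed sparsity weight, not a tuned value

losses = []
for _ in range(50):
    reconstruction, features = sae(x)
    mse = ((reconstruction - x) ** 2).mean()
    sparsity = features.abs().sum(dim=-1).mean()
    loss = mse + l1_coeff * sparsity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

With a real model, the hook would be registered on that model's own embedding module instead of the toy layer, and `features` would be the sparse code to inspect for interpretable directions.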