stanfordnlp / pyvene

Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
http://pyvene.ai
Apache License 2.0
604 stars, 56 forks

[Feature Request / Suggestion]: Is it possible to extend this to text embeddings? #181

Open fblissjr opened 1 month ago

fblissjr commented 1 month ago

Suggestion / Feature Request

I've been curious about this for a while now, and more so since reading Disentangling Dense Embeddings with Sparse Autoencoders (https://arxiv.org/html/2408.00657v2).

It looks like most of the ingredients needed to do this with text embeddings are already in pyvene?

frankaging commented 1 month ago

@fblissjr Yes! We recently released an SAE tutorial that covers hidden layers, not the embedding layer. But if you specify the component to be the embedding layer output, you could essentially replicate the results in this paper, IIUC: https://github.com/stanfordnlp/pyvene/blob/main/tutorials/basic_tutorials/Sparse_Autoencoder.ipynb
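
For readers who want a feel for what would be applied to the embedding output, here is a rough, library-free sketch of a sparse autoencoder forward pass on a single dense embedding vector. It is not pyvene code and not taken from the tutorial. The top-k sparsity rule, the function name `topk_sae_forward`, the dimensions, and the random untrained weights are all assumptions made for illustration. In practice the input `x` would be the collected embedding-layer output and the weights would be trained.

```python
import random

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Hypothetical top-k SAE: encode x, keep the k largest latents, decode."""
    # Pre-activations: one value per latent feature (rows of W_enc).
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(W_enc, b_enc)]
    # Indices of the k largest pre-activations.
    keep = set(sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k])
    # Sparse code: ReLU on the kept latents, zero everywhere else.
    z = [max(p, 0.0) if i in keep else 0.0 for i, p in enumerate(pre)]
    # Reconstruction: W_dec has one row per latent, each of length len(x).
    x_hat = [b + sum(z[j] * W_dec[j][d] for j in range(len(z)))
             for d, b in enumerate(b_dec)]
    return z, x_hat

random.seed(0)
d_model, d_latent, k = 8, 32, 4  # toy sizes, assumed for illustration

# Stand-in for one text embedding; a real one would come from the model.
x = [random.gauss(0, 1) for _ in range(d_model)]

# Untrained random weights, so the reconstruction is not meaningful here.
W_enc = [[random.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_latent)]
W_dec = [[random.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_latent)]
b_enc = [0.0] * d_latent
b_dec = [0.0] * d_model

z, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
active = [i for i, v in enumerate(z) if v != 0.0]
print("active latents:", active)
```

The sparse code `z` is what one would inspect for interpretable features. At most `k` entries are non-zero, and fewer if some of the top-k pre-activations are negative and get clipped by the ReLU.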