jbloomAus / DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks
https://jbloomaus-decisiontransformerinterpretability-app-4edcnc.streamlit.app/
MIT License

SVD Decomp / Explore ways to use dimensionality reduction to quickly understand what heads are doing. #69

Open jbloomAus opened 1 year ago

jbloomAus commented 1 year ago

This post is awesome. I think the value of using this method comes from understanding the method better, understanding our models better, and the editing, which could be cool too.

"Highly interpretable semantic clusters" sounds very cool. "Directly editing SVD representations" sounds very cool too.
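For anyone picking this up, here is a rough sketch of what I mean by applying SVD to a head. It is not code from this repo. It assumes a head's value and output weights are available as plain arrays (`W_V`, `W_O`) and that there is some output projection (`W_U`, e.g. to action logits); all names and shapes are hypothetical, and the row-vector convention `x @ W_V @ W_O` is assumed.

```python
import numpy as np


def ov_singular_directions(W_V, W_O, W_U, k=3, top_n=5):
    """Decompose one head's OV circuit with SVD (illustrative sketch).

    Assumed (hypothetical) shapes:
      W_V: (d_model, d_head), W_O: (d_head, d_model), W_U: (d_model, n_out)
    """
    d_head = W_V.shape[1]
    W_OV = W_V @ W_O  # (d_model, d_model), rank at most d_head

    # Thin SVD; only the first d_head components can be non-trivial.
    U, S, Vt = np.linalg.svd(W_OV, full_matrices=False)
    U, S, Vt = U[:, :d_head], S[:d_head], Vt[:d_head]

    # Rows of Vt are directions the head writes into the residual stream.
    # Projecting them through W_U shows which outputs each direction moves.
    logits = Vt[:k] @ W_U  # (k, n_out)
    top = np.argsort(-np.abs(logits), axis=1)[:, :top_n]
    return U, S, Vt, top


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_V = rng.normal(size=(16, 4))
    W_O = rng.normal(size=(4, 16))
    W_U = rng.normal(size=(16, 7))
    U, S, Vt, top = ov_singular_directions(W_V, W_O, W_U, k=2, top_n=3)
    # Sanity check: the truncated factors should rebuild the OV matrix.
    print(np.allclose(U @ np.diag(S) @ Vt, W_V @ W_O))
    print(top)
```

The reconstruction check is only a sanity test that the decomposition itself is right; it says nothing about whether the singular directions are interpretable.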

Steps:

jbloomAus commented 1 year ago

I've done some of these, but I'm not sure whether it's working. It's really hard to tell.