jbloomAus / DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks
https://jbloomaus-decisiontransformerinterpretability-app-4edcnc.streamlit.app/
MIT License
61 stars, 15 forks

Add static interpretability visualizations to wandb dashboard. #60

Open jbloomAus opened 1 year ago

jbloomAus commented 1 year ago

Add static interpretability visualizations to the wandb dashboard. This is an idea I just had, and it seems worth doing.

Some candidates: QK/OV circuit visualizations, different attribution embeddings, the time embedding, and L2 norms of different components.
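
A rough sketch of what two of these could look like (QK/OV circuit heatmaps and per-component L2 norms). Everything here is an assumption for illustration, not code from this repo: the function names, the `circuits/...` and `l2_norm/...` keys, and the weight shapes (`W_Q`, `W_K`, `W_V` as `(d_model, d_head)` and `W_O` as `(d_head, d_model)` for a single head). The wandb call is kept in a separate function so the rest runs without wandb installed.

```python
import numpy as np


def qk_circuit(W_Q, W_K):
    """Bilinear form W_Q @ W_K.T, shape (d_model, d_model), for one head."""
    return W_Q @ W_K.T


def ov_circuit(W_V, W_O):
    """Linear map W_V @ W_O, shape (d_model, d_model), for one head."""
    return W_V @ W_O


def component_l2_norms(weights):
    """L2 (Frobenius) norm of each named weight matrix, as plain floats."""
    return {f"l2_norm/{name}": float(np.linalg.norm(w)) for name, w in weights.items()}


def to_uint8_image(matrix):
    """Min-max scale a 2D array to 0..255 so it can be shown as a grayscale heatmap."""
    lo, hi = matrix.min(), matrix.max()
    if hi > lo:
        scaled = (matrix - lo) / (hi - lo)
    else:
        # Constant matrix: avoid dividing by zero, render as all black.
        scaled = np.zeros_like(matrix, dtype=float)
    return (scaled * 255).astype(np.uint8)


def build_static_log(weights):
    """Collect scalars and heatmaps into one dict keyed by hypothetical panel names."""
    payload = component_l2_norms(weights)
    payload["circuits/QK"] = to_uint8_image(qk_circuit(weights["W_Q"], weights["W_K"]))
    payload["circuits/OV"] = to_uint8_image(ov_circuit(weights["W_V"], weights["W_O"]))
    return payload


def log_to_wandb(payload, step=None):
    """Send the payload to wandb. Assumes wandb.init() has already been called."""
    import wandb  # imported lazily so the functions above work without wandb

    wandb.log(
        {k: wandb.Image(v) if isinstance(v, np.ndarray) else v for k, v in payload.items()},
        step=step,
    )


if __name__ == "__main__":
    # Random stand-in weights; a real run would pull these from the trained model.
    rng = np.random.default_rng(0)
    d_model, d_head = 8, 4
    weights = {
        "W_Q": rng.normal(size=(d_model, d_head)),
        "W_K": rng.normal(size=(d_model, d_head)),
        "W_V": rng.normal(size=(d_model, d_head)),
        "W_O": rng.normal(size=(d_head, d_model)),
    }
    payload = build_static_log(weights)
    print(sorted(payload))
```

Calling `log_to_wandb(build_static_log(weights), step=...)` at checkpoint time would put the heatmaps and norms next to the training curves. Attribution and time-embedding plots could be added to the same dict in the same way.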