jbloomAus / DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks
https://jbloomaus-decisiontransformerinterpretability-app-4edcnc.streamlit.app/
MIT License

Make it possible to track the preferences of the PPO in the app. #70

Open jbloomAus opened 1 year ago

jbloomAus commented 1 year ago

https://docs.google.com/document/d/1N1lVOXS5bLKYiXfoEeQoxxtI_0EfROi-JXcs-eYTCSA/edit?usp=sharing

I think this could be very valuable from the perspective of measuring the agent simulator's proclivity for modelling different agents in its training distribution.

jbloomAus commented 1 year ago

A better version of this might be to write a script which takes the training data and tests the predictions of the RL policies against those of the agent simulator. We can then closely investigate examples with significant divergence and the underlying mechanisms.
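
A rough sketch of what such a script could look like, to make the idea concrete. It assumes both the RL policy and the agent simulator can be queried for action logits on the same states and share one discrete action space. The function names, the array layout (`[n_states, n_actions]`), and the choice of KL divergence as the divergence measure are all illustrative assumptions, not anything that exists in this repo.

```python
import numpy as np


def softmax(logits):
    """Convert logits of shape [n_states, n_actions] to probabilities."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Per-state KL(p || q) for two arrays of action distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)


def rank_divergent_examples(policy_logits, simulator_logits, top_k=5):
    """Compare the two models and return the states where they disagree most.

    policy_logits / simulator_logits are hypothetical inputs: action logits
    from the RL policy and the agent simulator, evaluated on the same states.
    """
    p = softmax(np.asarray(policy_logits, dtype=float))
    q = softmax(np.asarray(simulator_logits, dtype=float))
    kl = kl_divergence(p, q)
    same_argmax = p.argmax(axis=-1) == q.argmax(axis=-1)
    most_divergent = np.argsort(-kl)[:top_k]  # indices, largest KL first
    return {
        "kl": kl,
        "agreement_rate": float(same_argmax.mean()),
        "most_divergent": most_divergent,
    }


if __name__ == "__main__":
    # Toy stand-in for logits collected over the training data.
    policy = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]])
    simulator = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [2.0, 0.0, 0.0]])
    report = rank_divergent_examples(policy, simulator, top_k=1)
    print(report["agreement_rate"], report["most_divergent"])
```

The indices in `most_divergent` point back into the training data, so those states could be loaded for closer inspection. Running this once per RL policy in the training distribution would give a per-agent agreement rate to compare.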