jbloomAus / DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks
https://jbloomaus-decisiontransformerinterpretability-app-4edcnc.streamlit.app/
MIT License

Reimplement BabyAI Recurrent AC Model for Trajectory Generation #24

Closed jbloomAus closed 1 year ago

jbloomAus commented 1 year ago

After failing to train a successful TransformerAC model, I'm pivoting to a recurrent AC model based on the BabyAI PyTorch implementation (https://github.com/mila-iqia/babyai/blob/master/babyai/model.py).

The tasks involved are:

jbloomAus commented 1 year ago

Image

It's looking hard to get the model past the "just always go up for EV 0.5" phase, but I'll try training for longer.

Also, new tasks to be done include:

jbloomAus commented 1 year ago

After sleeping on it, I've decided to split those other items into new cards and finish this card by testing LSTM PPO on the probe environments and on a new probe environment designed for memory envs.
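
For illustration, a memory probe environment of the kind mentioned above could look like the sketch below. It is a hypothetical minimal example, not code from this repository: the names `MemoryProbeEnv`, `run_episode` and `remembering_policy`, the one-hot observation format, and the `(obs, reward, done)` return convention are all assumptions. The idea is that the cue is visible only in the first observation, so an agent can only earn the final reward by carrying information in its recurrent state.

```python
import random


class MemoryProbeEnv:
    """Hypothetical probe env: a cue is shown only in the first observation.

    Reward is given only on the final step, and only if the action taken
    there matches the cue, so the agent has to remember what it saw.
    """

    def __init__(self, n_cues=2, delay=3, seed=None):
        self.n_cues = n_cues  # number of distinct cues (and actions)
        self.delay = delay    # non-terminal steps before the deciding step
        self.rng = random.Random(seed)
        self.cue = None
        self.t = 0

    def reset(self):
        """Start an episode and return the cue as a one-hot observation."""
        self.cue = self.rng.randrange(self.n_cues)
        self.t = 0
        return self._one_hot(self.cue)

    def step(self, action):
        """Return (obs, reward, done); every observation here is blank."""
        self.t += 1
        done = self.t > self.delay
        reward = 1.0 if done and action == self.cue else 0.0
        return [0.0] * self.n_cues, reward, done

    def _one_hot(self, index):
        return [1.0 if i == index else 0.0 for i in range(self.n_cues)]


def run_episode(env, policy):
    """Roll out one episode; policy maps (obs, state) -> (action, state)."""
    obs, state, done, total = env.reset(), None, False, 0.0
    while not done:
        action, state = policy(obs, state)
        obs, reward, done = env.step(action)
        total += reward
    return total


def remembering_policy(obs, state):
    """Stand-in for a recurrent agent: stores the cue index, then repeats it."""
    if state is None and any(obs):
        state = obs.index(1.0)
    return state, state
```

Under these assumptions, a policy that uses its state (such as `remembering_policy`) collects the reward every episode, while a policy that ignores its state can only match the cue by chance. That gap is what makes an environment like this useful for checking that the LSTM's hidden state is actually being propagated during rollouts and updates.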

jbloomAus commented 1 year ago

Splitting the last item into its own card and setting this as completed. It seems a model trained overnight does learn. I think there's still some chance the code isn't perfect, so I will make another attempt at getting the original BabyAI codebase running to do a comparison.

Image