Reimplement BabyAI Recurrent AC Model for Trajectory Generation

jbloomAus / DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks

https://jbloomaus-decisiontransformerinterpretability-app-4edcnc.streamlit.app/

MIT License


Reimplement BabyAI Recurrent AC Model for Trajectory Generation #24

Closed (jbloomAus closed 1 year ago)

jbloomAus commented 1 year ago

After failing to train a successful TransformerAC model, I'm pivoting to a recurrent AC model based on the BabyAI PyTorch implementation (https://github.com/mila-iqia/babyai/blob/master/babyai/model.py).

The tasks involved are:

[x] Port over the model and get it running with our PPO methods, rewriting them where needed
[x] Write basic tests to ensure it works
[ ] Ensure we can generate trajectories from it that we can train our decision transformers on
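
For the last item, a decision transformer is trained on sequences of returns-to-go, states, and actions, so each recorded trajectory needs a return-to-go per timestep. The snippet below is a minimal sketch of that computation, not the repo's implementation; the function name `returns_to_go`, the `gamma` parameter, and the example episode dictionary are all hypothetical.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return the (optionally discounted) sum of future rewards at each step."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return already computed for t + 1.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


# Hypothetical episode with a sparse reward on the final step.
episode = {"actions": [2, 2, 1, 2], "rewards": [0.0, 0.0, 0.0, 0.9]}
episode["rtg"] = returns_to_go(episode["rewards"])
```

With `gamma=1.0` every step of a sparse-reward episode carries the final reward as its return-to-go, which is the value the decision transformer would be conditioned on.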

jbloomAus commented 1 year ago

It's proving hard to get the model past the "just always go up for EV 0.5" phase, but I'll try training for longer.

Also, new tasks to be done include:

[ ] Change the view size for the BOW model that the PPO LSTM model currently works off
[ ] Enable observations to be recorded as one-hot, or otherwise convert them later if required
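
Converting observations later could look roughly like the sketch below. It assumes each grid cell is a tuple of integer channels (for example object type, colour, state) and that the number of categories per channel is known; the helper names and the channel sizes in the example are hypothetical, and plain lists are used only to keep the sketch dependency-free.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[int]:
    """One-hot encode a single integer in the range [0, size)."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} is outside [0, {size})")
    vec = [0] * size
    vec[index] = 1
    return vec


def one_hot_cell(cell: Sequence[int], sizes: Sequence[int]) -> List[int]:
    """Concatenate the one-hot vectors of every channel in one grid cell."""
    out: List[int] = []
    for index, size in zip(cell, sizes):
        out.extend(one_hot(index, size))
    return out


def one_hot_observation(obs, sizes: Sequence[int]):
    """Apply one_hot_cell to every cell of a rows-by-columns observation."""
    return [[one_hot_cell(cell, sizes) for cell in row] for row in obs]


# Hypothetical 1x2 observation with channel sizes (3, 2, 2).
encoded = one_hot_observation([[(2, 1, 0), (0, 0, 1)]], sizes=(3, 2, 2))
```

Recording the integer observations and converting afterwards keeps the stored trajectories small, at the cost of needing the channel sizes again at conversion time.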

jbloomAus commented 1 year ago

After sleeping on it, I've decided to split those other items into new cards and finish this card by testing LSTM PPO on the existing probe environments and on a new probe environment designed for memory envs.
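
One way a memory probe could work, as a rough sketch rather than the environment actually used here: show a cue only in the first observation, then blank observations, and reward the final action only if it matches the cue. A policy without memory cannot tell which cue was shown, while a recurrent one can score 1.0 every episode. The class `MemoryProbeEnv`, its simplified `reset`/`step` interface, and the helper policies are all hypothetical.

```python
import random
from typing import Optional, Tuple


class MemoryProbeEnv:
    """Hypothetical probe: the cue is visible only at t=0 and must be recalled later."""

    BLANK = 2  # observation returned once the cue has disappeared

    def __init__(self, delay: int = 3, seed: Optional[int] = None):
        if delay < 1:
            raise ValueError("delay must be at least 1 so the cue is hidden")
        self.delay = delay
        self.rng = random.Random(seed)
        self.cue = 0
        self.t = 0

    def reset(self) -> int:
        self.cue = self.rng.randint(0, 1)
        self.t = 0
        return self.cue

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        done = self.t > self.delay
        # Only the final action is scored; earlier actions are ignored.
        reward = 1.0 if done and action == self.cue else 0.0
        return self.BLANK, reward, done


def run_episode(env: MemoryProbeEnv, policy) -> float:
    """Roll out one episode; policy maps (obs, memory) to (action, memory)."""
    obs, memory, done, total = env.reset(), None, False, 0.0
    while not done:
        action, memory = policy(obs, memory)
        obs, reward, done = env.step(action)
        total += reward
    return total


def remembering_policy(obs: int, memory: Optional[int]):
    """Store the first observation (the cue) and always answer with it."""
    if memory is None:
        memory = obs
    return memory, memory
```

If an LSTM PPO agent plateaus near 0.5 average reward on an environment like this, that would suggest the recurrent state is not being carried across steps correctly.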

jbloomAus commented 1 year ago

Splitting the last item into its own card and setting this as completed. It seems that a model trained overnight does train. I think there's still some chance the code isn't perfect, so I will make another attempt at getting the original BabyAI codebase running for comparison.