jbloomAus / DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks
https://jbloomaus-decisiontransformerinterpretability-app-4edcnc.streamlit.app/
MIT License
68 stars, 16 forks

Explore Improvements to DT Training Procedure #53

Closed: jbloomAus closed this 1 year ago

jbloomAus commented 1 year ago

Just wanted to have a meta card to track progress on these things, with links:

If we've implemented all of those and still have no success training on the memory env, possibly try much longer training runs or more variable sampling methods, or else ask for advice (or go bug hunting).

jbloomAus commented 1 year ago

LN and Adam are done. No clear benefit on the smaller model. I think I'll get everything implemented, then set off some sweeps on the memory environment in the next day or two.

jbloomAus commented 1 year ago

Done with the LR scheduling work: https://github.com/users/jbloomAus/projects/1/views/1?pane=issue&itemId=27012682
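The linked card isn't reproduced here, so the exact schedule isn't known. As a rough illustration only, here is a minimal sketch of one common choice, linear warmup followed by cosine decay. The function name `lr_at_step` and its parameters are hypothetical, not taken from the repo.

```python
import math


def lr_at_step(step: int, base_lr: float, warmup_steps: int,
               total_steps: int, min_lr: float = 0.0) -> float:
    """Hypothetical schedule: linear warmup to base_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp: reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine factor goes from 1 (start of decay) to 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine
```

In PyTorch, the same shape could be plugged into `torch.optim.lr_scheduler.LambdaLR` by passing `lr_lambda=lambda s: lr_at_step(s, 1.0, warmup_steps, total_steps)`. That yields a multiplier on the optimizer's configured learning rate, with `scheduler.step()` called once per optimizer step.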

jbloomAus commented 1 year ago

I'm going to add a task here for setting up wandb sweeps. Given everything I've added, I think it's important to get a better sense of the right hyperparameters.
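For reference, a wandb sweep is driven by a config describing the search method, the metric to optimize, and the parameter ranges. Below is a minimal sketch under stated assumptions: the parameter names, ranges, and the metric `eval/mean_return` are hypothetical and would need to match whatever the training script actually logs and reads. The helper `sample_random_config` is also hypothetical, not part of wandb. It only mimics random search locally so the config can be dry-run without launching anything.

```python
import math
import random

# Hypothetical sweep over a few DT training hyperparameters.
sweep_config = {
    "method": "random",  # wandb also supports "grid" and "bayes"
    "metric": {"name": "eval/mean_return", "goal": "maximize"},  # hypothetical metric
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-3},
        "batch_size": {"values": [32, 64, 128]},
        "n_layers": {"values": [1, 2, 3]},
        "warmup_steps": {"values": [0, 500, 2000]},
    },
}


def sample_random_config(config: dict, rng: random.Random) -> dict:
    """Draw one configuration locally, mimicking random search (dry runs only)."""
    sample = {}
    for name, spec in config["parameters"].items():
        if "values" in spec:
            # Categorical parameter: pick one of the listed values.
            sample[name] = rng.choice(spec["values"])
        elif spec.get("distribution") == "log_uniform_values":
            # Uniform in log space between min and max.
            lo, hi = math.log(spec["min"]), math.log(spec["max"])
            sample[name] = math.exp(rng.uniform(lo, hi))
        else:
            raise ValueError(f"unsupported spec for {name}: {spec}")
    return sample
```

To actually launch it, something like `sweep_id = wandb.sweep(sweep=sweep_config, project="dt-memory-env")` followed by `wandb.agent(sweep_id, function=train, count=20)` should work. Here the project name and `train` (a function that calls `wandb.init()` and reads `wandb.config`) are placeholders.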

jbloomAus commented 1 year ago

I just had a lightbulb moment relating to #61, so I'm going to do that really quickly before I attempt wandb sweeps.

jbloomAus commented 1 year ago

Converting "Implement masking rather than just having different tokens during padding" to its own card.
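The idea behind that card is that padded positions should be hidden from attention, rather than relying on a distinct pad token the model has to learn to ignore. A minimal sketch, assuming padding sits at one end of the token sequence (left by default) and that a True entry means "may attend". The function name `build_attention_mask` is hypothetical and does not come from the repo.

```python
from typing import List


def build_attention_mask(seq_len: int, n_pad: int, left_pad: bool = True) -> List[List[bool]]:
    """Return mask[q][k] = True where query position q may attend to key position k.

    Combines a causal mask (k <= q) with a padding mask that hides padded keys,
    so padded tokens never influence real ones, whatever token fills them.
    """
    if left_pad:
        is_pad = [i < n_pad for i in range(seq_len)]
    else:
        is_pad = [i >= seq_len - n_pad for i in range(seq_len)]

    mask = []
    for q in range(seq_len):
        row = [k <= q and not is_pad[k] for k in range(seq_len)]
        if not any(row):
            # A padded query can end up with no allowed keys, and a softmax over
            # all -inf scores gives NaN. Let it attend to itself; its output
            # should be ignored downstream (e.g. excluded from the loss).
            row[q] = True
        mask.append(row)
    return mask
```

One design note: PyTorch's `nn.MultiheadAttention` uses the opposite convention for boolean `attn_mask` and `key_padding_mask` (True marks positions that are *not* allowed to attend), so a mask like this would need inverting before being passed there.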

jbloomAus commented 1 year ago

Closing this. Got working agents!