jbloomAus / DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks
https://jbloomaus-decisiontransformerinterpretability-app-4edcnc.streamlit.app/
MIT License
62 stars 15 forks

Store model checkpoints during ppo training #27

Closed Felhof closed 1 year ago

Felhof commented 1 year ago

Closes #22. During training of the PPO agents, a model checkpoint is stored at regular intervals and uploaded to wandb. The checkpoint contains the state_dict and the online config. The environment config is not included, because the checkpoint is intended for training a decision transformer, which will have the environment config anyway. The number of checkpoints can be set with a command-line argument.
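For readers who want a feel for the mechanism, here is a minimal, standard-library-only sketch of evenly spaced checkpointing. It is not the code from this PR: the flag name `--num_checkpoints`, the helper names, the dictionary keys, and the use of `pickle` with a plain dict in place of a real model's `state_dict` are all assumptions made for illustration. In an actual run the file would be written from the PyTorch model and then uploaded to wandb at the marked point.

```python
import argparse
import pickle
from pathlib import Path


def checkpoint_updates(total_updates, num_checkpoints):
    """Return the 1-based update indices after which a checkpoint is saved.

    Checkpoints are spread evenly and the final update is always included.
    """
    if num_checkpoints <= 0 or total_updates <= 0:
        return []
    num_checkpoints = min(num_checkpoints, total_updates)
    indices = {
        round(i * total_updates / num_checkpoints)
        for i in range(1, num_checkpoints + 1)
    }
    return sorted(indices)


def save_checkpoint(state_dict, online_config, path):
    """Write the two items named in the description to a single file."""
    payload = {
        "model_state_dict": state_dict,  # hypothetical key name
        "online_config": online_config,  # hypothetical key name
    }
    with open(path, "wb") as f:
        pickle.dump(payload, f)


def train(total_updates, num_checkpoints, out_dir):
    """Toy training loop that saves a checkpoint on the scheduled updates."""
    schedule = set(checkpoint_updates(total_updates, num_checkpoints))
    state_dict = {"weight": 0.0}  # stand-in for the agent's state_dict
    online_config = {"total_updates": total_updates}  # stand-in config
    saved = []
    for update in range(1, total_updates + 1):
        state_dict["weight"] += 0.1  # stand-in for one PPO update
        if update in schedule:
            path = Path(out_dir) / f"checkpoint_{update:04d}.pkl"
            save_checkpoint(dict(state_dict), online_config, path)
            saved.append(path)
            # In a real run, the upload to wandb would happen here.
    return saved


def build_parser():
    """Command-line interface with a hypothetical checkpoint-count flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_checkpoints", type=int, default=5)
    return parser
```

Calling `train(10, 3, some_dir)` with these assumptions writes three files, the last of them after the final update. Deriving the schedule from the requested count, instead of using a fixed interval, keeps the number of uploads predictable regardless of how long training runs.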

jbloomAus commented 1 year ago

Thanks Felix! I might have more questions about this later but will merge for now :)