Closes #22
During training of the PPO agents, a model checkpoint is stored and uploaded to wandb at regular intervals. The checkpoint contains the state_dict and the online config. The environment config is left out because the checkpoint is intended for training a decision transformer, which will have the environment config anyway. The number of checkpoints can be set using a command line argument.
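To illustrate how a checkpoint count could map onto the training loop, here is a minimal sketch that spreads the checkpoints evenly over the run. The flag name `--num_checkpoints`, its default, and the helper `checkpoint_updates` are hypothetical and may not match the names used in this PR. The actual saving and the wandb upload are left out.

```python
import argparse
from typing import List


def checkpoint_updates(total_updates: int, num_checkpoints: int) -> List[int]:
    """Return the 1-based update indices after which a checkpoint is saved.

    Assumption: checkpoints are evenly spaced and the last one coincides
    with the final update. The PR may schedule them differently.
    """
    if total_updates <= 0 or num_checkpoints <= 0:
        return []
    # Never request more checkpoints than there are updates.
    num_checkpoints = min(num_checkpoints, total_updates)
    return [
        (i * total_updates) // num_checkpoints
        for i in range(1, num_checkpoints + 1)
    ]


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hypothetical flag name and default, for illustration only.
    parser.add_argument("--num_checkpoints", type=int, default=10)
    return parser.parse_args(argv)
```

In a training loop, the returned indices can be turned into a set and checked after each update; when the current update is in the set, the state_dict and online config would be saved and uploaded.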