Saving and loading the DQN agent does not save/load four attributes that are needed to resume training:
self.t
self.optim_t
self._cumulative_steps
self.replay_buffer
This caused the agent to have a different performance when evaluated without killing the program vs. when saving the agent, killing the program, restarting it, and loading the agent back.
Fig 1 - Training without checkpoints (i.e., the same program run from start to finish)
Fig 2 - Training with checkpoints (i.e., the program killed every t steps and the agent reloaded from disk)
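To make the divergence concrete, here is a hedged sketch of the save/load round-trip. It assumes a PFRL-style DQN agent; `agent` stands for an agent that has already trained for some steps, `make_agent()` is a hypothetical helper that rebuilds an identical fresh agent in the resumed process, and the directory name is illustrative:

```python
# Hypothetical repro sketch (names are illustrative, not from the codebase).
agent.save("checkpoint")               # persists model/optimizer parameters only

resumed = make_agent()                 # resumed process: rebuild the agent
resumed.load("checkpoint")             # network weights come back...

# ...but the training state does not, so these are back at their initial values:
print(resumed.t)                       # 0  (environment steps taken)
print(resumed.optim_t)                 # 0  (optimizer updates performed)
print(resumed._cumulative_steps)       # 0
print(len(resumed.replay_buffer))      # 0  (replay buffer is empty again)
```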
My proposed solution (working, but applied only to the DQN agent) was to add new save_snapshot and load_snapshot methods to the agent's class, without overriding the original save and load methods, so that the replay buffer is not saved on every regular save:
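Below is a minimal sketch of such snapshot helpers, not the actual patch. It assumes a PFRL-style DQN agent whose save/load take a directory and whose replay buffer exposes save(path)/load(path); the helpers are written as free functions taking the agent explicitly, whereas in the real class they would be methods (replace `agent` with `self`), and the file names are illustrative.

```python
import os
import pickle


def save_snapshot(agent, dirname):
    """Full training snapshot: weights (via the existing save()) plus the
    training state that save() leaves out."""
    agent.save(dirname)  # model/optimizer parameters, as before
    # Persist the step counters that the plain save() skips.
    with open(os.path.join(dirname, "snapshot_state.pkl"), "wb") as f:
        pickle.dump(
            {
                "t": agent.t,
                "optim_t": agent.optim_t,
                "_cumulative_steps": agent._cumulative_steps,
            },
            f,
        )
    # Stored separately so the regular save() stays cheap.
    agent.replay_buffer.save(os.path.join(dirname, "replay_buffer.pkl"))


def load_snapshot(agent, dirname):
    """Restore a snapshot written by save_snapshot()."""
    agent.load(dirname)  # model/optimizer parameters, as before
    with open(os.path.join(dirname, "snapshot_state.pkl"), "rb") as f:
        state = pickle.load(f)
    agent.t = state["t"]
    agent.optim_t = state["optim_t"]
    agent._cumulative_steps = state["_cumulative_steps"]
    agent.replay_buffer.load(os.path.join(dirname, "replay_buffer.pkl"))
```

The snapshot helpers are only used at checkpoint time; the lighter save/load keep their original behavior for evaluation-only use.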
This change works as intended; training resumes properly after reloading the agent from disk:
Fig 3 - Training with checkpoints, new patch (i.e., the program killed every t steps and the agent reloaded from disk)