Closed my-rice closed 4 months ago
Yes, I think this is somewhat expected behavior. Resuming training without restoring the dataset usually causes a drop in performance before it bounces back. Saving and loading the buffer should allow you to fully resume a training run.
Thank you. Saving the buffer, together with the model and optimizers, works well: I can now stop and resume training without problems. Without the correct buffer, I don't think it is possible to resume training in a stable manner.
Sounds great! Closing this issue but feel free to re-open if you have any follow-up questions.
In order to continue training from a checkpoint, I created two functions to save and load the model, the optimizer, and the pi_optimizer. This lets me resume training from a given checkpoint. Unfortunately, training is only stable for the first 10,000–12,000 iterations, after which the rewards become very unstable. After 20,000–25,000 iterations, the rewards tend to drop far below what I was getting before I stopped training.
Do you have any idea why this happens?
I noticed that agent.update() uses the buffer during training, but I did not save it, so it is re-instantiated from scratch. Is this a major problem? What are the consequences of using a new buffer when resuming training?
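For reference, a minimal sketch of what saving the buffer alongside the other training state might look like. This is not the library's actual API: the function names, the checkpoint layout, and the assumption that every component is picklable are all hypothetical; a real PyTorch setup would typically serialize `state_dict()` objects via `torch.save` instead of plain pickle.

```python
import pickle

# Hypothetical sketch: bundle the replay buffer with the model and
# optimizer states so a resumed run sees the same data distribution.
# All names here (model_state, pi_optimizer_state, buffer) are
# illustrative stand-ins for the real training objects.

def save_checkpoint(path, model_state, optimizer_state,
                    pi_optimizer_state, buffer):
    """Write everything needed to resume training into one file."""
    checkpoint = {
        "model": model_state,
        "optimizer": optimizer_state,
        "pi_optimizer": pi_optimizer_state,
        "buffer": buffer,  # the full buffer contents, not just its config
    }
    with open(path, "wb") as f:
        pickle.dump(checkpoint, f)

def load_checkpoint(path):
    """Read the checkpoint back; the caller restores each component."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The key point is that the buffer is persisted and restored as-is: if it is re-instantiated empty, the first updates after resuming are computed from a tiny, unrepresentative sample of transitions, which matches the instability described above.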