nicklashansen / tdmpc2

Code for "TD-MPC2: Scalable, Robust World Models for Continuous Control"
https://www.tdmpc2.com
MIT License

About continue training from a given checkpoint #28

Closed my-rice closed 4 months ago

my-rice commented 5 months ago

To continue training from a checkpoint, I wrote two functions to save and load the model, the optimizer, and the pi_optimizer. Unfortunately, after resuming, training is only stable for the first 10,000-12,000 iterations and then the rewards become very unstable. After 20,000-25,000 iterations the rewards tend to drop to values far below what I was getting before I stopped training.

Do you have any idea why this happens?

I noticed that agent.update() uses the replay buffer during training, but I did not save the buffer, so it is re-instantiated from scratch when training resumes. Is this a major problem? What are the consequences of continuing training with a fresh buffer?
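
For reference, the two functions look roughly like this (a minimal sketch; the attribute names `agent.model`, `agent.optim`, and `agent.pi_optim` are assumptions about how the agent object is structured, so adjust them to match your code):

```python
import torch

def save_checkpoint(agent, step, path):
    """Save model and optimizer states so training can be resumed later."""
    torch.save({
        "step": step,
        "model": agent.model.state_dict(),        # world model weights (assumed attribute)
        "optim": agent.optim.state_dict(),        # main optimizer state (assumed attribute)
        "pi_optim": agent.pi_optim.state_dict(),  # policy optimizer state (assumed attribute)
    }, path)

def load_checkpoint(agent, path):
    """Restore model and optimizer states from a saved checkpoint and return the step."""
    ckpt = torch.load(path, map_location="cpu")
    agent.model.load_state_dict(ckpt["model"])
    agent.optim.load_state_dict(ckpt["optim"])
    agent.pi_optim.load_state_dict(ckpt["pi_optim"])
    return ckpt["step"]
```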

nicklashansen commented 5 months ago

Yes, I think this is somewhat expected behavior. Resuming training without storing the dataset usually causes a drop in performance before it bounces back. Saving and loading the buffer should allow you to fully resume a training run.
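
One simple way to persist the buffer is to keep the collected episodes around and re-add them to a fresh buffer when resuming. This is only a sketch under the assumption that episodes are stored as tensordicts and added via an `add(episode)` call; the actual Buffer class in this repo may expose a different interface:

```python
import torch

def save_buffer(episodes, path):
    """Persist the list of collected episode tensordicts to disk."""
    torch.save(episodes, path)

def restore_buffer(buffer, path):
    """Re-populate a freshly constructed buffer from saved episodes."""
    episodes = torch.load(path)
    for ep in episodes:
        buffer.add(ep)  # re-insert episodes in their original order
    return buffer
```

Re-adding the episodes in their original order should reproduce the buffer state closely enough to resume training smoothly, at the cost of storing all collected transitions on disk.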

my-rice commented 5 months ago

Thank you, saving the buffer together with the model and optimizers works well for resuming training. I can now stop and resume training without problems. Without the original buffer, I don't think it is possible to resume training in a stable manner.

nicklashansen commented 4 months ago

Sounds great! Closing this issue but feel free to re-open if you have any follow-up questions.