Open RamiRibat opened 3 years ago
Also, regarding line 11: for how long should the Model_pool expand? Because it occupies the GPU's memory as it grows.
For line 6, I think it stops training once the validation loss converges. I have implemented a PyTorch version myself (https://github.com/jiangsy/mbpo_pytorch/tree/master/mbpo_pytorch), which you may use as a reference (there are still some gaps in performance, but it may still be of some help).
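For what it's worth, "train until the validation loss converges" is usually implemented as early stopping with a patience counter. Here is a minimal sketch of that idea; `train_step` and `val_loss` are hypothetical callables standing in for one epoch of model gradient updates and a held-out loss evaluation, not the actual MBPO API:

```python
def train_until_converged(train_step, val_loss, patience=5, max_epochs=200):
    """Train a dynamics model until validation loss stops improving.

    `train_step` runs one epoch of gradient updates; `val_loss` returns
    the current held-out loss. Both are placeholder callables.
    """
    best, bad_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()
        loss = val_loss()
        if loss < best - 1e-4:          # meaningful improvement resets patience
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:      # no improvement for `patience` epochs
            break
    return best
```

So the number of gradient steps is not fixed in advance; it varies from refit to refit depending on when the validation loss plateaus.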
Thank you very much, @jiangsy.
Here is a PyTorch implementation that achieves the same performance on Walker and Hopper: https://github.com/Xingyu-Lin/mbpo_pytorch. Other tasks are untested.
Hi, this is really nice work.
I've faced some issues related to TensorFlow and CUDA, and I'm not that good with TensorFlow; I'm a PyTorch guy.
So I've decided to make a PyTorch implementation of MBPO, and I'm trying to understand your code.
From my understanding, taking AntTruncatedObs-v2 as a working example:

PyTorch pseudocode:

Total epochs = 1000
Epoch steps = 1000
Exploration epochs = 10
Is that right?
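To make the question concrete, here is roughly the outer loop I have in mind for those hyperparameters. Every function name is a placeholder, and the 250-step model-refit interval is my own assumption for illustration, not something taken from the repo:

```python
# Sketch of the MBPO outer loop with the hyperparameters quoted above.
# All callables (env_step, train_model, ...) are hypothetical placeholders.
TOTAL_EPOCHS = 1000
EPOCH_STEPS = 1000
EXPLORATION_EPOCHS = 10

def run(env_step, train_model, rollout_model, update_sac,
        epochs=TOTAL_EPOCHS, epoch_steps=EPOCH_STEPS,
        exploration_epochs=EXPLORATION_EPOCHS):
    for epoch in range(epochs):
        explore = epoch < exploration_epochs    # random actions at first
        for step in range(epoch_steps):
            env_step(random=explore)            # collect one real transition
            if not explore:
                if step % 250 == 0:             # periodic model refit (assumed interval)
                    train_model()
                rollout_model()                 # short model rollouts into Model_pool
                update_sac()                    # SAC gradient updates from Model_pool
```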
My questions are about lines 06 & 11:

06: You're using some real-time period to train the model. In terms of gradient steps, how many steps is that?
11: When you reallocate the Model_pool, you set the [Model_pool size] to the number of [model steps per epoch]. But isn't that a really huge training set for the SAC updates? Are you discarding all model steps from previous epochs?
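My mental model of the Model_pool is a fixed-capacity ring buffer, sketched below. Under this (assumed) design, setting the capacity to the number of model steps per epoch means transitions from earlier epochs are simply overwritten once the buffer wraps, so SAC would effectively train on only the most recent epoch of model data. Is that what happens in your code?

```python
import numpy as np

class ModelPool:
    """Fixed-capacity ring buffer for model-generated transitions.

    A sketch only (observations alone, for brevity): once the buffer
    fills, the oldest entries are overwritten by new ones.
    """
    def __init__(self, capacity, obs_dim):
        self.capacity = capacity
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.ptr = 0
        self.size = 0

    def add(self, obs):
        self.obs[self.ptr] = obs
        self.ptr = (self.ptr + 1) % self.capacity   # wrap: oldest overwritten
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, rng=np.random):
        idx = rng.randint(0, self.size, size=batch_size)
        return self.obs[idx]
```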
Sorry for this very long issue.
Best wishes and kind regards.
Rami Ahmed