Hi @zhihaocheng, thanks for the report.
I reproduced the problem: SAC doesn't learn on Hopper-v2 and Walker2d-v2. I don't think this problem is related to the hyper-parameters because it works on HalfCheetah-v2.
In the meantime, can you try TD3? I confirmed that TD3 works fine on both environments.
Hello @keiohta, thanks for your response.
I am trying TD3 now and hope it works.
But it is still strange that SAC does not work on Hopper-v2 and Walker2d-v2, because it works well in other implementations, for example https://spinningup.openai.com/en/latest/spinningup/bench.html. While using TF2RL, I found that neither GAIL nor SAC works on Hopper-v2, even though Hopper-v2 is almost the simplest continuous-control task. What's worse, GAIL uses DDPG, which is quite similar to SAC. So I wonder whether something in the implementation details of DDPG and SAC prevents them from working on Hopper-v2.
It would be great if the implementation here worked across various environments. I am also very curious about why it does not work, because I did not find anything wrong in the implementation.
I found a bug that caused the poor performance and fixed it in this commit. I confirmed that SAC works as expected in the fixed version.
However, the branch has another error, introduced when replacing the custom distribution classes with TensorFlow Probability. I'll let you know once I merge this branch.
P.S. Yes, it would be good if I could guarantee that the performance of the current implementation is similar to the one reported in the paper. I'll work on this.
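For context on what those distribution classes compute: SAC as described in its original paper samples actions as a = tanh(u), with u drawn from a diagonal Gaussian, and the log-probability needs a change-of-variables correction for the tanh. The NumPy sketch below shows that computation under the assumption that such a squashed Gaussian is what the custom classes implemented. It is not TF2RL's code, and `squashed_gaussian_log_prob` is a hypothetical name. In TensorFlow Probability the same density can be expressed by transforming a `tfp.distributions.MultivariateNormalDiag` with the `tfp.bijectors.Tanh` bijector.

```python
import numpy as np


def squashed_gaussian_log_prob(u, mu, log_std):
    """Log-density of a = tanh(u), where u ~ N(mu, exp(log_std)^2) independently per dimension.

    u, mu, log_std: arrays of shape (batch, action_dim).
    Returns an array of shape (batch,).
    """
    std = np.exp(log_std)
    # Diagonal Gaussian log-density of the pre-squash sample u, per dimension.
    gauss = -0.5 * (((u - mu) / std) ** 2 + 2.0 * log_std + np.log(2.0 * np.pi))
    # Change of variables for a = tanh(u): subtract log|da/du| = log(1 - tanh(u)^2),
    # written in the numerically stable form 2 * (log 2 - u - softplus(-2u)).
    correction = 2.0 * (np.log(2.0) - u - np.logaddexp(0.0, -2.0 * u))
    return np.sum(gauss - correction, axis=-1)


# Usage: reparameterized sampling of a batch of 2 actions with 3 dimensions each.
rng = np.random.default_rng(0)
mu = np.zeros((2, 3))
log_std = np.full((2, 3), -0.5)
u = mu + np.exp(log_std) * rng.standard_normal(mu.shape)  # pre-squash sample
action = np.tanh(u)  # squashed into (-1, 1)
logp = squashed_gaussian_log_prob(u, mu, log_std)  # shape (2,)
```

Writing the correction as 2 * (log 2 - u - softplus(-2u)) instead of log(1 - tanh(u)^2) avoids taking the log of values that underflow to zero when |u| is large.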
I merged the branch. Can you try with the latest master?
```
pip install tf2rl==0.1.19
# or
pip install -U tf2rl
```
Thanks @keiohta, I will try the new version today. I have also checked your new commit. If I understand correctly, the previous version did not work because of a shape error during training. Is that right?
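One common way a shape error can break an off-policy critic without raising an exception is broadcasting. If rewards come out of the replay buffer with shape `(batch,)` while the critic outputs `(batch, 1)`, the TD target silently becomes a `(batch, batch)` matrix. The NumPy sketch below illustrates this failure mode only as an assumption: the actual bug fixed in the commit may be different, `td_target` is a hypothetical helper, and SAC's entropy term is omitted for brevity. TensorFlow follows the same broadcasting rules, so the same mistake goes unnoticed there as well.

```python
import numpy as np


def td_target(rewards, dones, next_q, gamma=0.99):
    """TD target r + gamma * (1 - done) * next_q, one value per transition.

    Every input is reshaped to (batch, 1) so the terms combine elementwise.
    (SAC's soft target would also subtract alpha * log_prob; omitted here.)
    """
    rewards = np.reshape(rewards, (-1, 1))
    dones = np.reshape(dones, (-1, 1))
    next_q = np.reshape(next_q, (-1, 1))
    return rewards + gamma * (1.0 - dones) * next_q


batch = 4
rewards = np.arange(batch, dtype=np.float64)  # shape (4,), as sampled from a replay buffer
dones = np.zeros(batch)                       # shape (4,)
next_q = np.ones((batch, 1))                  # shape (4, 1), as output by a critic network

broken = rewards + 0.99 * (1.0 - dones) * next_q  # broadcasts to (4, 4): silently wrong
fixed = td_target(rewards, dones, next_q)         # (4, 1): one target per transition
print(broken.shape, fixed.shape)  # (4, 4) (4, 1)
```

Reshaping every term to `(batch, 1)` before combining them is a cheap guard; an explicit shape assertion in the training step would catch the mismatch as well.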
Hello @keiohta, I tried, but it failed. I suspect the newest TF2RL requires a specific version of TensorFlow.
OS: CentOS Linux release 7.2 (Final).
TensorFlow: tensorflow-cpu 2.3.0.
tf2rl: 0.1.19.
Full error:
```
Traceback (most recent call last):
  File "examples/run_sac.py", line 3, in
```
Hi @zhihaocheng, this is caused by an incompatible TensorFlow Probability (tfp) version. Please refer to the tfp releases and install a tfp version that supports your TensorFlow version. I guess it will be 0.11.0.
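To check which versions are actually installed before picking a tfp release, a small script along these lines may help. The TensorFlow-to-tfp pairing in `TFP_FOR_TF` is an assumption based on the tfp release notes and should be verified there; `suggest_tfp_version` and `installed_version` are hypothetical helpers, not part of TF2RL. It uses only the standard library (`importlib.metadata`, Python 3.8+).

```python
from importlib import metadata

# Assumed TensorFlow -> TensorFlow Probability pairing (major.minor), based on the
# tfp release notes; verify against the releases page before relying on it.
TFP_FOR_TF = {"2.2": "0.10", "2.3": "0.11", "2.4": "0.12"}


def suggest_tfp_version(tf_version):
    """Return the tfp minor release paired with a TF version string, or None if unknown."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TFP_FOR_TF.get(major_minor)


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # tensorflow and tensorflow-cpu are separate pip distributions, so check both.
    tf = installed_version("tensorflow") or installed_version("tensorflow-cpu")
    tfp = installed_version("tensorflow-probability")
    print("tensorflow:", tf, "tensorflow-probability:", tfp)
    if tf is not None:
        print("suggested tensorflow-probability:", suggest_tfp_version(tf))
```

With tensorflow-cpu 2.3.0, as in the report, the mapping above suggests tfp 0.11, which agrees with the 0.11.0 guess.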
Thank you @keiohta. After updating TensorFlow Probability to 0.11.0, SAC runs; it will take some time to get the training results.
Hi @keiohta, I have tested SAC here, and it does work on Hopper-v2.
Hello @keiohta, I found that GAIL does not work on Hopper-v2 or Walker2d-v2, and SAC in this repository could not train a successful policy for Hopper-v2 either. I have checked the implementation of GAIL but could not find the reason why it does not work. Is this failure caused by the hyper-parameters, or do I need to fine-tune the parameters before training?