keiohta / tf2rl

TensorFlow2 Reinforcement Learning
MIT License

GAIL could not work on Hopper-v2 #109

Closed (zhihaocheng closed this issue 3 years ago)

zhihaocheng commented 3 years ago

Hello keiohta, I found that GAIL does not work on Hopper-v2 or Walker2d-v2. SAC in this repository also fails to train a successful policy for Hopper-v2. I have checked the implementation of GAIL but could not find the reason why it does not work. Is this failure caused by the hyper-parameters, and do I need to fine-tune them before training?

keiohta commented 3 years ago

Hi @zhihaocheng , thanks for the report.

I reproduced the problem: SAC doesn't learn on Hopper-v2 and Walker2d-v2. I don't think this problem is related to the hyper-parameters because it works on HalfCheetah-v2.

In the meantime, can you try TD3? I confirmed that TD3 works fine on both environments.

zhihaocheng commented 3 years ago

Hello @keiohta, thanks for your response.

I am trying to run TD3 and hope it works.

But it is still strange that SAC does not work on Hopper-v2 and Walker2d-v2, because it works well in other implementations. For example, see https://spinningup.openai.com/en/latest/spinningup/bench.html. While using TF2RL, I found that neither GAIL nor SAC works on Hopper-v2, even though Hopper-v2 is almost the simplest continuous-control task. What's worse, GAIL uses DDPG, which is quite similar to SAC. So I wonder whether something in the implementation details of DDPG and SAC prevents them from working on Hopper-v2.

zhihaocheng commented 3 years ago

It would be perfect if the implementation here worked across various environments. I am also very curious about why it does not work, because I did not find anything improper in the implementation.

keiohta commented 3 years ago

I found a bug that caused the poor performance and fixed it in this commit. I confirmed that SAC works as expected in the fixed version.

However, the branch has another error, which appeared when I introduced TensorFlow Probability to remove the custom distribution classes. So, I'll let you know once I merge this branch.

P.S. Yeah, it would be good if I could guarantee that the performance of the current implementation is similar to that reported in the paper. I'll work on this.

keiohta commented 3 years ago

I merged the branch. Can you try with the latest master?

pip install tf2rl==0.1.19
# or
pip install -U tf2rl

zhihaocheng commented 3 years ago

Thanks @keiohta, I will try the new version today. Also, I have checked your new commit, and I think the previous version did not work because of a shape error during training. Is my understanding right?
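The thread does not say what the shape error actually was, so the following is only a hypothetical illustration of how this class of bug can silently hurt training. It is not the tf2rl code. It uses NumPy rather than TensorFlow to stay small, and the names `rewards`, `next_q`, and `gamma` are assumptions made up for this sketch. The point is that combining a `(batch, 1)` array with a `(batch,)` array raises no error and instead broadcasts to `(batch, batch)`.

```python
import numpy as np

batch = 4
gamma = 0.99

# Hypothetical tensors: rewards stored as a column, next-state values as a flat vector.
rewards = np.ones((batch, 1))            # shape (4, 1)
next_q = np.arange(batch, dtype=float)   # shape (4,)

# Bug: (4, 1) + (4,) broadcasts to (4, 4) instead of raising an error,
# so every reward is paired with every next-state value.
bad_target = rewards + gamma * next_q

# Fix: give both operands the same shape before combining them.
good_target = rewards + gamma * next_q.reshape(batch, 1)

print(bad_target.shape)   # (4, 4)
print(good_target.shape)  # (4, 1)
```

A loss averaged over `bad_target` still produces a scalar, which is why a mismatch like this can go unnoticed until the agent fails to learn.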

zhihaocheng commented 3 years ago

Hello @keiohta, I tried but failed. I guess the newest TF2RL needs a specific version of TensorFlow.

OS: CentOS Linux release 7.2 (Final)
TensorFlow: tensorflow-cpu 2.3.0
tf2rl: 0.1.19

Full error:

Traceback (most recent call last):
  File "examples/run_sac.py", line 3, in <module>
    from tf2rl.algos.sac import SAC
  File "/home/zhc/anaconda3/envs/dads-env/lib/python3.6/site-packages/tf2rl/algos/sac.py", line 7, in <module>
    from tf2rl.policies.tfp_gaussian_actor import GaussianActor
  File "/home/zhc/anaconda3/envs/dads-env/lib/python3.6/site-packages/tf2rl/policies/tfp_gaussian_actor.py", line 3, in <module>
    import tensorflow_probability as tfp
  File "/home/zhc/anaconda3/envs/dads-env/lib/python3.6/site-packages/tensorflow_probability/__init__.py", line 75, in <module>
    from tensorflow_probability.python import *  # pylint: disable=wildcard-import
  File "/home/zhc/anaconda3/envs/dads-env/lib/python3.6/site-packages/tensorflow_probability/python/__init__.py", line 24, in <module>
    from tensorflow_probability.python import experimental
  File "/home/zhc/anaconda3/envs/dads-env/lib/python3.6/site-packages/tensorflow_probability/python/experimental/__init__.py", line 34, in <module>
    from tensorflow_probability.python.experimental import auto_batching
  File "/home/zhc/anaconda3/envs/dads-env/lib/python3.6/site-packages/tensorflow_probability/python/experimental/auto_batching/__init__.py", line 24, in <module>
    from tensorflow_probability.python.experimental.auto_batching import frontend
  File "/home/zhc/anaconda3/envs/dads-env/lib/python3.6/site-packages/tensorflow_probability/python/experimental/auto_batching/frontend.py", line 44, in <module>
    from tensorflow.python.autograph.core import naming
ImportError: cannot import name 'naming'

keiohta commented 3 years ago

Hi @zhihaocheng, this is caused by an incompatible TensorFlow Probability version. Please refer to the TFP releases and install a version that supports your TensorFlow version. I guess it will be 0.11.0.
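To catch this kind of mismatch before importing tf2rl, a small check along the following lines may help. This is a minimal sketch, not part of tf2rl or TFP: the helper names and the `KNOWN_GOOD` table are assumptions made up for illustration, and the table holds only the pairing discussed in this thread (TensorFlow 2.3 with TFP 0.11). Other pairings should be taken from the TFP release notes. Reading installed versions uses `importlib.metadata`, which needs Python 3.8 or newer; on older interpreters the sketch simply reports that nothing was found.

```python
from typing import Optional, Tuple

# Only the pairing from this thread is listed; extend it from the TFP release notes.
KNOWN_GOOD = {(2, 3): (0, 11)}


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.3.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def expected_tfp(tf_version: str) -> Optional[Tuple[int, int]]:
    """Look up the TFP (major, minor) recorded for this TensorFlow release, if any."""
    return KNOWN_GOOD.get(major_minor(tf_version))


def is_known_good(tf_version: str, tfp_version: str) -> bool:
    """True only when the pair matches an entry in KNOWN_GOOD."""
    return expected_tfp(tf_version) == major_minor(tfp_version)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it cannot be determined."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    tf_v = installed_version("tensorflow-cpu") or installed_version("tensorflow")
    tfp_v = installed_version("tensorflow-probability")
    if tf_v and tfp_v:
        status = "known good" if is_known_good(tf_v, tfp_v) else "unverified pairing"
        print(tf_v, tfp_v, status)
    else:
        print("TensorFlow or TensorFlow Probability not found")
```

For example, `is_known_good("2.3.0", "0.11.0")` returns `True`, while any pairing missing from the table is reported as unverified rather than as broken.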

zhihaocheng commented 3 years ago

Thank you @keiohta. After updating TensorFlow Probability to 0.11.0, SAC is able to run. It will take some time to get the training results.

zhihaocheng commented 3 years ago

Hi @keiohta, I have tested SAC here, and it does work on Hopper-v2.