thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.
https://tianshou.org
MIT License
7.75k stars, 1.12k forks

SAC policy produces nan action #44

Closed. mtaohuang closed this issue 4 years ago.

mtaohuang commented 4 years ago
Trinkle23897 commented 4 years ago

The scripts under test have been updated. I previously found that the logstd in a Gaussian policy should not be conditioned on the input; otherwise training becomes unstable. Could you try those scripts first? The scripts under examples will be maintained after the NeurIPS deadline :)
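For illustration, here is a minimal sketch of what "logstd not conditioned on the input" can look like in PyTorch. It is not Tianshou's actual actor: the class name `StateIndependentGaussianActor`, the layer sizes, and the clamp range of [-20, 2] are all assumptions chosen for the example. The clamp in particular is an added safeguard that the comment above does not mention.

```python
import torch
from torch import nn


class StateIndependentGaussianActor(nn.Module):
    """Hypothetical Gaussian policy head: the mean depends on the observation,
    while the log-std is a free learnable parameter shared by all observations."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64,
                 log_std_min: float = -20.0, log_std_max: float = 2.0):
        super().__init__()
        # Only the mean is produced by a network that sees the observation.
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim)
        )
        # One log-std per action dimension, independent of the input.
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        # Assumed bounds (not from the thread) that keep exp(log_std) finite.
        self.log_std_min = log_std_min
        self.log_std_max = log_std_max

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        mean = self.mean_net(obs)
        # Clamp before exponentiating so the std stays finite and positive.
        log_std = self.log_std.clamp(self.log_std_min, self.log_std_max)
        std = log_std.exp().expand_as(mean)
        return torch.distributions.Normal(mean, std)


actor = StateIndependentGaussianActor(obs_dim=3, act_dim=2)
obs = torch.randn(5, 3)
dist = actor(obs)
action = dist.rsample()  # reparameterized sample, differentiable w.r.t. parameters
```

Because `log_std` is an `nn.Parameter` rather than a network output, a single extreme observation cannot push the standard deviation to an extreme value, which is one plausible reason this variant trains more stably.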

NWPU-SSZ commented 4 years ago

I hit the same problem when training in my own environment, which was stable with other methods.

Trinkle23897 commented 4 years ago

> I hit the same problem when training in my own environment, which was stable with other methods.

Have you tried the current GitHub version?

NWPU-SSZ commented 4 years ago

Sorry, I have just updated to the latest version and found that the problem is solved. Thank you.