SAC policy produces nan action

mtaohuang commented 4 years ago

[ ] I have marked all applicable categories:
- [x] exception-raising bug
- [ ] RL algorithm bug
- [ ] documentation request (i.e. "X is missing from the documentation.")
- [ ] new feature request
[x] I have visited the source website, and in particular read the known issues
[x] I have searched through the issue categories for duplicates

[x] I have mentioned version numbers, operating system and environment, where applicable:

import tianshou, torch, sys
print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

version numbers: tianshou 0.2.2, torch 1.4.0
problem: the SAC policy generates nan actions when running python3 examples/halfcheetahBullet_v0_sac.py --task BipedalWalkerHardcore-v3. The nan assertion fails and the env raises an exception.
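For anyone trying to narrow down where the nan first appears, a check along these lines can be placed just before the action is passed to the env. This is only a sketch: `first_non_finite` is a hypothetical helper, not part of tianshou, and it assumes the action has been flattened to a plain sequence of floats.

```python
import math

def first_non_finite(action):
    """Return the index of the first nan/inf entry in a flat action, or None.

    Hypothetical debugging helper; `action` is assumed to be a flat
    sequence of floats (e.g. a 1-D array converted with .tolist()).
    """
    for i, value in enumerate(action):
        if not math.isfinite(value):
            return i
    return None

# Example: the second entry is nan, so index 1 is reported.
bad_index = first_non_finite([0.3, float("nan"), -0.1])
if bad_index is not None:
    print(f"non-finite action component at index {bad_index}")
```

Logging the observation and the policy outputs at the step where this fires usually shows whether the mean or the standard deviation went bad first.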

Trinkle23897 commented 4 years ago

The scripts under test have been updated. I previously found that the logstd in the Gaussian policy should not be conditioned on the input; otherwise training becomes unstable. Maybe you can give those a try first? The scripts under examples will be updated after the NeurIPS deadline :)
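To illustrate what "logstd not conditioned on the input" can look like, here is a minimal PyTorch sketch. It is not the code from the tianshou test scripts: the class name `GaussianActor`, the layer sizes, and the tanh squashing are all assumptions made for illustration. The mean is computed from the observation, while the log-std is a single learnable parameter per action dimension.

```python
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Hypothetical actor: the mean depends on the observation, the log-std does not."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        # One learnable log-std per action dimension, shared across all inputs.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        mu = self.mu(self.body(obs))
        # Broadcast the state-independent log-std to the batch shape.
        std = self.log_std.exp().expand_as(mu)
        return mu, std

# Illustrative sizes only (24-dim observation, 4-dim action).
actor = GaussianActor(obs_dim=24, act_dim=4)
mu, std = actor(torch.randn(8, 24))
# Reparameterized sample, squashed into [-1, 1].
action = torch.tanh(torch.distributions.Normal(mu, std).rsample())
assert torch.isfinite(action).all()
```

Because the log-std no longer passes through the network, a single extreme observation cannot push the standard deviation to a very large or very small value, which is one plausible route to nan actions.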

NWPU-SSZ commented 4 years ago

I ran into the same problem when training in my own env, which was stable with other methods.

Trinkle23897 commented 4 years ago

> I ran into the same problem when training in my own env, which was stable with other methods.

Have you tried the current GitHub version?

NWPU-SSZ commented 4 years ago

Sorry, I just updated to the latest version and found that the problem has been solved. Thank you.

thu-ml / tianshou

SAC policy produces nan action #44