As an added comment, there seem to be other issues surrounding entropy. For the problem I am working on, tuning the entropy coefficient has no effect, regardless of its value. When using PyTorch's A2C, the above crash does not occur, but optimization behaves identically whether entropy_coeff is set to 0.01 or 10.
I'm not sure where this happens in the code exactly, but my intuition for both of these issues is the following:
The crash could be explained by entropy_coeff multiplying the logits prior to the policy's softmax: when those logits contain tf.float32.min, a coefficient greater than 1 overflows them to -inf, producing NaNs and ultimately the error message above.
If entropy_coeff is applied at this stage and the resulting value is returned as the policy's entropy, optimization would also behave identically regardless of entropy_coeff, which could explain the second observation.
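To make the suspected mechanism concrete, here is a minimal NumPy sketch (illustrative only; this is not RLlib's actual entropy code, and the `entropy` helper is mine). Scaling logits that contain tf.float32.min by a coefficient greater than 1 overflows float32 to -inf, and the resulting 0 * -inf term in the entropy sum yields NaN; a coefficient below 1 keeps everything finite.

```python
import numpy as np

def entropy(logits):
    # Numerically stable log-softmax, followed by the usual -sum(p * log p).
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    p = np.exp(logp)
    return -(p * logp).sum()

mask_val = np.finfo(np.float32).min              # same value as tf.float32.min
logits = np.array([2.0, mask_val, 1.0], dtype=np.float32)

print(entropy(10.0 * logits))  # coeff > 1: float32.min * 10 overflows to -inf -> nan
print(entropy(0.5 * logits))   # coeff < 1: stays finite -> a valid entropy value
```

With tf.float16.min (about -65504) in place of tf.float32.min, the scaled logit stays finite in float32, which would be consistent with the workaround noted in the reproduction section below.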
Hi, I'm a bot from the Ray team :)
To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.
If there is no further activity in the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public Slack channel.
Hi, did anybody get a chance to look at this issue?
Going to close this for now, as it is stale and requires a repro script.
What is the problem?
When running A2C with a custom model for masked actions, RLlib crashes with the following error:
Ray version and other system information (Python version, TensorFlow version, OS):
Python: 3.6.10, Ray: 0.8.4, TensorFlow: 1.15.0, OS: macOS Mojave
Reproduction (REQUIRED)
After some time, I was able to locate the trainer config parameter causing the error. Setting entropy_coeff > 1 leads to this crash, while things run fine when the mask is weaker or removed (e.g. using tf.float16.min instead of tf.float32.min), when entropy_coeff < 1, or when using PPO. I expect this issue would also occur with A3C. A script which reproduces the issue can be found here.
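For context, here is a hypothetical sketch of the masking step such a custom model typically performs, following the standard parametric-actions recipe from the RLlib docs (this is not the actual repro script linked above):

```python
import tensorflow as tf

def mask_logits(logits, action_mask):
    # action_mask holds 1.0 for valid actions and 0.0 for invalid ones.
    # log(1) = 0 leaves valid logits unchanged; log(0) = -inf is clamped to
    # tf.float32.min -- the "stable" mask value implicated in the crash above.
    inf_mask = tf.maximum(tf.math.log(action_mask), tf.float32.min)
    return logits + inf_mask
```

Per the report, the crash is then triggered simply by setting `entropy_coeff` to any value above 1 in the A2C trainer config.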