Closed: Zengxia-Guo closed this issue 5 months ago.
The nan issue might be caused by the torch version. You can try an older torch version, e.g. torch==1.7.1.
Thank you. Based on your suggestion, I resolved the nan issue by using the Docker image pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel. But why is ALOSS negative here, and what does it mean? (I recently read your paper and am currently studying the experimental code. There is a lot I do not understand, and I would like to ask for your advice.)
```
| train | E: 26 | S: 6500 | D: 11.0 s | R: 9.3067 | BR: 0.0516 | ALOSS: -7.4499 | CLOSS: 0.0221 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 27 | S: 6750 | D: 10.9 s | R: 13.4442 | BR: 0.0527 | ALOSS: -7.7079 | CLOSS: 0.0254 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 28 | S: 7000 | D: 11.0 s | R: 18.2660 | BR: 0.0541 | ALOSS: -7.8978 | CLOSS: 0.0228 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 29 | S: 7250 | D: 11.0 s | R: 8.5523 | BR: 0.0517 | ALOSS: -8.1137 | CLOSS: 0.0231 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 30 | S: 7500 | D: 11.0 s | R: 12.6716 | BR: 0.0531 | ALOSS: -8.3132 | CLOSS: 0.0226 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| eval  | S: 7500 | ER: 1.4013
| eval  | S: 7500 | ER: 1.3304
| eval  | S: 7500 | ER: 1.3962
| eval  | S: 7500 | ER: 0.7532
| eval  | S: 7500 | ER: 1.3104
| train | E: 31 | S: 7750 | D: 16.3 s | R: 14.2239 | BR: 0.0534 | ALOSS: -8.5087 | CLOSS: 0.0227 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 32 | S: 8000 | D: 11.0 s | R: 16.4379 | BR: 0.0516 | ALOSS: -8.6723 | CLOSS: 0.0247 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 33 | S: 8250 | D: 11.0 s | R: 11.4438 | BR: 0.0533 | ALOSS: -8.8470 | CLOSS: 0.0224 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 34 | S: 8500 | D: 11.0 s | R: 18.4024 | BR: 0.0534 | ALOSS: -9.0096 | CLOSS: 0.0263 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 35 | S: 8750 | D: 11.0 s | R: 18.2989 | BR: 0.0532 | ALOSS: -9.1825 | CLOSS: 0.0249 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 36 | S: 9000 | D: 10.9 s | R: 9.4131 | BR: 0.0529 | ALOSS: -9.2857 | CLOSS: 0.0220 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 37 | S: 9250 | D: 11.0 s | R: 9.5185 | BR: 0.0535 | ALOSS: -9.4056 | CLOSS: 0.0220 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 38 | S: 9500 | D: 11.0 s | R: 19.4781 | BR: 0.0536 | ALOSS: -9.5602 | CLOSS: 0.0235 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 39 | S: 9750 | D: 11.0 s | R: 8.8815 | BR: 0.0533 | ALOSS: -9.6693 | CLOSS: 0.0245 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 40 | S: 10000 | D: 11.0 s | R: 18.4672 | BR: 0.0531 | ALOSS: -9.8003 | CLOSS: 0.0243 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| eval  | S: 10000 | ER: 4.4221
| eval  | S: 10000 | ER: 6.2904
| eval  | S: 10000 | ER: 1.2938
| eval  | S: 10000 | ER: 2.4657
| eval  | S: 10000 | ER: 4.0768
```
The ALOSS is the actor loss. Please refer to the SAC paper or the code for the definition of the actor loss; it can be negative by definition.
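Concretely, the SAC actor minimizes an objective of the form `E[alpha * log_pi(a|s) - Q(s, a)]`, which goes negative as soon as the critic's Q-estimate exceeds the entropy term. A minimal sketch with illustrative numbers (plain Python, not the repo's actual tensors or hyperparameters):

```python
# Minimal sketch of the SAC actor objective. All values below are
# illustrative placeholders, not the repo's real data.
alpha = 0.1                   # entropy temperature
log_pi = [-2.0, -1.5]         # log-probs of sampled actions
q_vals = [7.5, 8.0]           # critic estimates Q(s, a)

# Per-sample loss: alpha * log_pi - Q. When Q dominates, the loss is
# negative; a steadily more negative ALOSS just tracks growing Q-values.
per_sample = [alpha * lp - q for lp, q in zip(log_pi, q_vals)]
actor_loss = sum(per_sample) / len(per_sample)
print(actor_loss)  # ≈ -7.925
```

So the increasingly negative ALOSS in the log above is expected behavior, not a sign of divergence.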
Description

During training, I encountered an issue where the actions contain nan values, causing the program to throw an AssertionError. Below is the detailed error log:
```
| train | E: 4 | S: 1000 | D: 8.3 s | R: 8.6273 | BR: 0.0000 | ALOSS: 0.0000 | CLOSS: 0.0000 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
Action: [-0.08086713 -0.05156302  0.03296159 -0.20498772 -0.34121144  0.21736234]
Action Space: Box([-1. -1. -1. -1. -1. -1.], [1. 1. 1. 1. 1. 1.], (6,), float32)
Action: [nan nan nan nan nan nan]
Action Space: Box([-1. -1. -1. -1. -1. -1.], [1. 1. 1. 1. 1. 1.], (6,), float32)
Traceback (most recent call last):
  File "/workspace/RAP_distance/train.py", line 487, in <module>
    main()
  File "/workspace/RAP_distance/train.py", line 441, in main
    next_obs, reward, done, _ = env.step(action)
  File "/workspace/RAP_distance/utils.py", line 229, in step
    obs, reward, done, info = self.env.step(action)
  File "/opt/conda/lib/python3.10/site-packages/gym/wrappers/time_limit.py", line 18, in step
    observation, reward, done, info = self.env.step(action)
  File "/workspace/RAP_distance/dmc2gym/wrappers.py", line 172, in step
    assert self._norm_action_space.contains(action)
AssertionError
```
Problem Description

In the fourth episode of training, the generated action contains nan values, which causes the env.step(action) call to fail and raises an AssertionError. The specific action and action space information are as follows:
```
Action Space: Box([-1. -1. -1. -1. -1. -1.], [1. 1. 1. 1. 1. 1.], (6,), float32)
Action: [-0.08086713 -0.05156302  0.03296159 -0.20498772 -0.34121144  0.21736234]
Action Space: Box([-1. -1. -1. -1. -1. -1.], [1. 1. 1. 1. 1. 1.], (6,), float32)
Action: [nan nan nan nan nan nan]
```
Request for Help

I would appreciate any advice or solutions to ensure that actions do not contain nan values during training. Thank you for your assistance!
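As a debugging aid while tracking this down, a small guard before `env.step` can fail with a clearer message than the opaque `AssertionError` inside dmc2gym's wrapper. This is a sketch only; `check_action` and the bounds are illustrative, not part of the repo:

```python
import math

def check_action(action, low=-1.0, high=1.0):
    """Hypothetical debugging helper: raise a descriptive error when the
    policy emits non-finite or out-of-range action components, instead of
    failing later inside dmc2gym's `contains` assertion."""
    for i, a in enumerate(action):
        if not math.isfinite(a):
            raise ValueError(f"non-finite action component at index {i}: {action}")
        if not (low <= a <= high):
            raise ValueError(f"action component {a} at index {i} outside [{low}, {high}]")
    return action

# Usage sketch: call before env.step(action) in the training loop.
check_action([-0.08, -0.05, 0.03, -0.20, -0.34, 0.21])  # passes silently
```

This does not fix the nan values (the torch-version change above did), but it pinpoints the first training step at which they appear.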