Closed: Zengxia-Guo closed this issue 5 months ago.
The nan issue might be caused by the torch version. You can try an older torch version, e.g. torch==1.7.1.
Thank you. Based on your suggestion, I resolved the nan issue by using the Docker image pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel. But why is ALOSS negative here, and what does it mean? (I recently read your paper and am currently studying the experimental code. There is a lot I do not understand, and I would like to ask for your advice.)
```
| train | E: 26 | S: 6500 | D: 11.0 s | R: 9.3067 | BR: 0.0516 | ALOSS: -7.4499 | CLOSS: 0.0221 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 27 | S: 6750 | D: 10.9 s | R: 13.4442 | BR: 0.0527 | ALOSS: -7.7079 | CLOSS: 0.0254 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 28 | S: 7000 | D: 11.0 s | R: 18.2660 | BR: 0.0541 | ALOSS: -7.8978 | CLOSS: 0.0228 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 29 | S: 7250 | D: 11.0 s | R: 8.5523 | BR: 0.0517 | ALOSS: -8.1137 | CLOSS: 0.0231 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 30 | S: 7500 | D: 11.0 s | R: 12.6716 | BR: 0.0531 | ALOSS: -8.3132 | CLOSS: 0.0226 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| eval  | S: 7500 | ER: 1.4013
| eval  | S: 7500 | ER: 1.3304
| eval  | S: 7500 | ER: 1.3962
| eval  | S: 7500 | ER: 0.7532
| eval  | S: 7500 | ER: 1.3104
| train | E: 31 | S: 7750 | D: 16.3 s | R: 14.2239 | BR: 0.0534 | ALOSS: -8.5087 | CLOSS: 0.0227 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 32 | S: 8000 | D: 11.0 s | R: 16.4379 | BR: 0.0516 | ALOSS: -8.6723 | CLOSS: 0.0247 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 33 | S: 8250 | D: 11.0 s | R: 11.4438 | BR: 0.0533 | ALOSS: -8.8470 | CLOSS: 0.0224 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 34 | S: 8500 | D: 11.0 s | R: 18.4024 | BR: 0.0534 | ALOSS: -9.0096 | CLOSS: 0.0263 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 35 | S: 8750 | D: 11.0 s | R: 18.2989 | BR: 0.0532 | ALOSS: -9.1825 | CLOSS: 0.0249 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 36 | S: 9000 | D: 10.9 s | R: 9.4131 | BR: 0.0529 | ALOSS: -9.2857 | CLOSS: 0.0220 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 37 | S: 9250 | D: 11.0 s | R: 9.5185 | BR: 0.0535 | ALOSS: -9.4056 | CLOSS: 0.0220 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 38 | S: 9500 | D: 11.0 s | R: 19.4781 | BR: 0.0536 | ALOSS: -9.5602 | CLOSS: 0.0235 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 39 | S: 9750 | D: 11.0 s | R: 8.8815 | BR: 0.0533 | ALOSS: -9.6693 | CLOSS: 0.0245 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| train | E: 40 | S: 10000 | D: 11.0 s | R: 18.4672 | BR: 0.0531 | ALOSS: -9.8003 | CLOSS: 0.0243 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
| eval  | S: 10000 | ER: 4.4221
| eval  | S: 10000 | ER: 6.2904
| eval  | S: 10000 | ER: 1.2938
| eval  | S: 10000 | ER: 2.4657
| eval  | S: 10000 | ER: 4.0768
```
The ALOSS is the actor loss. Please refer to the SAC paper or the code for the definition of the actor loss; it can be negative by definition.
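Concretely, the SAC actor minimizes an objective of the form `E[alpha * log_pi(a|s) - Q(s, a)]`, which goes negative as soon as the critic's Q-estimate exceeds the entropy term. A minimal sketch with illustrative numbers (plain Python, not the repo's actual tensors or hyperparameters):

```python
# Minimal sketch of the SAC actor objective. All values below are
# illustrative placeholders, not the repo's real data.
alpha = 0.1                   # entropy temperature
log_pi = [-2.0, -1.5]         # log-probs of sampled actions
q_vals = [7.5, 8.0]           # critic estimates Q(s, a)

# Per-sample loss: alpha * log_pi - Q. When Q dominates, the loss is
# negative; a steadily more negative ALOSS just tracks growing Q-values.
per_sample = [alpha * lp - q for lp, q in zip(log_pi, q_vals)]
actor_loss = sum(per_sample) / len(per_sample)
print(actor_loss)  # ≈ -7.925
```

So the increasingly negative ALOSS in the log above is expected behavior, not a sign of divergence.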
Description

During training, I encountered an issue where the actions contain nan values, causing the program to throw an AssertionError. Below is the detailed error log:
```
| train | E: 4 | S: 1000 | D: 8.3 s | R: 8.6273 | BR: 0.0000 | ALOSS: 0.0000 | CLOSS: 0.0000 | RLOSS: 0.0000 | RHO: 0.0000 | MR: 0.0000 | EI: 0
Action: [-0.08086713 -0.05156302  0.03296159 -0.20498772 -0.34121144  0.21736234]
Action Space: Box([-1. -1. -1. -1. -1. -1.], [1. 1. 1. 1. 1. 1.], (6,), float32)
Action: [nan nan nan nan nan nan]
Action Space: Box([-1. -1. -1. -1. -1. -1.], [1. 1. 1. 1. 1. 1.], (6,), float32)
Traceback (most recent call last):
  File "/workspace/RAP_distance/train.py", line 487, in <module>
    main()
  File "/workspace/RAP_distance/train.py", line 441, in main
    next_obs, reward, done, _ = env.step(action)
  File "/workspace/RAP_distance/utils.py", line 229, in step
    obs, reward, done, info = self.env.step(action)
  File "/opt/conda/lib/python3.10/site-packages/gym/wrappers/time_limit.py", line 18, in step
    observation, reward, done, info = self.env.step(action)
  File "/workspace/RAP_distance/dmc2gym/wrappers.py", line 172, in step
    assert self._norm_action_space.contains(action)
AssertionError
```
Problem Description

In the fourth episode of training, the generated action contains nan values, which causes the env.step(action) call to fail and raises an AssertionError. The specific action and action space information are as follows:
```
Action Space: Box([-1. -1. -1. -1. -1. -1.], [1. 1. 1. 1. 1. 1.], (6,), float32)
Action: [-0.08086713 -0.05156302  0.03296159 -0.20498772 -0.34121144  0.21736234]
Action Space: Box([-1. -1. -1. -1. -1. -1.], [1. 1. 1. 1. 1. 1.], (6,), float32)
Action: [nan nan nan nan nan nan]
```
Request for Help

I would appreciate any advice or solutions to ensure that actions do not contain nan values during training. Thank you for your assistance!
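As a debugging aid while tracking this down, a small guard before `env.step` can fail with a clearer message than the opaque `AssertionError` inside dmc2gym's wrapper. This is a sketch only; `check_action` and the bounds are illustrative, not part of the repo:

```python
import math

def check_action(action, low=-1.0, high=1.0):
    """Hypothetical debugging helper: raise a descriptive error when the
    policy emits non-finite or out-of-range action components, instead of
    failing later inside dmc2gym's `contains` assertion."""
    for i, a in enumerate(action):
        if not math.isfinite(a):
            raise ValueError(f"non-finite action component at index {i}: {action}")
        if not (low <= a <= high):
            raise ValueError(f"action component {a} at index {i} outside [{low}, {high}]")
    return action

# Usage sketch: call before env.step(action) in the training loop.
check_action([-0.08, -0.05, 0.03, -0.20, -0.34, 0.21])  # passes silently
```

This does not fix the nan values (the torch-version change above did), but it pinpoints the first training step at which they appear.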