hhaAndroid opened this issue 1 year ago.
Hi, can you run the same command with CUDA_LAUNCH_BLOCKING=1 set? That makes CUDA kernel launches synchronous, so the error message and stack trace point at the operation that actually failed. Possible causes include wrong input/target values, invalid values passed to the loss function, etc. Please post the error message after rerunning with CUDA_LAUNCH_BLOCKING=1.
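A minimal sketch of both suggestions, assuming you can edit the training entry script: set the variable before anything initializes CUDA, and sanity-check annotations for the kinds of bad targets mentioned above. `find_bad_targets` is a hypothetical helper for illustration, not an MMYOLO/MMDetection API, and it assumes labels are plain ints and boxes are `(x1, y1, x2, y2)` tuples.

```python
import math
import os

# Must be set before the first CUDA call (i.e. before torch, or mmyolo/mmdet
# which import torch, create a CUDA context). Equivalent on the shell:
#   CUDA_LAUNCH_BLOCKING=1 python <your usual training command>
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def find_bad_targets(labels, bboxes, num_classes):
    """Hypothetical helper: list problems in one image's annotations.

    Out-of-range class ids are a common cause of device-side asserts;
    non-finite or degenerate boxes can produce NaN in some box losses.
    """
    problems = []
    for i, label in enumerate(labels):
        if not 0 <= label < num_classes:
            problems.append(f"label[{i}]={label} outside [0, {num_classes})")
    for i, box in enumerate(bboxes):
        if not all(math.isfinite(v) for v in box):
            problems.append(f"bbox[{i}]={box} has non-finite values")
        elif box[2] <= box[0] or box[3] <= box[1]:
            problems.append(f"bbox[{i}]={box} has non-positive width/height")
    return problems
```

For example, `find_bad_targets([0, 80], [(0, 0, 10, 10), (5, 5, 5, 9)], num_classes=80)` flags label 80 (valid ids are 0 to 79) and the zero-width second box.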
Hi @hhaAndroid, did you find a solution?
I'm facing the same problem. I'm relaunching now with CUDA_LAUNCH_BLOCKING=1 and will report the error message here.
Did you solve this problem? I ran into this error too.
Hi @MingChaoXu, in my case it turned out to be a numerical instability issue. I solved it by changing the loss weight. Possible fixes are changing the loss weight, reducing the learning rate, or strengthening gradient clipping (a lower max norm).
Hope it helps!
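To make "lower max norm = stronger clipping" concrete, here is a pure-Python sketch of global-norm clipping, the rule applied by PyTorch's `torch.nn.utils.clip_grad_norm_`. It assumes the gradients are flattened into a single list of floats; real training code operates on parameter tensors instead.

```python
import math


def clip_grad_global_norm(grads, max_norm, eps=1e-6):
    """Scale gradients so their global L2 norm is at most max_norm.

    Sketch only: `grads` is a flat list of floats standing in for all
    parameter gradients. Returns (possibly scaled grads, original norm).
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        # Only shrink, never enlarge: gradients under the limit pass unchanged.
        grads = [g * clip_coef for g in grads]
    return grads, total_norm
```

With gradients `[3.0, 4.0]` (norm 5), `max_norm=1.0` scales them to roughly `[0.6, 0.8]`, while `max_norm=10.0` leaves them untouched. A smaller `max_norm` therefore caps the size of each update more aggressively, which can stop a loss spike from turning into NaN.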
I ran into a similar error. It was resolved by reducing the base learning rate (base_lr).
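A sketch of where these knobs live, assuming an MMEngine-style config (configs are plain Python files, and MMEngine's `OptimWrapper` accepts a `clip_grad` dict). The numbers are illustrative only, and a real MMYOLO config usually carries more fields (for example a custom optimizer constructor), so adapt your existing config rather than copying this one.

```python
# Illustrative MMEngine-style config excerpt; values are examples, not defaults.
base_lr = 0.0025  # lowered learning rate (illustrative value)

optim_wrapper = dict(
    type='OptimWrapper',
    # A lower max_norm means stronger gradient clipping.
    clip_grad=dict(max_norm=1.0, norm_type=2),
    optimizer=dict(
        type='SGD',
        lr=base_lr,
        momentum=0.937,
        weight_decay=0.0005,
        nesterov=True,
    ),
)
```

Loss weights are set per loss inside the model's head config (typically a `loss_weight` field on each loss dict); lowering the weight of the loss that diverges is the other change suggested above.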
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
3.x branch https://github.com/open-mmlab/mmdetection/tree/3.x
Environment
mmyolo 0.4.0+dev
Reproduces the problem - code sample
...
Reproduces the problem - command or script
..
Reproduces the problem - error message
Additional information
No response