open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

[Bug] After some epochs of training PointPillars, loss and grad_norm become NaN and mAP drops to 0 #2982

Open saber-pro opened 1 month ago


Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: linux
Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: Tesla T4
CUDA_HOME: /usr/local/cuda-12.1
NVCC: Cuda compilation tools, release 12.1, V12.1.66
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.3) 9.4.0
PyTorch: 2.1.2+cu121
PyTorch compiling details: PyTorch built with:

TorchVision: 0.16.2+cu121
OpenCV: 4.9.0
MMEngine: 0.10.4
MMDetection: 3.3.0
MMDetection3D: 1.4.0+
spconv2.0: False

Reproduces the problem - code sample

python tools/train.py configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py

Reproduces the problem - command or script

python tools/train.py configs/pointpillars/pointpillars_hv_secfpn_8xb6-160e_kitti-3d-3class.py

Reproduces the problem - error message

Training log attached to the issue: 20240522_004525.log
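Since only the log file name survives here, one way to locate where training diverged is to scan the log for the first training iteration whose loss or grad_norm is not finite. The sketch below assumes MMEngine-style text log lines shaped roughly like "Epoch(train) [epoch][iter/total] ... grad_norm: <value> loss: <value>"; that line format, the regular expressions, and the helper name first_non_finite are assumptions for illustration, not an MMDetection3D or MMEngine tool.

```python
import math
import re
import sys
from typing import Iterable, Optional, Tuple

# Assumed (hypothetical) line shape, not copied from the attached log:
# "... Epoch(train)  [12][100/928]  lr: ...  grad_norm: nan  loss: nan  loss_cls: nan"
EPOCH_ITER = re.compile(r"Epoch\(train\)\s*\[(\d+)\]\[\s*(\d+)/\d+\]")
# Matches only the total "loss:" and "grad_norm:" fields, not "loss_cls:" etc.
METRIC = re.compile(r"\b(loss|grad_norm):\s*(\S+)")


def first_non_finite(lines: Iterable[str]) -> Optional[Tuple[int, int, str, str]]:
    """Return (epoch, iter, metric, raw_value) for the first non-finite loss/grad_norm."""
    for line in lines:
        pos = EPOCH_ITER.search(line)
        if pos is None:
            continue  # skip validation lines, config dumps, warnings, ...
        for name, raw in METRIC.findall(line):
            try:
                value = float(raw)  # float() accepts "nan" and "inf"
            except ValueError:
                continue
            if not math.isfinite(value):
                return int(pos.group(1)), int(pos.group(2)), name, raw
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as log:
        print(first_non_finite(log))
```

Run as a script with the log path as its only argument; a result like (7, 100, 'grad_norm', 'nan') would name the epoch and iteration at which the logged values first stopped being finite, and None means no such line matched the assumed format.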

Additional information

No response
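The combination reported in the title (grad_norm = nan, then mAP = 0) is consistent with how global gradient-norm clipping behaves: one NaN entry in any gradient makes the total norm NaN, and scaling by a NaN clip coefficient turns every gradient, and after the optimizer step every weight, into NaN. The pure-Python sketch below mirrors the arithmetic of torch.nn.utils.clip_grad_norm_ (a single p-norm over all gradients, coefficient max_norm / (total + 1e-6) capped at 1, no error on non-finite values by default). It assumes the config enables gradient clipping and that the logged grad_norm is this total norm; the function names are hypothetical and the max_norm values used are arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def total_grad_norm(grads: Sequence[Sequence[float]], norm_type: float = 2.0) -> float:
    """Global p-norm over every gradient entry of every parameter."""
    total = 0.0
    for g in grads:
        # abs(nan) ** p is nan, so a single bad entry poisons the whole sum
        total += sum(abs(x) ** norm_type for x in g)
    return total ** (1.0 / norm_type)


def clip_by_total_norm(
    grads: Sequence[Sequence[float]], max_norm: float, norm_type: float = 2.0
) -> Tuple[List[List[float]], float]:
    """Scale all gradients so their global norm is at most max_norm; return (clipped, total)."""
    total = total_grad_norm(grads, norm_type)
    coef = max_norm / (total + 1e-6)
    # Cap at 1.0 like torch.clamp(max=1.0); nan > 1.0 is False, so a NaN coef passes through
    if coef > 1.0:
        coef = 1.0
    return [[x * coef for x in g] for g in grads], total
```

For example, clip_by_total_norm([[3.0, float("nan")], [1.0]], max_norm=35.0) returns a NaN total norm and NaN in every clipped entry, including the gradient of the parameter that was entirely finite.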