Open hellohaozheng opened 1 year ago
Hi, this looks like a NaN loss error. Can you check your log to see whether the loss becomes NaN during training? Also, if you are using 4 GPUs, I suggest turning down the learning rate.
Ok, I'll try turning down the learning rate. But I don't understand what you mean by NaN loss. Can you explain it in detail? Thanks! @VVsssssk
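For context on the question above: NaN means "Not a Number", the floating-point value produced by undefined operations such as 0/0 or inf - inf. Once the loss becomes NaN, every later gradient and weight update is NaN as well, so the model stops learning. Below is a minimal sketch of how one could scan a training log for such values. It assumes the log stores one JSON object per line with a "loss" field. That layout, the function name, and the sample records are assumptions made for illustration, not taken from this issue.

```python
import json
import math


def find_non_finite_losses(log_lines, key="loss"):
    """Return (line_number, value) pairs where the logged loss is NaN or infinite."""
    bad = []
    for line_number, line in enumerate(log_lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            # json.loads accepts the literals NaN and Infinity by default.
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON records
        if not isinstance(record, dict):
            continue
        value = record.get(key)
        if isinstance(value, float) and not math.isfinite(value):
            bad.append((line_number, value))
    return bad


# Hypothetical log excerpt, made up for illustration only.
sample_log = [
    '{"mode": "train", "iter": 50, "loss": 2.31}',
    '{"mode": "train", "iter": 100, "loss": NaN}',
    '{"mode": "train", "iter": 150, "loss": Infinity}',
]

bad_entries = find_non_finite_losses(sample_log)
for line_number, value in bad_entries:
    print(f"line {line_number}: loss = {value}")
```

If the first non-finite entry appears early in training, lowering the learning rate as suggested above is a reasonable first thing to try.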
Hello, I found another issue reporting the same problem. There may be some problems with the original mvx-net model. @VVsssssk @lindahua @atinfinity @mickeyouyou
Yeah, when I trained MVXNet I ran into some problems too: sometimes the loss is NaN or training raises an OOM error, so I think the model may be unstable. I then turned down the learning rate and got a relatively normal result.
@hellohaozheng Hi, we have fixed this bug in the PR https://github.com/open-mmlab/mmdetection3d/pull/2282
Hello, have you solved this? I'm running into the same problem: the visualized detection boxes are also completely off from the objects.
I remember fixing this problem by adjusting the coordinate system orientation. You can try that.
I think so. Can you share the code you changed for it?
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmdetection3d
Environment
sys.platform: linux
Python: 3.8.13 (default, Oct 21 2022, 23:50:54) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 11.5, V11.5.119
GCC: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:
TorchVision: 0.11.2
OpenCV: 4.6.0
MMCV: 1.6.2
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.25.3
MMSegmentation: 0.29.0
MMDetection3D: 1.0.0rc4+9556958
Reproduces the problem - code sample
Here is the mvx_config I used.
I did not change tools/train.py.
Reproduces the problem - command or script
When I trained MVXNet on 1 GPU, I used the following command.
When I trained MVXNet on multiple GPUs, I used the following command.
Reproduces the problem - error message
When I trained it on a single GPU, I got this error report.
When I trained it on multiple GPUs, I got a strange result in validation.
Additional information
I used KITTI for training. It seems to be a problem with mmcv. @lindahua @happynear @aditya9710 I need your help. Thanks