open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0
29.42k stars 9.43k forks

[Bug] fatal python error :Segmentation fault #10025

Open initiater opened 1 year ago

initiater commented 1 year ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmdetection

Environment

/home/scholar/anaconda3/envs/zhang/lib/python3.9/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
sys.platform: linux
Python: 3.9.16 (main, Mar 8 2023, 14:00:05) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3090 Ti
CUDA_HOME: /home/scholar/anaconda3/envs/zhang
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.13.1
PyTorch compiling details: PyTorch built with:

TorchVision: 0.14.1
OpenCV: 4.7.0
MMCV: 1.7.1
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.6
MMDetection: 2.28.2+e9cae2d

Reproduces the problem - code sample

Fatal Python error: Segmentation fault

Thread 0x00007efc318a9700 (most recent call first):
  File "/home/scholar/anaconda3/envs/zhang/lib/python3.9/threading.py", line 312 in wait
  File "/home/scholar/anaconda3/envs/zhang/lib/python3.9/multiprocessing/queues.py", line 231 in _feed
  File "/home/scholar/anaconda3/envs/zhang/lib/python3.9/threading.py", line 917 in run
  File "/home/scholar/anaconda3/envs/zhang/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/home/scholar/anaconda3/envs/zhang/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x00007efc340aa700 (most recent call first):

Current thread 0x00007efd2c2e2180 (most recent call first):
  File "/home/scholar/anaconda3/envs/zhang/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 40 in run_iter
  File "/home/scholar/anaconda3/envs/zhang/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 53 in train
  File "/home/scholar/anaconda3/envs/zhang/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 136 in run
  File "/home/scholar/Documents/mmdetection/mmdet/apis/train.py", line 246 in train_detector
  File "/home/scholar/Documents/mmdetection/tools/train.py", line 238 in main
  File "/home/scholar/Documents/mmdetection/tools/train.py", line 249 in <module>
Segmentation fault (core dumped)

Reproduces the problem - command or script

I tried to debug using gdb, and this is the error it reported. How should I handle it?

Reproduces the problem - error message

I suspect it's a multithreading problem, but how can I solve it?

Additional information

No response

initiater commented 1 year ago

Please help me find where the problem is.

hhaAndroid commented 1 year ago

@initiater Can you try the mmdet 3.x branch?

initiater commented 1 year ago

> @initiater Can you try the mmdet 3.x branch?

Ok, I'll give it a try and give feedback on the results.

initiater commented 1 year ago

> @initiater Can you try the mmdet 3.x branch?

Based on your suggestion, I updated mmcv and mmdet (mmcv=2.0.0.0rc4, mmdet=3.0.0rc6).

But I still get "Fatal Python error: Segmentation fault", the same as the previous error. What should I do?

nh2522 commented 9 months ago

> > @initiater Can you try the mmdet 3.x branch?
>
> Based on your suggestion, I updated mmcv and mmdet (mmcv=2.0.0.0rc4, mmdet=3.0.0rc6).
>
> But I still get "Fatal Python error: Segmentation fault", the same as the previous error. What should I do?

I have this issue as well! Were you successful in fixing it?
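
The thread has no confirmed fix. One way to test the multithreading suspicion raised above is to rerun training with the data-loading worker processes disabled and see whether the crash persists. The sketch below is not from the thread: the helper names `enable_crash_dumps` and `disable_loader_workers` are hypothetical, and the config is handled as a plain dict rather than a real MMDetection config object. It assumes the 2.x-style key `data.workers_per_gpu` and the 3.x-style `*_dataloader.num_workers` keys. The dump format in the report matches what Python's standard `faulthandler` module prints, so the sketch also shows how to enable it explicitly.

```python
import copy
import faulthandler
import sys


def enable_crash_dumps() -> None:
    """Ask Python to print the stacks of all threads on a fatal signal.

    This produces dumps of the "Fatal Python error: Segmentation fault"
    form shown in the report. It only shows Python frames, not the native
    frame that actually crashed.
    """
    faulthandler.enable(file=sys.stderr, all_threads=True)


def disable_loader_workers(cfg: dict) -> dict:
    """Return a copy of a config dict with data-loading workers disabled.

    Hypothetical helper: it works on a plain dict, not a real config object.
    """
    cfg = copy.deepcopy(cfg)

    # Assumed MMDetection 2.x layout: data.workers_per_gpu
    if isinstance(cfg.get("data"), dict):
        cfg["data"]["workers_per_gpu"] = 0

    # Assumed MMDetection 3.x layout: one dict per dataloader
    for key in ("train_dataloader", "val_dataloader", "test_dataloader"):
        loader = cfg.get(key)
        if isinstance(loader, dict):
            loader["num_workers"] = 0
            # PyTorch rejects persistent_workers=True when num_workers is 0
            loader["persistent_workers"] = False

    return cfg


if __name__ == "__main__":
    enable_crash_dumps()
    old_style = {"data": {"samples_per_gpu": 2, "workers_per_gpu": 2}}
    print(disable_loader_workers(old_style))
```

If the segmentation fault disappears with workers disabled, the worker processes are a likely suspect. If it persists, the crash is more likely in a native extension in the main process, such as a compiled CUDA op. The same override can probably be passed on the command line through the `--cfg-options` flag of `tools/train.py`, for example `--cfg-options data.workers_per_gpu=0` on the 2.x branch.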