rsummers11 / CADLab

Imaging Biomarkers and Computer-Aided Diagnosis Laboratory
https://www.cc.nih.gov/meet-our-doctors/rsummers.html

Segmentation fault(core dumped) #25

Closed Try2ChangeX closed 4 years ago

Try2ChangeX commented 4 years ago

When MODE is "vis" or "eval", I get this error. Can anyone help? T.T

viggin commented 4 years ago

Can you debug step by step to see which line caused the problem?
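If stepping through in a debugger is slow, one option is Python's standard `faulthandler` module, which prints the Python traceback of every thread when the process receives SIGSEGV. A minimal sketch; the helper name `enable_crash_traceback` is made up for illustration and is not part of MULAN:

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Dump Python tracebacks to `stream` (default: stderr) on fatal signals.

    Hypothetical helper for illustration; call it once at the top of the
    entry script, before the model is built.
    """
    target = sys.stderr if stream is None else stream
    # The file must stay open for as long as the handler is enabled.
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_traceback()
    # ... import and run the crashing code after this point ...
```

The same handler can be switched on without editing any code by starting the interpreter with `python -X faulthandler`. It only shows the Python line that was executing, so a crash inside a compiled extension will still point at the Python call that entered it.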

Try2ChangeX commented 4 years ago

Thanks for your reply. The same problem occurs when MODE is 'demo' or 'train'. When I run it in PyCharm, the message is "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)". In the terminal, the message is "Segmentation fault (core dumped)". Neither tells me which line is at fault.
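For reference, exit code 139 follows the usual shell convention of 128 plus the signal number, and signal 11 is SIGSEGV. A small sketch of that arithmetic; the function name `describe_exit_code` is hypothetical:

```python
import signal


def describe_exit_code(code):
    """Return the signal name for a shell-style exit code above 128, else None."""
    if code <= 128:
        return None  # ordinary exit status, not a signal
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


if __name__ == "__main__":
    print(describe_exit_code(139))
```

So the PyCharm message and the terminal message report the same event: the process was killed by an invalid memory access.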

I tried debugging step by step. When MODE is 'demo', the program stops at line 67, result = model(im_list), in MULAN_universal_lesion_analysis/maskrcnn/engine/demo_process.py.

When MODE is 'train', the program stops at line 119, loss_dict = model(images, targets, infos), in MULAN_universal_lesion_analysis/maskrcnn/engine/trainer.py.

I wonder whether my GPU, a GTX 1080 Ti, has too little memory to hold the model and data, since as far as I know SIGSEGV is a memory-related error. The data path is 'data/DeepLesion'. I'm a novice, so please forgive me if I have not described the problem clearly. Thank you for your patience and reply.

Try2ChangeX commented 4 years ago

I tried to debug step by step again.

When MODE is 'train', the program stops at line 120, boxes = self.box_selector_train(anchors, objectness, rpn_box_regression, targets), in MULAN_universal_lesion_analysis/maskrcnn/modeling/rpn/rpn.py.

When MODE is 'demo', the program stops at line 133 in MULAN_universal_lesion_analysis/maskrcnn/modeling/rpn/rpn.py.

The program never reaches line 189 (make_rpn_postprocessor) in MULAN_universal_lesion_analysis/maskrcnn/modeling/rpn/inference.py.

I tried changing ANCHOR_SIZES from [16, 24, 32, 48, 96] to [16, 32, 96] to reduce memory usage, but it didn't help.

The model can be built. The problem only arises when data is fed into the model.

viggin commented 4 years ago

I can't tell the cause from your description. The code where the crash occurs doesn't look unusual. You could try some of the solutions suggested online, such as https://stackoverflow.com/questions/49414841/process-finished-with-exit-code-139-interrupted-by-signal-11-sigsegv

Try2ChangeX commented 4 years ago

Thank you for your reply. I tried MULAN on another computer with an NVIDIA Tesla P40 in MODE 'demo', and the program works well there.

Try2ChangeX commented 4 years ago

The cause of this problem has been found: it was the gcc version. For more detail, see:
https://github.com/facebookresearch/maskrcnn-benchmark/issues/268
https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/TROUBLESHOOTING.md#segmentation-fault-core-dumped-when-running-the-library
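To check a machine before building, something like the sketch below may help. It reads the installed gcc version with `gcc -dumpversion` and compares it against a minimum. The 4.9 threshold is an assumption based on my reading of the linked troubleshooting page, and all function names here are hypothetical:

```python
import re
import shutil
import subprocess

# Assumption: minimum version taken from the linked troubleshooting page.
MIN_GCC = (4, 9)


def parse_version(text):
    """Extract (major, minor) from a string such as '4.8.5' or '9'."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def installed_gcc_version():
    """Return (major, minor) of the gcc on PATH, or None if gcc is missing."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(
        ["gcc", "-dumpversion"], capture_output=True, text=True, check=False
    )
    return parse_version(result.stdout)


def is_new_enough(version, minimum=MIN_GCC):
    """True if a version tuple was found and is at least `minimum`."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    version = installed_gcc_version()
    print("gcc version:", version, "new enough:", is_new_enough(version))
```

What matters is the compiler that was used when the extension was built, so after switching gcc the library most likely needs to be rebuilt from a clean state.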