princeton-vl / CornerNet

BSD 3-Clause "New" or "Revised" License

Segmentation fault when corner pool is called #47

Closed may0324 closed 5 years ago

may0324 commented 5 years ago

I get a segmentation fault when corner pooling is called. I have updated gcc to version 4.9.4 and I am using Python 3.6.5. After rebuilding the cpools I still hit the problem. Can anyone help? (Attached screenshot: 2018-11-13 12 20 25)

heilaw commented 5 years ago

When you rebuild the corner pooling layers, you should see something like

gcc -pthread -B /foo/anaconda3/envs/debug/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/foo/anaconda3/envs/debug/lib/python3.6/site-packages/torch/lib/include -I/foo/anaconda3/envs/debug/lib/python3.6/site-packages/torch/lib/include/TH -I/foo/anaconda3/envs/debug/lib/python3.6/site-packages/torch/lib/include/THC -I/foo/anaconda3/envs/debug/include/python3.6m -c src/top_pool.cpp -o build/temp.linux-x86_64-3.6/src/top_pool.o -DTORCH_EXTENSION_NAME=top_pool -std=c++11

If you don't see it, Python did not recompile the layers. To force a recompile, change the last-modified dates of the .cpp files under src and build them again.
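
One way to change the last-modified dates is to reset them from Python. The sketch below is a minimal example, not part of the repository: `touch_sources` is a hypothetical helper name, and it assumes the corner pooling sources are the `.cpp` files directly under `src`, as in the gcc command above.

```python
import os
import time
from pathlib import Path


def touch_sources(src_dir, pattern="*.cpp"):
    """Set the modification time of every matching file to now.

    Returns the list of paths that were touched, sorted by name.
    """
    now = time.time()
    touched = []
    for path in sorted(Path(src_dir).glob(pattern)):
        # os.utime takes (access time, modification time) in seconds.
        os.utime(path, (now, now))
        touched.append(path)
    return touched
```

Calling `touch_sources("src")` from the directory that contains `src` and then rerunning the build should make the gcc lines appear again. On Linux, `touch src/*.cpp` has the same effect.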

may0324 commented 5 years ago

I modified the files under src, recompiled, and it finally worked! It seems Python had not recompiled the layers before. Thanks for your reply.

knsong commented 5 years ago

That did work for me! Thanks!

Demohai commented 5 years ago

I recompiled the pool src and still get the segmentation fault.

Demohai commented 5 years ago

@heilaw @may0324 @knsong My torch version is 0.4.0 and my Python version is 3.6.3. I used the method above to make sure the pool src is recompiled, but I still get the segmentation fault when I run the code. Does this method really work for all of you?

knsong commented 5 years ago

@Demohai Just follow the README and recompile the pool src with gcc 4.6.4; that works for me.

Demohai commented 5 years ago

@knsong Everything is the same except the gcc version. I'll try changing it, thank you!

mrlaiii commented 5 years ago

Could you please tell me how to change the last-modified dates of the .cpp files under src?

mrlaiii commented 5 years ago

(CornerNet) ncx@hp-006-1-workstation:~/桌面/CornerNet$ python train.py CornerNet
loading all datasets...
using 4 threads
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=16.17s)
creating index...
index created!
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=16.52s)
creating index...
index created!
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=17.71s)
creating index...
index created!
loading from cache file: ./cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=15.92s)
creating index...
index created!
loading from cache file: ./cache/coco_minival2014.pkl
loading annotations into memory...
Done (t=0.49s)
creating index...
index created!
system config...
{'batch_size': 4, 'cache_dir': './cache', 'chunk_sizes': [4], 'config_dir': './config', 'data_dir': './data', 'data_rng': <mtrand.RandomState object at 0x7f25f53fe5e8>, 'dataset': 'MSCOCO', 'decay_rate': 10, 'display': 5, 'learning_rate': 0.00025, 'max_iter': 500000, 'nnet_rng': <mtrand.RandomState object at 0x7f25f53fe630>, 'opt_algo': 'adam', 'prefetch_size': 5, 'pretrain': None, 'result_dir': './results', 'sampling_function': 'kp_detection', 'snapshot': 5000, 'snapshot_name': 'CornerNet', 'stepsize': 450000, 'test_split': 'testdev', 'train_split': 'trainval', 'val_iter': 100, 'val_split': 'minival', 'weight_decay': False, 'weight_decay_rate': 1e-05, 'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5, 'border': 128, 'categories': 80, 'data_aug': True, 'gaussian_bump': True, 'gaussian_iou': 0.3, 'gaussian_radius': -1, 'input_size': [511, 511], 'lighting': True, 'max_per_image': 100, 'merge_bbox': False, 'nms_algorithm': 'exp_soft_nms', 'nms_kernel': 3, 'nms_threshold': 0.5, 'output_sizes': [[128, 128]], 'rand_color': True, 'rand_crop': True, 'rand_pushes': False, 'rand_samples': False, 'rand_scale_max': 1.4, 'rand_scale_min': 0.6, 'rand_scale_step': 0.1, 'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]), 'special_crop': False, 'test_scales': [1], 'top_k': 100, 'weight_exp': 8}
len of db: 118287
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
building model...
module_file: models.CornerNet
shuffling indices...
total parameters: 201035212
setting learning rate to: 0.00025
training start...
0%| | 0/500000 [00:00<?, ?it/s]/home/ncx/anaconda3/envs/CornerNet/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
segmentation fault(core dumped)

Could you help me see what is happening? My environment is Ubuntu 16.04, PyTorch 0.4.1, Python 3.6, and CUDA 9.0, with one 1080 Ti GPU, which I think is appropriate. I followed your instructions step by step and everything worked well. But when I run the command "$ python train.py CornerNet", I get the error above. Please spare some time to answer my question. Thanks!

Demohai commented 5 years ago

@mrlaiii Manually modify all 4 corner pooling source files (for example, add a line feed), then recompile the corner pooling src. Note that the gcc version must be higher than 4.9.4, which PyTorch 0.4.0 needs.
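
Since several replies in this thread hinge on the gcc version, it can help to check which gcc is actually on the PATH before rebuilding. The following is a rough sketch: `parse_version`, `gcc_version`, and `MIN_GCC` are names invented for illustration, and the 4.9.4 threshold is simply the number quoted in this thread, not a verified requirement.

```python
import re
import subprocess

# Assumption: threshold taken from the comments in this thread, not from official docs.
MIN_GCC = (4, 9, 4)


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def gcc_version(executable="gcc"):
    """Ask the given gcc executable for its version, e.g. (4, 9, 4)."""
    # Passing both flags is a commonly used combination: newer gcc releases
    # print the full version for -dumpfullversion, older ones for -dumpversion.
    result = subprocess.run(
        [executable, "-dumpfullversion", "-dumpversion"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_version(result.stdout)
```

With these helpers, `gcc_version() >= MIN_GCC` reports whether the compiler that the build will pick up meets that threshold. Tuples compare element by element, so `(5, 5)` counts as newer than `(4, 9, 4)`.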

lililiiiiiiiiii commented 5 years ago

Hello, my gcc version is 5.5 and my PyTorch version is 0.4.1, and I am sure that I have recompiled the pooling layers, but I still get the problem. Do you have any advice?

YijiaZhao commented 5 years ago

@Demohai Have you solved the problem? I also get a segmentation fault during training.

Demohai commented 5 years ago

@YijiaZhao
(image attachment)

dlml commented 4 years ago

@lililiiiiiiiiii Have you solved the problem?

YijiaZhao commented 4 years ago

Yeah, pay attention to the timestamps.
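
A quick way to act on the timestamp advice is to compare the source files with what has already been built. The sketch below assumes the build step skips sources that are not newer than its existing outputs, which is how the behaviour described earlier in this thread looks. `unchanged_sources` is a hypothetical helper, and the `src` and `build` directory names are taken from the gcc command quoted above.

```python
from pathlib import Path


def unchanged_sources(src_dir, build_dir, src_pattern="*.cpp",
                      built_patterns=("*.o", "*.so")):
    """Return sources that are not newer than the newest build output.

    Such files would look up to date to a timestamp-based build and are
    candidates for touching before a rebuild.
    """
    built = []
    for pattern in built_patterns:
        # Search build_dir recursively, e.g. build/temp.../src/top_pool.o.
        built.extend(Path(build_dir).rglob(pattern))
    if not built:
        return []  # nothing built yet, so nothing can be stale
    newest_built = max(p.stat().st_mtime for p in built)
    return [
        p for p in sorted(Path(src_dir).glob(src_pattern))
        if p.stat().st_mtime <= newest_built
    ]
```

If `unchanged_sources("src", "build")` returns any files, updating their modification times (or deleting the `build` directory) before rebuilding is worth trying.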
