song6299 closed this issue 5 years ago.
@song6299 I suggest training it on a single GPU to see what happens, using the following command:
python tools/train_net.py \
--skip-test \
--config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
DATALOADER.NUM_WORKERS 2 \
OUTPUT_DIR training_dir/fcos_R_50_FPN_1x \
SOLVER.IMS_PER_BATCH 1
Thank you for your reply. I ran that command, and now it fails with the following error:
2019-05-15 15:47:24,208 maskrcnn_benchmark.trainer INFO: Start training
Segmentation fault
What is the cause of the segmentation fault? Another question: I set up the environment following INSTALL.md, so why is the installed PyTorch version 1.1.0? Will it affect the experiment?
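When a run stops with a bare "Segmentation fault" and no Python traceback, one generic way to see where it happens (not something suggested in this thread) is CPython's standard-library `faulthandler` module, which dumps the Python traceback when the process receives a fatal signal. A minimal sketch, assuming the call is placed at the top of tools/train_net.py; `enable_crash_tracebacks` is a hypothetical helper name:

```python
import faulthandler
import sys


def enable_crash_tracebacks() -> None:
    """Dump Python tracebacks of all threads when a fatal signal such as SIGSEGV arrives."""
    # faulthandler writes through a real file descriptor, so use the process's
    # original stderr rather than a possibly redirected sys.stderr object.
    faulthandler.enable(file=sys.__stderr__, all_threads=True)


if __name__ == "__main__":
    # Assumption: the same call would go at the very top of tools/train_net.py,
    # before the model and data loaders are built.
    enable_crash_tracebacks()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same effect is available without editing the script by running Python with `-X faulthandler` or setting `PYTHONFAULTHANDLER=1`. The dump only covers Python frames, so it points to the Python call that crashed, not to the line inside the compiled extension.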
@song6299 Please check https://github.com/tianzhi0549/FCOS/blob/master/TROUBLESHOOTING.md. It might be caused by an old GCC version. PyTorch 1.1.0 should not be the cause.
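A quick way to check the compiler version mentioned above is to parse the output of `gcc --version`. The sketch below uses hypothetical helpers, not part of FCOS, and the 4.9 minimum is an assumption taken from maskrcnn-benchmark's INSTALL.md; check TROUBLESHOOTING.md for the exact requirement.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Assumption: minimum GCC version, taken from maskrcnn-benchmark's INSTALL.md
# (GCC >= 4.9); verify against FCOS's TROUBLESHOOTING.md before relying on it.
MIN_GCC: Tuple[int, int] = (4, 9)


def parse_gcc_version(text: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the first line of GNU `gcc --version` output."""
    lines = text.strip().splitlines()
    if not lines:
        return None
    # GNU gcc prints e.g. "gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0"; the real
    # version follows the parenthesised package string, so skip past the first ")".
    rest = lines[0].split(")", 1)[-1]
    match = re.search(r"(\d+)\.(\d+)", rest)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def gcc_is_recent_enough(text: str, minimum: Tuple[int, int] = MIN_GCC) -> bool:
    """True if the parsed version is at least `minimum` (tuple comparison)."""
    version = parse_gcc_version(text)
    return version is not None and version >= minimum


def installed_gcc_output() -> Optional[str]:
    """Run `gcc --version` if gcc is on PATH; return its stdout, else None."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(
        ["gcc", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    output = installed_gcc_output()
    if not output:
        print("gcc not found on PATH")
    else:
        verdict = "OK" if gcc_is_recent_enough(output) else "older than %d.%d" % MIN_GCC
        print(output.splitlines()[0], "->", verdict)
```

Since the compiler matters when the maskrcnn_benchmark C++/CUDA extensions are built, installing a newer GCC presumably only helps once the extension is rebuilt with it.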
Thanks, I will install a newer GCC.
Hi! I tried to train on the coco_train2017 data following the steps you described, but it raised the following error:
2019-05-15 10:23:24,814 maskrcnn_benchmark.trainer INFO: Start training
Traceback (most recent call last):
  File "/home/work/songping/anaconda3/envs/FCOS/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/work/songping/anaconda3/envs/FCOS/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/work/songping/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/work/songping/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/work/songping/anaconda3/envs/FCOS/bin/python', '-u', 'tools/train_net.py', '--local_rank=0', '--skip-test', '--config-file', 'configs/fcos/fcos_R_50_FPN_1x.yaml', 'DATALOADER.NUM_WORKERS', '2', 'OUTPUT_DIR', 'training_dir/fcos_R_50_FPN_1x']' died with <Signals.SIGSEGV: 11>.
Could you help me solve this problem? Thank you.
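The last line of that traceback is raised by `torch.distributed.launch`, which starts the training script as a child process and raises `subprocess.CalledProcessError` when the child fails. A negative return code means the child was killed by a signal, so `<Signals.SIGSEGV: 11>` says the training process itself crashed with a segmentation fault rather than raising a Python exception; the launcher frames are not the cause. A small sketch of how that message is produced; `describe_exit` is a hypothetical helper:

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Explain a child return code the way POSIX-style subprocess reports it."""
    if returncode < 0:
        # Negative return code: the child was terminated by signal number -returncode.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    # Rebuild the kind of error the launcher raised, for a child process
    # killed by signal 11 (SIGSEGV).
    err = subprocess.CalledProcessError(
        returncode=-11, cmd=["python", "-u", "tools/train_net.py", "--local_rank=0"]
    )
    print(err)  # same "Command '...' died with ..." format as the traceback
    print(describe_exit(err.returncode))  # killed by SIGSEGV
```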