Something was wrong when I train the model

haibochina commented 4 years ago

Traceback (most recent call last):
  File "/home/haibo/anaconda3/envs/FCOS/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/haibo/anaconda3/envs/FCOS/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/haibo/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/haibo/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/haibo/anaconda3/envs/FCOS/bin/python', '-u', 'tools/train_net.py', '--local_rank=0', '--config-file', 'configs/fcos/fcos_imprv_R_50_FPN_1x.yaml', 'DATALOADER.NUM_WORKERS', '2', 'OUTPUT_DIR', 'training_dir/fcos_imprv_R_50_FPN_1x']' returned non-zero exit status 1.

vandesa003 commented 4 years ago

I got the same error. Are you using distributed training?

haibochina commented 4 years ago

> I got the same error. Are you using distributed training?

No! I only have one 2080 Ti, so I set --nproc_per_node to 1 and changed the batch size. Maybe I'll try training in a PyTorch 1.0.0 environment; right now I'm on PyTorch 1.2.
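When the batch size is reduced for a single GPU, the commonly used linear scaling rule suggests lowering the learning rate and lengthening the schedule by the same factor. The sketch below is only an illustration: the helper name `scale_schedule` is hypothetical, and the reference values (batch size 16, base LR 0.01, 90000 iterations, steps at 60000 and 80000) are assumptions that should be checked against the YAML config actually used.

```python
def scale_schedule(base_lr, max_iter, steps, ref_batch, new_batch):
    """Rescale solver settings from ref_batch to new_batch (linear scaling rule).

    The learning rate shrinks with the batch size, while the iteration
    counts grow so that the number of images seen stays the same.
    """
    return {
        "SOLVER.IMS_PER_BATCH": new_batch,
        "SOLVER.BASE_LR": base_lr * new_batch / ref_batch,
        "SOLVER.MAX_ITER": max_iter * ref_batch // new_batch,
        "SOLVER.STEPS": tuple(s * ref_batch // new_batch for s in steps),
    }


def to_cli_overrides(settings):
    """Flatten the settings into KEY VALUE pairs for the command line."""
    args = []
    for key, value in settings.items():
        args.extend([key, str(value)])
    return args


# Assumed reference schedule (verify against the config file before use).
overrides = scale_schedule(0.01, 90000, (60000, 80000), ref_batch=16, new_batch=4)
print(" ".join(to_cli_overrides(overrides)))
```

The printed pairs can be appended to the training command in the same way as `DATALOADER.NUM_WORKERS 2`; the tuple value for `SOLVER.STEPS` needs to be quoted in the shell. Whether this matches the settings used for the paper's numbers is not confirmed in this thread.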

Finniu commented 4 years ago

There should be other error messages above this one. Please post those instead: any failure in the training script produces the message you posted, so it does not tell us what actually went wrong.
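One way to surface the underlying error is to run the child command from the `CalledProcessError` message directly, without the launcher, and look at the end of its stderr. The following is a minimal sketch; the helper name `run_and_show_error` is hypothetical, and it assumes the command is run from the repository root in the same environment.

```python
import subprocess
import sys


def run_and_show_error(cmd, tail_lines=20):
    """Run cmd and return (exit code, last lines of its stderr)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,  # decode output as str (Python 3.7+)
    )
    tail = result.stderr.splitlines()[-tail_lines:]
    return result.returncode, "\n".join(tail)


# Child command copied from the CalledProcessError message in this thread,
# with the interpreter path replaced by the current interpreter.
CHILD_CMD = [
    sys.executable, "-u", "tools/train_net.py", "--local_rank=0",
    "--config-file", "configs/fcos/fcos_imprv_R_50_FPN_1x.yaml",
    "DATALOADER.NUM_WORKERS", "2",
    "OUTPUT_DIR", "training_dir/fcos_imprv_R_50_FPN_1x",
]
```

Calling `run_and_show_error(CHILD_CMD)` and printing the second element of the result should show the traceback raised inside the training script, which is the part worth posting.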

haibochina commented 4 years ago

I have solved the problem! I now use PyTorch 1.0.1, but the trained result is not as good as the one in the paper:

2019-10-26 18:37:29,788 fcos_core.inference INFO: Total run time: 0:10:45.425196 (0.06373940316109278 s / img per device, on 4 devices)
2019-10-26 18:37:29,790 fcos_core.inference INFO: Model inference time: 0:07:43.577323 (0.04578089304364012 s / img per device, on 4 devices)
2019-10-26 18:37:45,994 fcos_core.inference INFO: Preparing results for COCO format
2019-10-26 18:37:45,999 fcos_core.inference INFO: Preparing bbox results
2019-10-26 18:38:07,653 fcos_core.inference INFO: Evaluating predictions
Loading and preparing results...
DONE (t=27.31s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=402.76s).
Accumulating evaluation results...
DONE (t=90.72s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.365
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.552
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.393
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.196
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.390
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.457
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.313
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.519
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.555
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.339
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.595
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.716
Maximum f-measures for classes:
[0.7616075848827596, 0.5610587209763056, 0.6479872982708375, 0.6748353338679779, 0.7507307640419121, 0.742584763605127, 0.7859612943390903, 0.5028484980646567, 0.49965681080612806, 0.5723222337644548, 0.8092759110269758, 0.729987557030278, 0.5095880681818182, 0.36950767601905765, 0.582227296098441, 0.8237298033871909, 0.7507158827913545, 0.7393787150764558, 0.7303571066863821, 0.7417025450170824, 0.830699758086193, 0.8129636860601328, 0.8524193436864741, 0.8717712177121771, 0.36587406735931943, 0.5913927874890716, 0.3139815953521903, 0.5535752499159304, 0.5318725099601593, 0.7572241569693365, 0.45926243567753006, 0.47117052163915896, 0.6257043198282801, 0.6257599498589784, 0.5669816564758199, 0.6076411350667658, 0.7064557779212395, 0.5586290057368363, 0.7429708361694328, 0.5639386490540552, 0.6104380242311276, 0.5589222015860381, 0.45967350897042186, 0.3373717585193587, 0.3178502284519477, 0.539897937730548, 0.4459195626708317, 0.35359369940566343, 0.4960344370079901, 0.45142065405128395, 0.48971491182733456, 0.4259215350101227, 0.4964238089465389, 0.6815844085536407, 0.6529213266121549, 0.5013213470964245, 0.4656591692678812, 0.5329748157395626, 0.4707838838914506, 0.6268880557043385, 0.4798020316613747, 0.7456186935740838, 0.6925656791503633, 0.7091774303195105, 0.6692913385826772, 0.5040514110086616, 0.6169283640967779, 0.5072120458756479, 0.6944762185897199, 0.5196644514653013, 0.2829348722176422, 0.5469471240222229, 0.6141078838174274, 0.3574181181214265, 0.7307706945765937, 0.5233654696248894, 0.4224977856510186, 0.6506024096385543, 0.1391304347826087, 0.28447136094169834]
Score thresholds for classes (used in demos for visualization purposes):
[0.5010206699371338, 0.514956533908844, 0.5130963325500488, 0.5334678888320923, 0.5346886515617371, 0.5607560873031616, 0.5667353868484497, 0.5099794864654541, 0.4713687002658844, 0.5089412927627563, 0.5250256657600403, 0.5572448968887329, 0.4885828197002411, 0.49496129155158997, 0.44194746017456055, 0.5787045359611511, 0.5488421320915222, 0.535602867603302, 0.4815693497657776, 0.5140263438224792, 0.5492225289344788, 0.5702610015869141, 0.549586296081543, 0.5380666851997375, 0.4872756004333496, 0.4956148564815521, 0.4708016812801361, 0.5190811157226562, 0.4915080964565277, 0.5255230665206909, 0.46253687143325806, 0.5103307962417603, 0.4706781804561615, 0.448844850063324, 0.5154725313186646, 0.5369377732276917, 0.5282315015792847, 0.4981441795825958, 0.5178103446960449, 0.4829654395580292, 0.5263268947601318, 0.5006193518638611, 0.48944059014320374, 0.453737735748291, 0.4531874656677246, 0.533451497554779, 0.4840342402458191, 0.4937095046043396, 0.5162220597267151, 0.46484774351119995, 0.5131992101669312, 0.4792819917201996, 0.48299506306648254, 0.5468752384185791, 0.4877933859825134, 0.5142653584480286, 0.4917145073413849, 0.5073975324630737, 0.48615020513534546, 0.5235533714294434, 0.483765184879303, 0.5352375507354736, 0.5656739473342896, 0.5275744795799255, 0.5074234008789062, 0.5074090361595154, 0.5146856904029846, 0.508337676525116, 0.5832608938217163, 0.5214544534683228, 0.46063321828842163, 0.5019127130508423, 0.5593647360801697, 0.43145331740379333, 0.5567354559898376, 0.5115346312522888, 0.4954207241535187, 0.5325180292129517, 0.4542503356933594, 0.47217491269111633]
2019-10-26 18:47:41,048 fcos_core.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 0.3648715103238864), ('AP50', 0.5517453677247636), ('AP75', 0.3927932861368953), ('APs', 0.1955492135371947), ('APm', 0.3899126290279312), ('APl', 0.456886671946741)]))])
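To compare runs without reading the whole dump, the summary metrics can be pulled out of the final `OrderedDict` line of the log. This is a small sketch using a regular expression; the helper name `parse_ap_summary` is hypothetical, and it assumes the summary line keeps the format shown in this comment.

```python
import re

# Matches pairs such as ('AP50', 0.5517453677247636) in the summary line.
AP_PAIR = re.compile(r"\('(AP\w*)',\s*([0-9.]+)\)")


def parse_ap_summary(line):
    """Return a dict mapping metric names (AP, AP50, ...) to floats."""
    return {name: float(value) for name, value in AP_PAIR.findall(line)}


summary_line = (
    "2019-10-26 18:47:41,048 fcos_core.inference INFO: "
    "OrderedDict([('bbox', OrderedDict([('AP', 0.3648715103238864), "
    "('AP50', 0.5517453677247636), ('AP75', 0.3927932861368953), "
    "('APs', 0.1955492135371947), ('APm', 0.3899126290279312), "
    "('APl', 0.456886671946741)]))])"
)

for name, value in parse_ap_summary(summary_line).items():
    # Report as percentages, the way COCO results are usually quoted.
    print(f"{name}: {100 * value:.1f}")
```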

lw230 commented 4 years ago

@haibochina How did you solve this problem?

tianzhi0549 / FCOS

Something was wrong when I train the model #177