Open haibochina opened 4 years ago
Got the same error. Are you using distributed training?
No! I only have one 2080 Ti, so I set --nproc_per_node to 1 and changed the batch_size. I'm currently in a PyTorch 1.2 environment; maybe I'll try training in a PyTorch 1.0.0 environment.
There should be other error messages above this one. Please post those here instead: every failure produces the error message you mentioned, so it says nothing about the actual cause.
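To illustrate the point: the CalledProcessError raised by the launcher only says that the child process exited with a non-zero status, while the real error is whatever the child printed before it died. Here is a minimal sketch using only the standard library. The failing child and the helper name `run_child` are hypothetical stand-ins for illustration, not part of FCOS or torch.distributed.launch.

```python
import subprocess
import sys

# Hypothetical stand-in for a training script that crashes with a real error.
child_code = "raise RuntimeError('the real cause lives in the child process')"


def run_child(code):
    """Run `code` in a child interpreter and return (exit_status, stderr_text)."""
    try:
        subprocess.run(
            [sys.executable, "-c", code],
            check=True,              # raise CalledProcessError on a non-zero exit
            stderr=subprocess.PIPE,  # capture what the child wrote to stderr
            text=True,
        )
    except subprocess.CalledProcessError as err:
        # The exception itself only carries the command and the exit status;
        # the useful traceback is in the child's stderr.
        return err.returncode, err.stderr
    return 0, ""


status, child_stderr = run_child(child_code)
print("launcher-level message: returned non-zero exit status", status)
print("child-level traceback (the part worth posting):")
print(child_stderr)
```

In a real run the child's traceback is normally printed to the terminal just above the launcher's traceback, so scrolling up in the console output is usually enough.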
I have solved the problem! I switched to PyTorch 1.0.1, but the trained result is not as good as the one reported in the paper.

2019-10-26 18:37:29,788 fcos_core.inference INFO: Total run time: 0:10:45.425196 (0.06373940316109278 s / img per device, on 4 devices)
2019-10-26 18:37:29,790 fcos_core.inference INFO: Model inference time: 0:07:43.577323 (0.04578089304364012 s / img per device, on 4 devices)
2019-10-26 18:37:45,994 fcos_core.inference INFO: Preparing results for COCO format
2019-10-26 18:37:45,999 fcos_core.inference INFO: Preparing bbox results
2019-10-26 18:38:07,653 fcos_core.inference INFO: Evaluating predictions
Loading and preparing results...
DONE (t=27.31s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=402.76s).
Accumulating evaluation results...
DONE (t=90.72s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.365
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.552
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.393
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.196
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.390
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.457
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.313
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.519
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.555
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.339
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.595
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.716
Maximum f-measures for classes:
[0.7616075848827596, 0.5610587209763056, 0.6479872982708375, 0.6748353338679779, 0.7507307640419121, 0.742584763605127, 0.7859612943390903, 0.5028484980646567, 0.49965681080612806, 0.5723222337644548, 0.8092759110269758, 0.729987557030278, 0.5095880681818182, 0.36950767601905765, 0.582227296098441, 0.8237298033871909, 0.7507158827913545, 0.7393787150764558, 0.7303571066863821, 0.7417025450170824, 0.830699758086193, 0.8129636860601328, 0.8524193436864741, 0.8717712177121771, 0.36587406735931943, 0.5913927874890716, 0.3139815953521903, 0.5535752499159304, 0.5318725099601593, 0.7572241569693365, 0.45926243567753006, 0.47117052163915896, 0.6257043198282801, 0.6257599498589784, 0.5669816564758199, 0.6076411350667658, 0.7064557779212395, 0.5586290057368363, 0.7429708361694328, 0.5639386490540552, 0.6104380242311276, 0.5589222015860381, 0.45967350897042186, 0.3373717585193587, 0.3178502284519477, 0.539897937730548, 0.4459195626708317, 0.35359369940566343, 0.4960344370079901, 0.45142065405128395, 0.48971491182733456, 0.4259215350101227, 0.4964238089465389, 0.6815844085536407, 0.6529213266121549, 0.5013213470964245, 0.4656591692678812, 0.5329748157395626, 0.4707838838914506, 0.6268880557043385, 0.4798020316613747, 0.7456186935740838, 0.6925656791503633, 0.7091774303195105, 0.6692913385826772, 0.5040514110086616, 0.6169283640967779, 0.5072120458756479, 0.6944762185897199, 0.5196644514653013, 0.2829348722176422, 0.5469471240222229, 0.6141078838174274, 0.3574181181214265, 0.7307706945765937, 0.5233654696248894, 0.4224977856510186, 0.6506024096385543, 0.1391304347826087, 0.28447136094169834]
Score thresholds for classes (used in demos for visualization purposes):
[0.5010206699371338, 0.514956533908844, 0.5130963325500488, 0.5334678888320923, 0.5346886515617371, 0.5607560873031616, 0.5667353868484497, 0.5099794864654541, 0.4713687002658844, 0.5089412927627563, 0.5250256657600403, 0.5572448968887329, 0.4885828197002411, 0.49496129155158997, 0.44194746017456055, 0.5787045359611511, 0.5488421320915222, 0.535602867603302, 0.4815693497657776, 0.5140263438224792, 0.5492225289344788, 0.5702610015869141, 0.549586296081543, 0.5380666851997375, 0.4872756004333496, 0.4956148564815521, 0.4708016812801361, 0.5190811157226562, 0.4915080964565277, 0.5255230665206909, 0.46253687143325806, 0.5103307962417603, 0.4706781804561615, 0.448844850063324, 0.5154725313186646, 0.5369377732276917, 0.5282315015792847, 0.4981441795825958, 0.5178103446960449, 0.4829654395580292, 0.5263268947601318, 0.5006193518638611, 0.48944059014320374, 0.453737735748291, 0.4531874656677246, 0.533451497554779, 0.4840342402458191, 0.4937095046043396, 0.5162220597267151, 0.46484774351119995, 0.5131992101669312, 0.4792819917201996, 0.48299506306648254, 0.5468752384185791, 0.4877933859825134, 0.5142653584480286, 0.4917145073413849, 0.5073975324630737, 0.48615020513534546, 0.5235533714294434, 0.483765184879303, 0.5352375507354736, 0.5656739473342896, 0.5275744795799255, 0.5074234008789062, 0.5074090361595154, 0.5146856904029846, 0.508337676525116, 0.5832608938217163, 0.5214544534683228, 0.46063321828842163, 0.5019127130508423, 0.5593647360801697, 0.43145331740379333, 0.5567354559898376, 0.5115346312522888, 0.4954207241535187, 0.5325180292129517, 0.4542503356933594, 0.47217491269111633]
2019-10-26 18:47:41,048 fcos_core.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 0.3648715103238864), ('AP50', 0.5517453677247636), ('AP75', 0.3927932861368953), ('APs', 0.1955492135371947), ('APm', 0.3899126290279312), ('APl', 0.456886671946741)]))])
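For anyone comparing these numbers with a published results table: COCO metrics are usually quoted as percentages, so it can help to convert the final OrderedDict before putting it next to the paper. A small sketch, with the values copied from the last log line above; the helper name `as_percent` is just an illustrative choice.

```python
from collections import OrderedDict

# Values copied verbatim from the final fcos_core.inference log line.
results = OrderedDict([
    ("bbox", OrderedDict([
        ("AP", 0.3648715103238864),
        ("AP50", 0.5517453677247636),
        ("AP75", 0.3927932861368953),
        ("APs", 0.1955492135371947),
        ("APm", 0.3899126290279312),
        ("APl", 0.456886671946741),
    ])),
])


def as_percent(metrics):
    """Convert fractional COCO metrics to percentages with one decimal place."""
    return {name: round(value * 100, 1) for name, value in metrics.items()}


bbox_percent = as_percent(results["bbox"])
for name, value in bbox_percent.items():
    print(f"{name}: {value}")
```

Note that this run used a changed batch_size, so a gap to the paper's numbers may come from the training schedule rather than from the PyTorch version; that is a guess, not something the log confirms.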
@haibochina how did you solve this problem?
Traceback (most recent call last):
  File "/home/haibo/anaconda3/envs/FCOS/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/haibo/anaconda3/envs/FCOS/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/haibo/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/haibo/anaconda3/envs/FCOS/lib/python3.7/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/haibo/anaconda3/envs/FCOS/bin/python', '-u', 'tools/train_net.py', '--local_rank=0', '--config-file', 'configs/fcos/fcos_imprv_R_50_FPN_1x.yaml', 'DATALOADER.NUM_WORKERS', '2', 'OUTPUT_DIR', 'training_dir/fcos_imprv_R_50_FPN_1x']' returned non-zero exit status 1.