li-js closed this issue 6 years ago
I haven't tried training from scratch with the R-101 backbone. What are your results with R-101-FPN, and how did you train it (command, number of GPUs)?
@li-js @roytseng-tw @Rizhiy
My runs still do not reproduce the latest benchmarks from @roytseng-tw
I used this commit https://github.com/roytseng-tw/Detectron.pytorch/commit/ab028df7c73ca75cf4c7dc0a04b577a8e47722aa with pytorch 0.3.0.post4. I think this is the second-to-last commit.
I ran three experiments: two on 4 GPUs and one on 8 GPUs.
All evaluation results below are obtained using ckpt/model_step89999.pth
4 GPUs: python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-FPN_1x.yaml --nw 16 --use_tfboard
8 GPUs: python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-FPN_1x.yaml --nw 16 --use_tfboard
Note that the code does produce the expected numbers when I evaluate Detectron checkpoints, so I think tools/test_net.py is fine.
Evaluation command:
python3 tools/test_net.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-FPN_1x.yaml --load_ckpt /path/to/checkpoint/model_step89999.pth --multi-gpu-testing
Any thoughts?
For the Res101-FPN, it is not really training from scratch, as the ImageNet-pretrained weights from Caffe are loaded.
For the settings, I used 4 GPUs (GeForce GTX 1080 Ti) with python3, pytorch 0.3.1.post2 and cuda 8.0.
I have two sets of results.
Set 1: NUM_GPUS: 4, MAX_ITER: 360k (STEPS adjusted accordingly), BASE_LR: 0.01, IM_PER_GPU: 1
python3 tools/train_net_step.py --dataset coco2017 --cfg config/e2e_mask_rcnn_R-101-FPN2x[modified to use 4 gpus as above].yaml
Results: Seg AP: 0.333, Box AP: 0.369 on last step.
Set 2: I used iter_size=2 to increase the effective batch size, with the same config.
python3 tools/train_net_step.py --dataset coco2017 --cfg config/e2e_mask_rcnn_R-101-FPN2x[modified to use 4 gpus as above].yaml --iter_size 2
I noticed that MAX_ITER is automatically scaled down to 180k.
Results: Seg AP: 0.336, Box AP: 0.368 on last step.
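One way to see why MAX_ITER is halved with iter_size=2: if each optimizer step consumes batch_size * iter_size images, both sets process the same total number of images. A minimal sketch of that arithmetic follows; the function name images_seen is hypothetical, and the assumption that MAX_ITER counts optimizer steps is mine, not something stated in this thread.

```python
def images_seen(batch_size, iter_size, max_iter):
    """Total images processed, assuming one optimizer step per
    batch_size * iter_size images (an assumption, not confirmed here)."""
    return batch_size * iter_size * max_iter

# Set 1: 4 GPUs x 1 image, no accumulation, 360k iterations.
set1 = images_seen(batch_size=4, iter_size=1, max_iter=360_000)

# Set 2: same config with iter_size=2, MAX_ITER scaled down to 180k.
set2 = images_seen(batch_size=4, iter_size=2, max_iter=180_000)

print(set1, set2)
```

Under that assumption the two runs see the same amount of data, which is consistent with the nearly identical AP numbers reported for Set 1 and Set 2.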
The results are similar to R-50-FPN. Any help is appreciated @roytseng-tw
@li-js could you share your settings for R-50-FPN that reproduced the desired numbers?
@fitsumreda Sure. I used only two GPUs with two images per GPU, BASE_LR 0.005, and a total of 360k iterations. Other settings are the same as in e2e_mask_rcnn_R-50-FPN_1x.yaml, and train_net_step.py was used. Surprisingly, I got 34.1 Seg AP and 37.9 Box AP.
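These settings are what the linear scaling rule would give for a batch of 4 images. The sketch below assumes the reference 1x schedule is a total batch of 16 images, BASE_LR 0.02, MAX_ITER 90k, and STEPS [0, 60k, 80k]; those reference values and the helper name scale_schedule are assumptions for illustration, not taken from this thread or from the repository's code.

```python
def scale_schedule(base_lr, max_iter, steps, ref_batch, new_batch):
    """Linear scaling rule: shrink the learning rate and stretch the
    schedule by the same factor when the batch size shrinks."""
    factor = ref_batch / new_batch
    new_lr = base_lr / factor
    new_max_iter = int(round(max_iter * factor))
    new_steps = [int(round(s * factor)) for s in steps]
    return new_lr, new_max_iter, new_steps

# Assumed reference 1x schedule (batch 16) rescaled to 2 GPUs x 2 images.
lr, max_iter, steps = scale_schedule(
    base_lr=0.02,
    max_iter=90_000,
    steps=[0, 60_000, 80_000],
    ref_batch=16,
    new_batch=2 * 2,
)
print(lr, max_iter, steps)
```

With those assumed reference values, the rescaled learning rate and iteration count agree with the BASE_LR 0.005 and 360k iterations quoted above.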
Thank you so much, @li-js !
@li-js Did you modify NUM_GPUS in the config file? If so, please don't. I have already emphasized that in the README; maybe I should make it clearer.
I did modify the NUM_GPUS to be 4 in my case. Thanks for pointing it out.
So if I only have 4 GPUs and each GPU can only hold 1 image, what is the suggested training schedule? Since MAX_ITER and BASE_LR will be adjusted automatically, am I right to just use the cfg file here unchanged with the following command?
python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 4
And for 4 GPUs where each GPU can hold 2 images, use the following:
python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 8
Correct me if I am wrong.
Yes, you are correct. 😃
Moreover, you can use --iter_size X to mimic a bigger batch size as you wish.
And if possible, I think it may be better to keep the same IMS_PER_BATCH, 2 in most cases.
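To illustrate how accumulating gradients over several small batches can mimic a bigger batch, here is a toy sketch in plain Python. It is not the repository's implementation: the model (y = w * x with a mean squared error loss), the data, and the function names are all made up for illustration, and whether train_net_step.py averages or sums the accumulated gradients is not stated in this thread.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, batch, iter_size):
    """Split the batch into iter_size equal chunks and average the
    per-chunk gradients, as gradient accumulation would."""
    chunk = len(batch) // iter_size
    total = 0.0
    for i in range(iter_size):
        total += grad_mse(w, batch[i * chunk:(i + 1) * chunk])
    return total / iter_size

# Hypothetical (x, y) pairs; 4 samples split into 2 chunks of 2.
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, data, iter_size=2)
print(full, accum)
```

For a mean loss and equal-sized chunks the two gradients coincide, which is why one update after iter_size small batches behaves like one update on a batch iter_size times larger.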
Thanks, closing here. In official Detectron, the ResNeXt-series backbones all use 1 image per batch due to memory constraints, yet they still perform even better than the R-101 series.
Still looking forward to a benchmark on R-101-FPN/ResNeXt-series if anyone successfully reproduces the results. 💯
@roytseng-tw With your suggestions, I trained with:
python tools/train_net_step.py --dataset coco2017 --cfg configs/baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 8 --iter_size 2
without changing the config file. I got better performance (AP seg 34.5, AP det 38.5), but it still does not match official Detectron's AP det 40, AP seg 35.9.
Any suggestions are appreciated.
I think these numbers may be reasonable, in my experience. When I trained e2e_mask_rcnn_R-50-FPN_2x.yaml before, I always got numbers lower than Detectron's. However, as reported by you and others in the issues, your scores match or even exceed Detectron's. So I think it's just some randomness in training deep neural networks that leads to these performance differences.
Thanks for sharing the great code!
I can also get similar AP for both box and segm with the R-50-FPN model, as confirmed in Issue #24.
I am wondering whether there are benchmark results for deeper models like R-101-FPN. On my side, the results for R-101-FPN are not as good as those in Detectron. Have you reproduced Detectron's performance (box AP 40, segm AP 35.9) for R-101-FPN, @roytseng-tw @Rizhiy?