roytseng-tw / Detectron.pytorch

A pytorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available.

Benchmark for deeper models #64

Closed · li-js closed this issue 6 years ago

li-js commented 6 years ago

Thanks for sharing the great code!

I can also get similar AP for both box and segm with R-50-FPN model, as confirmed in Issue #24.

I am wondering if there are benchmark results for deeper models like R-101-FPN. On my side, the results for R-101-FPN are not as good as those in Detectron. Have you been able to reproduce Detectron's performance (box AP 40, segm AP 35.9) for R-101-FPN @roytseng-tw @Rizhiy?

roytseng-tw commented 6 years ago

I haven't tried training from scratch with the R-101 backbone. What are your results with R-101-FPN, and how did you train it (command, number of GPUs)?

fitsumreda commented 6 years ago

@li-js @roytseng-tw @Rizhiy My runs still do not reproduce the latest benchmarks from @roytseng-tw. I used commit https://github.com/roytseng-tw/Detectron.pytorch/commit/ab028df7c73ca75cf4c7dc0a04b577a8e47722aa with pytorch 0.3.0.post4; I think this is the second-to-last commit.

I tried three experiments (2x over 4 GPUs and 1x over 8 GPUs). All evaluation results below were obtained using ckpt/model_step89999.pth.

Note: the code does produce the expected numbers when I evaluate with Detectron checkpoints, so I think tools/test_net.py is fine.

Evaluation command:
python3 tools/test_net.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-FPN_1x.yaml --load_ckpt /path/to/checkpoint/model_step89999.pth --multi-gpu-testing

Any thoughts?

li-js commented 6 years ago

For Res101-FPN, it is not really training from scratch, as the ImageNet-pretrained weights from Caffe are loaded.

For the settings, I used 4 GPUs (GeForce GTX 1080 Ti) with python3, pytorch 0.3.1.post2, and CUDA 8.0.

I have two sets of results.

Set 1: NUM_GPUS: 4, MAX_ITER: 360k (STEPS adjusted accordingly), BASE_LR: 0.01, IM_PER_GPU: 1
python3 tools/train_net_step.py --dataset coco2017 --cfg config/e2e_mask_rcnn_R-101-FPN2x[modified to use 4 gpus as above].yaml

Results: Seg AP 0.333, Box AP 0.369 at the last step.

Set 2: I used iter_size=2 to increase the effective batch size, with the same config otherwise.
python3 tools/train_net_step.py --dataset coco2017 --cfg config/e2e_mask_rcnn_R-101-FPN2x[modified to use 4 gpus as above].yaml --iter_size 2
I noted that MAX_ITER was automatically scaled down to 180k.

Results: Seg AP 0.336, Box AP 0.368 at the last step.

The results are similar to those of R-50-FPN. Any help is appreciated @roytseng-tw
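For reference, the automatic change Set 2 observed (MAX_ITER dropping from 360k to 180k) is consistent with scaling the schedule inversely with the effective-batch ratio. A minimal illustration of that arithmetic, with the mechanism assumed rather than taken from the repo's code:

```python
# Hypothetical illustration of the MAX_ITER rescaling Set 2 observed.
# iter_size 2 doubles the effective batch relative to the config
# (4 GPUs * 1 image * 2 = 8 vs. the config's 4), so the schedule
# is scaled down by the same factor. Mechanism assumed, not taken
# from the repo's train_net_step.py.
cfg_batch = 4 * 1            # NUM_GPUS * IM_PER_GPU in the modified config
cfg_max_iter = 360_000       # MAX_ITER in the modified config
eff_batch = 4 * 1 * 2        # NUM_GPUS * IM_PER_GPU * iter_size
print(cfg_max_iter * cfg_batch // eff_batch)   # 180000, matching Set 2
```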

fitsumreda commented 6 years ago

@li-js could you share your settings for R-50-FPN that reproduced the desired numbers?

li-js commented 6 years ago

@fitsumreda Sure. I used only two GPUs with two images per GPU, BASE_LR 0.005, and 360k total iterations. Other settings are the same as in e2e_mask_rcnn_R-50-FPN_1x.yaml, and train_net_step.py was used. Surprisingly, I got 34.1 Seg AP and 37.9 Box AP.

fitsumreda commented 6 years ago

Thank you so much, @li-js !

roytseng-tw commented 6 years ago

@li-js Did you modify NUM_GPUS in the config file? If so, do not. I have already emphasized that in the README. Maybe I should make it clearer.

li-js commented 6 years ago

I did modify NUM_GPUS to 4 in my case. Thanks for pointing it out.

So if I only have 4 GPUs and each GPU can hold only 1 image, what is the suggested training schedule? Since MAX_ITER and BASE_LR will be adjusted automatically, am I right to just use the cfg file here unchanged and run the following command?

python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 4

And the following for 4 GPUs where each GPU can hold 2 images:
python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 8

Correct me if I am wrong.
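For reference, if the automatic adjustment follows the usual linear scaling rule (LR proportional to the effective batch size, iterations inversely proportional), the two commands would train with schedules like the ones sketched below. The baseline numbers (16 images per batch, BASE_LR 0.02, MAX_ITER 180k) are assumed from the official Detectron e2e_mask_rcnn_R-101-FPN_2x config, so verify them before relying on this:

```python
# Sketch of the linear scaling rule applied to the two --bs values
# above. Baseline values are assumed from the official Detectron
# e2e_mask_rcnn_R-101-FPN_2x config; verify against your copy.
BASE_BATCH, BASE_LR, BASE_ITER = 16, 0.02, 180_000

def scaled_schedule(effective_batch):
    """LR scales with the effective batch; iterations scale inversely."""
    scale = effective_batch / BASE_BATCH
    return BASE_LR * scale, int(BASE_ITER / scale)

print(scaled_schedule(4))   # (0.005, 720000) for --bs 4
print(scaled_schedule(8))   # (0.01, 360000)  for --bs 8
```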

roytseng-tw commented 6 years ago

Yes, you are correct. 😃
Moreover, you can use --iter_size X to mimic a bigger batch size as you wish. And if possible, I think it's better to keep IMS_PER_BATCH the same, which is 2 in most cases.
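For readers unfamiliar with the trick: an iter_size option mimics a larger batch by accumulating gradients over several forward/backward passes before each optimizer step. A generic PyTorch sketch of the idea, not this repo's actual training loop:

```python
# Generic gradient-accumulation sketch of what an --iter_size style
# option does. Illustrative only; not this repo's train_net_step.py.
# Effective batch size = NUM_GPUS * IMS_PER_BATCH * iter_size.
def train_one_step(model, optimizer, loader_iter, iter_size=2):
    optimizer.zero_grad()
    for _ in range(iter_size):
        images, targets = next(loader_iter)
        loss = model(images, targets)   # assume the model returns a scalar loss
        (loss / iter_size).backward()   # accumulate averaged gradients
    optimizer.step()                    # one parameter update per iter_size batches
```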

li-js commented 6 years ago

Thanks, closing here. In official Detectron, the ResNeXt-series backbones all use 1 image per batch due to memory constraints, yet they still achieve even better performance than the R-101 series.

Still looking forward to a benchmark on R-101-FPN / ResNeXt-series models if anyone successfully reproduces the results. 💯

li-js commented 6 years ago

@roytseng-tw With your suggestions, I trained with:

python tools/train_net_step.py --dataset coco2017 --cfg configs/baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --bs 8 --iter_size 2

without changing the config file. I got better performance (AP seg 34.5, AP det 38.5), but it still does not match official Detectron's AP det 40 / AP seg 35.9.

Any suggestions are appreciated.

roytseng-tw commented 6 years ago

In my experience, these numbers may be reasonable. When I trained e2e_mask_rcnn_R-50-FPN_2x.yaml before, I always got numbers lower than Detectron's. However, as you and others have reported in the issues, your scores match or even beat Detectron's. So I think it's just some randomness in the training of deep neural networks that causes these performance differences.