Working with scale: 2nd place solution to Product Detection in Densely Packed Scenes
Introduction
This repository contains the code for the 2nd place solution of the detection challenge held at the CVPR 2020 Retail-Vision workshop.
For more details, see my report. All experiments were run with MMDetection v1.
Dataset
The dataset was originally introduced by Eran Goldman et al.
To obtain the dataset for research purposes, please contact the authors.
Getting started
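For evaluation, clone pycocotools, change the parameter maxDets to 300 in its COCOeval code, and then install it locally.
By default, COCO evaluation counts at most 100 detections per image, which is too few for densely packed shelves.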
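The patch is needed because pycocotools defaults to maxDets = [1, 10, 100], and its summarize() hard-codes 100 detections for the headline AP, so only changing the parameters at runtime is not enough. If you would rather not patch the library, a minimal sketch (not part of this repository; summarize_at is a hypothetical name) can read AP and AR at 300 detections directly from the arrays that COCOeval.accumulate() fills in. It assumes the standard pycocotools layouts: eval['precision'] as [T, R, K, A, M] and eval['recall'] as [T, K, A, M], with -1 marking missing entries.

```python
import numpy as np


def summarize_at(coco_eval, max_dets=300):
    """AP@[.5:.95], AP@0.5, AP@0.75 and AR over the 'all' area range at max_dets.

    Hypothetical helper: expects a pycocotools COCOeval object whose
    params.maxDets contains max_dets and on which evaluate() and
    accumulate() have already been called.

    Usage (assuming coco_gt / coco_dt are pycocotools COCO objects):
        from pycocotools.cocoeval import COCOeval
        ev = COCOeval(coco_gt, coco_dt, "bbox")
        ev.params.maxDets = [1, 10, 300]
        ev.evaluate(); ev.accumulate()
        print(summarize_at(ev, 300))
    """
    params = coco_eval.params
    m = list(params.maxDets).index(max_dets)   # position on the maxDets axis
    a = list(params.areaRngLbl).index("all")   # position of the 'all' area range
    iou_thrs = np.asarray(params.iouThrs)

    def mean_valid(arr):
        # pycocotools stores -1 where there is nothing to evaluate
        valid = arr[arr > -1]
        return float(valid.mean()) if valid.size else -1.0

    precision = coco_eval.eval["precision"]    # [T, R, K, A, M]
    recall = coco_eval.eval["recall"]          # [T, K, A, M]

    def ap_at(thr=None):
        p = precision[:, :, :, a, m]           # [T, R, K]
        if thr is not None:
            p = p[np.isclose(iou_thrs, thr)]   # keep a single IoU threshold
        return mean_valid(p)

    return {
        "mAP": ap_at(),
        "AP@0.5": ap_at(0.5),
        "AP@0.75": ap_at(0.75),
        "AR": mean_valid(recall[:, :, a, m]),
    }
```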
1. Convert annotations from the SKU110k csv format to COCO-like json
python sku110k_scripts/sku110k_to_coco.py --args
2. Convert a full-frame COCO-like dataset to a tiled one
python sku110k_scripts/split_on_tiles.py --args
3. Training with mmdet
./tools/dist_train.sh configs/sku110k/sku110k_faster_rcnn_r50_fpn_anchor_1x_4tiles_test_half_res.py 2
4. Testing with mmdet
./tools/dist_test.sh configs/sku110k/sku110k_faster_rcnn_r50_fpn_anchor_1x_4tiles_test_half_res.py workdir/faster_rcnn_r50_fpn_anchor_1x_4tiles/latest.pth 2 --eval bbox
5. Create a dummy COCO-like json file for the leaderboard test set
python sku110k_scripts/lb_test_to_coco.py --args
6. Inference with mmdet
./tools/dist_test.sh configs/sku110k/sku110k_faster_rcnn_r50_fpn_anchor_1x_4tiles_test_half_res.py workdir/faster_rcnn_r50_fpn_anchor_1x_4tiles/latest.pth 2 --format_only --options "jsonfile_prefix=./submit"
7. Convert the json output back to the SKU110k csv format
python sku110k_scripts/json_out_to_submit.py --args
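The conversion scripts in steps 1, 5 and 7 take elided --args, so their exact interfaces are not shown here. As a rough sketch of what those steps involve, the helpers below (hypothetical names, not the repository's scripts) convert between the formats under these assumptions: the SKU110k CSV has no header and the column order image_name, x1, y1, x2, y2, class, image_width, image_height; the dummy json only needs image entries, since mmdet reads the image list from an annotation file; with jsonfile_prefix=./submit, mmdet should write detections to ./submit.bbox.json as records with image_id, bbox ([x, y, width, height]), score and category_id; and the output rows (image_name, x1, y1, x2, y2, score) are a placeholder, not the official leaderboard format.

```python
import csv
import io


def sku110k_csv_to_coco(csv_text):
    """Step 1 sketch: SKU110k-style CSV rows -> COCO-like dict.

    Assumed column order (no header row):
    image_name, x1, y1, x2, y2, class, image_width, image_height
    """
    images, annotations = {}, []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        name, x1, y1, x2, y2, _cls, width, height = row
        x1, y1, x2, y2 = (float(v) for v in (x1, y1, x2, y2))
        if name not in images:
            images[name] = {"id": len(images) + 1, "file_name": name,
                            "width": int(width), "height": int(height)}
        w, h = x2 - x1, y2 - y1
        annotations.append({
            "id": len(annotations) + 1,
            "image_id": images[name]["id"],
            "category_id": 1,        # single "object" class
            "bbox": [x1, y1, w, h],  # COCO boxes are [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })
    return {"images": list(images.values()), "annotations": annotations,
            "categories": [{"id": 1, "name": "object"}]}


def dummy_coco(image_sizes):
    """Step 5 sketch: image entries only, no annotations.

    image_sizes: iterable of (file_name, width, height) tuples.
    """
    images = [{"id": i, "file_name": name, "width": int(w), "height": int(h)}
              for i, (name, w, h) in enumerate(image_sizes, start=1)]
    return {"images": images, "annotations": [],
            "categories": [{"id": 1, "name": "object"}]}


def detections_to_csv(detections, images):
    """Step 7 sketch: COCO-style detections -> CSV text.

    detections: records with "image_id", "bbox" ([x, y, w, h]) and "score".
    images: the "images" list of the dummy json, used to recover file names.
    The output column order is an assumption, not the official spec.
    """
    names = {img["id"]: img["file_name"] for img in images}
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for det in detections:
        x, y, w, h = det["bbox"]
        # convert [x, y, w, h] back to corner coordinates
        writer.writerow([names[det["image_id"]], x, y, x + w, y + h, det["score"]])
    return buf.getvalue()
```

For example, json.dump(sku110k_csv_to_coco(open(csv_path).read()), out_file) would write a training json, and Pillow's Image.open(path).size returns the (width, height) pair that dummy_coco expects for each test image.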
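Experiments
Column abbreviations: Lr schd is the learning-rate schedule (1x and 2x are MMDetection's standard 12- and 24-epoch schedules), imgs_p_gpu is the number of images per GPU, and anchor_sc is the anchor scale; ✓ and ☐ mark whether an option was enabled.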
1. Initial experiments
| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | mAP | AP@0.5 | AP@0.75 | AR | Tr.mAP | Tr.AP@0.5 | Tr.AP@0.75 | Tr.AR |
| RetinaNet-r50-fpn | r50 | 1x | 0.001 | 2 | (1333, 800) | 4 (octave) | 0.463 | 0.751 | 0.532 | 0.512 | 0.467 | 0.752 | 0.535 | 0.516 |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (1333, 800) | [8] | 0.523 | 0.850 | 0.592 | 0.582 | 0.537 | 0.862 | 0.612 | 0.594 |
2. Non-dense anchoring
| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | 4tiles | mAP | AP@0.5 | AP@0.75 | AR | Tr.mAP | Tr.AP@0.5 | Tr.AP@0.75 | Tr.AR |
| GA-RetinaNet-r50-fpn | r50 | 1x | 0.001 | 2 | (816, 1088) | 4 (octave) | ☐ | 0.523 | 0.870 | 0.579 | 0.583 | 0.532 | 0.881 | 0.590 | 0.591 |
| GA-RetinaNet-x101-32x4d-fpn | x101-32x4d | 1x | 0.001 | 2 | (816, 1088) | 4 (octave) | ☐ | 0.537 | 0.882 | 0.602 | 0.598 | 0.552 | 0.896 | 0.623 | 0.610 |
| RepPoints-moment-r50-fpn | r50 | 1x | 0.02 | 6 | (816, 1088) | 4 (base) | ☐ | 0.505 | 0.815 | 0.578 | 0.562 | 0.519 | 0.820 | 0.601 | 0.574 |
3. Comparison of different anchor scales for Faster-RCNN
| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | mAP | AP@0.5 | AP@0.75 | AR | Tr.mAP | Tr.AP@0.5 | Tr.AP@0.75 | Tr.AR |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [8] | 0.522 | 0.850 | 0.591 | 0.577 | 0.534 | 0.862 | 0.611 | 0.590 |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | 0.551 | 0.912 | 0.614 | 0.613 | 0.567 | 0.926 | 0.636 | 0.629 |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [3] | 0.549 | 0.911 | 0.611 | 0.614 |  |  |  |  |
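In MMDetection v1 configs, the anchor scale of Faster-RCNN (and of the RPN stage of Cascade-RCNN) is set by the anchor_scales list of rpn_head, which is [8] in the stock FPN configs. A partial sketch of that fragment with the best-scoring value from the table above is shown below; the other keys follow the stock faster_rcnn_r50_fpn_1x config of MMDetection v1 and should be checked against the repository's configs/sku110k files rather than taken as their exact contents.

```python
# Partial MMDetection v1 config: only the RPN head inside the model dict is shown.
model = dict(
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[4],                  # stock value is [8]; [4] halves the anchor side
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)))
```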
4. Comparison of different anchor scales for RetinaNet
| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | mAP | AP@0.5 | AP@0.75 | AR | Tr.mAP | Tr.AP@0.5 | Tr.AP@0.75 | Tr.AR |
| RetinaNet-r50-fpn | r50 | 1x | 0.001 | 2 | (1333, 800) | 4 (octave) | 0.463 | 0.751 | 0.532 | 0.512 | 0.467 | 0.752 | 0.535 | 0.516 |
| RetinaNet-r50-fpn | r50 | 1x | 0.001 | 2 | (1333, 800) | 3 (octave) | 0.508 | 0.849 | 0.564 | 0.569 | 0.513 | 0.853 | 0.574 | 0.574 |
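Both anchor-scale comparisons change the size of the smallest anchors. A minimal sketch of the arithmetic for square (aspect ratio 1) anchors, assuming MMDetection v1's default FPN strides (4 to 64 for the Faster-RCNN RPN, 8 to 128 for RetinaNet) and RetinaNet's default scales_per_octave=3: on the finest level, Faster-RCNN anchors are 32, 16 and 12 px for anchor_sc [8], [4] and [3], and lowering RetinaNet's octave base scale from 4 to 3 shrinks its smallest anchor from 32 to 24 px (in resized-input pixels). This is one plausible reading of why smaller scales help on small, densely packed products, not a statement from the report.

```python
def faster_rcnn_anchor_sides(anchor_scales, strides=(4, 8, 16, 32, 64)):
    """Side length (px) of square anchors per FPN stride: stride * scale."""
    return {s: [s * k for k in anchor_scales] for s in strides}


def retinanet_anchor_sides(octave_base_scale, scales_per_octave=3,
                           strides=(8, 16, 32, 64, 128)):
    """Side length (px) of square anchors per FPN stride:
    stride * octave_base_scale * 2 ** (i / scales_per_octave).
    """
    scales = [octave_base_scale * 2 ** (i / scales_per_octave)
              for i in range(scales_per_octave)]
    return {s: [s * k for k in scales] for s in strides}


# Smallest anchors on the finest level for the settings in the tables above
for sc in ([8], [4], [3]):
    print("Faster-RCNN", sc, faster_rcnn_anchor_sides(sc)[4])
for base in (4, 3):
    print("RetinaNet octave", base, round(retinanet_anchor_sides(base)[8][0], 1))
```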
5. Bells and whistles testing
| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | 4tiles | s-nms test | extra augs | traintime flip | testtime flip | mAP | AP@0.5 | AP@0.75 | AR |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (752, 1024), (816, 1088), (880, 1152) | [4] | ☐ | ☐ | ☐ | ✓ | ☐ | 0.552 | 0.912 | 0.615 | 0.616 |
| Faster-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ☐ | ☐ | ✓ | ☐ | ☐ | 0.548 | 0.911 | 0.608 | 0.612 |
| Faster-RCNN-r50-fpn | r50 | 2x | 0.005 | 2 | (816, 1088) | [4] | ☐ | ☐ | ✓ | ✓ | ☐ | 0.540 | 0.906 | 0.596 | 0.606 |
| Faster-RCNN-r50-fpn | r50 | 2x | 0.005 | 2 | (816, 1088) | [4] | ☐ | ☐ | ✓ | ✓ | ✓ | 0.510 | 0.888 | 0.543 | 0.584 |
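Flip augmentation (the traintime and testtime flip columns) relies on mirroring box coordinates; for test-time flipping, predictions made on the mirrored image must be mapped back before they are merged with the unflipped ones. A minimal sketch of that mapping, assuming continuous [x1, y1, x2, y2] coordinates in an image of width W (x1' = W - x2, x2' = W - x1), which makes the operation its own inverse. How the two prediction sets were merged in these runs is not covered here, and hflip_boxes is a hypothetical name.

```python
def hflip_boxes(boxes, image_width):
    """Mirror [x1, y1, x2, y2] boxes across the vertical center line of the image.

    Assumes continuous pixel coordinates, so x maps to image_width - x;
    x1 and x2 swap roles so that x1 <= x2 still holds after flipping.
    """
    return [[image_width - x2, y1, image_width - x1, y2]
            for x1, y1, x2, y2 in boxes]


# Boxes predicted on a flipped 100-px-wide image, mapped back to the original
print(hflip_boxes([[10, 5, 30, 25]], 100))
```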
6. Cascade-RCNN comparison
| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | 4tiles | s-nms test | mAP | AP@0.5 | AP@0.75 | AR | Tr.mAP | Tr.AP@0.5 | Tr.AP@0.75 | Tr.AR |
| Cascade-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [8] | ☐ | ☐ | 0.525 | 0.840 | 0.604 | 0.582 | 0.542 | 0.862 | 0.647 | 0.596 |
| Cascade-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ☐ | ☐ | 0.553 | 0.902 | 0.626 | 0.615 | 0.574 | 0.926 | 0.653 | 0.634 |
| Cascade-RCNN-r50-fpn | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ☐ | ✓ | 0.556 | 0.900 | 0.632 | 0.622 | 0.577 | 0.925 | 0.659 | 0.642 |
| Cascade-RCNN-x101-32x4d-fpn | x101-32x4d | 1x | 0.005 | 2 | (768, 1024) | [4] | ☐ | ☐ | 0.556 | 0.903 | 0.629 | 0.617 | 0.583 | 0.929 | 0.665 | 0.640 |
| Cascade-RCNN-x101-32x4d-fpn | x101-32x4d | 1x | 0.005 | 2 | (768, 1024) | [4] | ☐ | ✓ | 0.560 | 0.902 | 0.635 | 0.623 | 0.585 | 0.929 | 0.672 | 0.647 |
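Assuming s-nms stands for soft-NMS, the idea is to decay the scores of boxes that overlap a selected box instead of discarding them outright, which can keep neighboring products whose boxes legitimately overlap. A minimal pure-Python sketch of the linear variant (a score is multiplied by 1 - IoU when the IoU exceeds a threshold) follows. The iou_thr and min_score values are illustrative defaults, not the settings behind the table, and MMDetection provides its own soft-NMS implementation that a config would normally select instead.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms_linear(boxes, scores, iou_thr=0.5, min_score=0.001):
    """Linear soft-NMS: decay, rather than drop, boxes overlapping a kept box.

    Returns (box, score) pairs in the order they were selected.
    """
    remaining = [(list(b), float(s)) for b, s in zip(boxes, scores)]
    kept = []
    while remaining:
        # select the current highest-scoring box
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        box, score = remaining.pop(best)
        kept.append((box, score))
        decayed = []
        for b, s in remaining:
            o = iou(box, b)
            if o > iou_thr:
                s *= 1.0 - o  # linear decay proportional to the overlap
            if s >= min_score:
                decayed.append((b, s))
        remaining = decayed
    return kept
```

With hard NMS at the same threshold, the second of two heavily overlapping boxes would be removed; here it survives with a reduced score and can still count as a true positive if it covers a different product.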
7. Tiling strategies
| Config | Backbone | Lr schd | Base lr | imgs_p_gpu | img_scale | anchor_sc | 4tiles | s-nms test | mAP | AP@0.5 | AP@0.75 | AR |
| Faster-RCNN-r50-fpn (w/o merging) | r50 | 1x | 0.005 | 2 | (816, 1088) | [8] | ✓ | ☐ | 0.561 | 0.912 | 0.632 | 0.628 |
| Faster-RCNN-r50-fpn (w/o merging) | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ✓ | ☐ | 0.566 | 0.928 | 0.636 | 0.636 |
| Faster-RCNN-r50-fpn (merged) | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ✓ | ☐ | 0.547 | 0.894 | 0.615 | 0.611 |
| Faster-RCNN-r50-fpn (full frame) | r50 | 1x | 0.005 | 2 | (816, 1088) | [4] | ✓ | ✓ | 0.577 | 0.928 | 0.659 | 0.654 |
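The 4tiles column and step 2 of Getting started refer to splitting images into tiles. A minimal sketch of the two geometric steps involved, assuming "4 tiles" means a 2x2 grid and boxes are in corner format [x1, y1, x2, y2]: tile_grid and boxes_for_tile cut an image and its ground truth into tiles (keeping boxes that are at least half visible, an arbitrary threshold), and merge_tile_detections shifts tile predictions back by each tile's offset and removes cross-tile duplicates with greedy NMS. All names, the overlap knob and the thresholds are hypothetical; split_on_tiles.py and the merging used for the (merged) row may work differently.

```python
def tile_grid(width, height, rows=2, cols=2, overlap=0):
    """Tile rectangles (x0, y0, x1, y1) covering a width x height image.

    `overlap` (pixels) is a hypothetical knob that extends each tile into its neighbors.
    """
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append((
                max(0, c * width // cols - overlap),
                max(0, r * height // rows - overlap),
                min(width, (c + 1) * width // cols + overlap),
                min(height, (r + 1) * height // rows + overlap),
            ))
    return tiles


def boxes_for_tile(boxes, tile, min_visible=0.5):
    """Clip [x1, y1, x2, y2] ground-truth boxes to a tile, in tile coordinates.

    Boxes with less than `min_visible` of their area inside the tile are dropped.
    """
    tx0, ty0, tx1, ty1 = tile
    kept = []
    for x1, y1, x2, y2 in boxes:
        cx1, cy1 = max(x1, tx0), max(y1, ty0)
        cx2, cy2 = min(x2, tx1), min(y2, ty1)
        area = (x2 - x1) * (y2 - y1)
        if cx2 <= cx1 or cy2 <= cy1 or area <= 0:
            continue  # no overlap with this tile (or a degenerate box)
        if (cx2 - cx1) * (cy2 - cy1) / area >= min_visible:
            kept.append([cx1 - tx0, cy1 - ty0, cx2 - tx0, cy2 - ty0])
    return kept


def _iou(a, b):
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_tile_detections(tile_dets, iou_thr=0.5):
    """Shift per-tile detections to full-frame coordinates, then greedy NMS.

    tile_dets: list of (tile, [([x1, y1, x2, y2], score), ...]) pairs,
    where tile is an (x0, y0, x1, y1) rectangle from tile_grid.
    """
    dets = []
    for (ox, oy, _, _), boxes in tile_dets:
        for (x1, y1, x2, y2), score in boxes:
            dets.append(([x1 + ox, y1 + oy, x2 + ox, y2 + oy], score))
    dets.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        # drop a box that duplicates a higher-scoring one, e.g. from a neighboring tile
        if all(_iou(box, k) <= iou_thr for k, _ in kept):
            kept.append((box, score))
    return kept
```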
Citation
Feel free to cite my report if you use any of the results for benchmarking in your work.
@misc{kozlov2020working,
title={Working with scale: 2nd place solution to Product Detection in Densely Packed Scenes [Technical Report]},
author={Artem Kozlov},
year={2020},
eprint={2006.07825},
archivePrefix={arXiv},
primaryClass={cs.CV}
}