joe-siyuan-qiao / DetectoRS

DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution
Apache License 2.0

reproduce detectoRS htc x101 performance on coco #46

Open GHPAUN opened 4 years ago

GHPAUN commented 4 years ago

Hello, I'm reproducing the DetectoRS HTC X101 performance on COCO. With 1 image per GPU on 16 GPUs, which is the recommended setting for HTC X101, I get 51.4 mAP; with 1 image per GPU on 32 GPUs, I get 52.4 mAP, which is close to the released model, but there is still a gap. There may be some detail I missed. Also, training is very time- and GPU-memory-consuming; have you tried mmdet FP16 training, and how was the performance? Thanks a lot.

we1pingyu commented 4 years ago

> Hello, I'm reproducing the DetectoRS HTC X101 performance on COCO. With 1 image per GPU on 16 GPUs, which is the recommended setting for HTC X101, I get 51.4 mAP; with 1 image per GPU on 32 GPUs, I get 52.4 mAP, which is close to the released model, but there is still a gap. There may be some detail I missed. Also, training is very time- and GPU-memory-consuming; have you tried mmdet FP16 training, and how was the performance? Thanks a lot.

Hi, I want to take this opportunity to ask a question: which GPU did you use for training? Even R50 with image scale (200, 200) causes OOM on a 1080 Ti. Is there any suggestion?

joe-siyuan-qiao commented 4 years ago

@GHPAUN Please first try reproducing the smaller models, as in the mmdetv2 branch. I haven't used FP16 and am not sure whether DetectoRS can be trained with it.

@we1pingyu We used TITAN RTX GPUs to train all models. If you have memory issues, please try the cascade-based DetectoRS, which requires less memory, and set with_cp to True for all backbones.
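The with_cp suggestion above maps to gradient checkpointing in the backbone config. A minimal config-fragment sketch, assuming mmdet-style config files; the backbone type, depth, and other fields here are placeholders and should be taken from the config you actually train with:

```python
# Config fragment (sketch): enable gradient checkpointing on the backbone.
# with_cp=True re-computes activations of each residual block during the
# backward pass, trading extra compute for a large cut in GPU memory.
model = dict(
    backbone=dict(
        type='ResNet',   # placeholder; keep whatever type your config uses
        depth=50,
        with_cp=True,    # the memory-saving switch mentioned above
    ),
)
```

This changes memory use only, not the trained model: results should match the non-checkpointed run up to floating-point noise.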

GHPAUN commented 4 years ago

@joe-siyuan-qiao Thank you for your reply. I have already reproduced the smaller models, i.e., DetectoRS HTC R50: 49.1 mAP at 12 epochs and 51.4 at 40 epochs. I just got 52.9 mAP using 64 GPUs and lr 0.04; with TTA I got 54.1 mAP on the val set. I'll try the test set later. I guess training is faster and more stable with a larger batch size and a larger learning rate. @we1pingyu I used V100s.

joe-siyuan-qiao commented 4 years ago

@GHPAUN 52.9 mAP on val2017 matches the performance of the provided pre-trained model. mAPs on test-dev are usually better than those on val2017. Just out of curiosity, why did you use lr 0.04 when the batch size is 64? I thought it would be 0.08 to match the linearity.
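The linearity referred to here is the linear LR scaling rule. A small sketch, assuming mmdet's common default of lr 0.02 at a total batch size of 16 (these base values are an assumption, not stated in this thread):

```python
def scaled_lr(batch_size, base_lr=0.02, base_batch=16):
    """Linear scaling rule: keep lr / batch_size constant."""
    return base_lr * batch_size / base_batch

# Batch size 64 -> lr 0.08 under the linear rule; the 0.04 used above
# corresponds to half the linear rule.
print(scaled_lr(64))
```

In practice the linear rule is a starting point, not a guarantee; as the follow-up comment below notes, a sub-linear lr can work better for some models.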

ZFTurbo commented 4 years ago

I'm trying to reproduce the 53.3 mAP of the pre-trained model from the repo on the COCO dataset, but my result on val2017 is only 51.5. Am I missing something?

Here is my code:

from mmdet.apis import init_detector, inference_detector
import glob
import os

if __name__ == '__main__':
    config_file = 'DetectoRS_mstrain_400_1200_x101_32x4d_40e.py'
    checkpoint_file = 'DetectoRS_X101-ed983634.pth'
    model = init_detector(config_file, checkpoint_file, device='cuda:0')

    out = open('coco_DetectoRS_val2017_preds.csv', 'w')
    out.write('img_id,label,score,x1,x2,y1,y2\n')
    images = glob.glob('val2017/*.jpg')
    for img in images:
        img_id = os.path.basename(img)
        # HTC-based models return (bbox_results, segm_results); take the boxes
        res = inference_detector(model, img)
        for i in range(80):  # 80 COCO classes
            for j in range(len(res[0][i])):
                x1, y1, x2, y2, prob = res[0][i][j]
                out.write('{},{},{},{:.2f},{:.2f},{:.2f},{:.2f}\n'.format(
                    img_id, i, prob, x1, x2, y1, y2))

    out.close()

Then I run pycocotools and here is the result:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.515
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.710
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.564
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.318
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.565
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.676
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.384
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.628
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.671
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.479
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.723
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.828

joe-siyuan-qiao commented 4 years ago

@ZFTurbo We used tools/dist_test.py to evaluate the pre-trained model. You can take a look at it to find out what the differences are. Thanks.
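A likely source of the gap is that a bare inference_detector loop uses the config's single-image test path, while the repo's distributed test script applies the full test pipeline from the config (e.g., any multi-scale/flip test-time settings). A command sketch, assuming mmdetection's usual dist_test.sh wrapper and flags; the GPU count (8) is a placeholder:

```shell
# Evaluate the pre-trained checkpoint with the config's own test pipeline
# and report COCO-style bbox mAP via the built-in evaluation.
./tools/dist_test.sh DetectoRS_mstrain_400_1200_x101_32x4d_40e.py \
    DetectoRS_X101-ed983634.pth 8 --out results.pkl --eval bbox
```

Comparing this number against the hand-rolled CSV + pycocotools result should show whether the pipeline difference explains the 51.5 vs. 53.3 gap.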

GHPAUN commented 4 years ago

@joe-siyuan-qiao Yeah, I thought that too, but in my experiments lr = 0.00125 × batch_size is always worse than lr = 0.000625 × batch_size. Maybe I missed something here.