ucas-vg / TOV_mmdetection

The mmdetection version of TinyBenchmark. Official link:
https://github.com/ucas-vg/TinyBenchmark
Apache License 2.0

Reproduce SM on tinyperson dataset #16

Open · Hshuqin opened this issue 2 years ago

Hshuqin commented 2 years ago

Hello, I tried to run a few algorithms from this repository, but RetinaNet and FCOS perform better than SM (iou_thrs=[0.25, 0.5, 0.75]). I don't know if there is a problem with my configuration file; could you help me take a look, or point out which parameters are not set correctly? Config file: TOV_mmdetection-main/configs2/TinyPerson/scale_match/retinanet_r50_fpns4_1x_coco_sm_tinyperson.py

A few important changes are as follows.

Data:

```python
data = dict(
    samples_per_gpu=4,  # 2
    workers_per_gpu=1,
    train=dict(
        type=dataset_type,
        # ann_file=data_root + 'erase_with_uncertain_dataset/annotations/corner/task/tiny_set_train_sw640_sh512_all.json',
        ann_file=data_root + 'mini_annotations/tiny_set_train_sw640_sh512_all_erase.json',  # same as the commented line above
        img_prefix=data_root + 'erase_with_uncertain_dataset/train/',
        pipeline=train_pipeline,
        # train_ignore_as_bg=False,
    ),
    val=dict(
        type=dataset_type,
        # ann_file=data_root + 'annotations/corner/task/tiny_set_test_sw640_sh512_all.json',
        ann_file=data_root + 'mini_annotations/tiny_set_test_all.json',
        img_prefix=data_root + 'test/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        # ann_file=data_root + 'annotations/corner/task/tiny_set_test_sw640_sh512_all.json',
        ann_file=data_root + 'mini_annotations/tiny_set_test_all.json',
        img_prefix=data_root + 'test/',
        pipeline=test_pipeline))
```

Evaluation:

```python
evaluation = dict(
    interval=1, metric='bbox',
    iou_thrs=[0.25, 0.5, 0.75],  # set to None to use 0.5:1.0:0.05
    proposal_nums=[200],
    cocofmt_kwargs=dict(
        ignore_uncertain=True,
        use_ignore_attr=True,
        use_iod_for_ignore=True,
        iod_th_of_iou_f="lambda iou: iou",  # "lambda iou: (2*iou)/(1+iou)"
        cocofmt_param=dict(
            evaluate_standard='tiny',   # or 'coco'
            iouThrs=[0.25, 0.5, 0.75],  # set this the same as evaluation.iou_thrs
            # maxDets=[200],            # set this the same as evaluation.proposal_nums
        )
    ))
```

In the test pipeline, img_scale was modified:

```python
# img_scale=(333, 200),
img_scale=(640, 512),
```
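For context, a sketch of where img_scale sits in a standard mmdetection test pipeline (the surrounding transforms are the usual mmdetection defaults, not copied from this config; the img_norm_cfg values are the usual ImageNet statistics):

```python
# Standard mmdetection test pipeline shape (library defaults; not copied
# from this repo's config).
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(640, 512),  # was (333, 200) before the modification above
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```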

In the train pipeline, anno_file was modified:

```python
anno_file="/home/xxxxx/data/tiny_set/mini_annotations/tiny_set_train_all_erase.json",
```

Other configurations follow the original settings. Looking forward to your suggestions!

Vivek-23-Titan commented 2 years ago

Hi Hshuqin, I am also doing some experiments with different config files. I would like to know what was the maximum mAP @tiny50 that you were able to achieve?

Hshuqin commented 2 years ago

Hello, my evaluate_standard='tiny', so there is no mAP @tiny50 as such in my results. My evaluation results have @.***, tiny1, tiny2, tiny3, small, and all. At iou=0.25, RetinaNet's AP@all = 70.50, FCOS's @.*** = 52.50, and @.*** = 62.22. But in theory SM should perform better than RetinaNet, and no matter how I adjust the parameters it cannot be improved.


yinglang commented 2 years ago

Can you provide the following information?

1. The performance at iou=0.5, which is the main result used for comparison (with and without SM).
2. Did you re-run retinanet with the checkpoint trained from retinanet_r50_fpns4_1x_coco_sm_tinyperson.py, as configs2/TinyPerson/scale_match/ScaleMatch_TinyPerson.sh suggests?

Vivek-23-Titan commented 2 years ago

Sorry, maybe I wasn't precise enough. I meant the following mAP:

`Average Precision (AP) @[ IoU=0.50 | area= tiny | maxDets=1000 ]`

Hshuqin commented 2 years ago

At iou=0.5: retinanet: 47.10, SM: 41.55. I ran retinanet using TOV_mmdetection-main/configs2/TinyPerson/base/retinanet_r50_fpns4_1x_TinyPerson640.py, and at that time I also made the changes suggested for reproduction.


yinglang commented 2 years ago
  1. OK, thanks. There may be some problem with SM, but the basic retinanet should be right. We also find that the number of GPUs and the batch size may bring some performance fluctuation for retinanet.

  2. Can you tell us how many GPUs you used?

  3. SM has two steps (see the config sketch below):

     - train SM COCO: train on COCO to prepare the SM pretrained weights.
     - train on TinyPerson: load the SM COCO pretrained weights and train on TinyPerson.

     Also, how many GPUs were used while training SM COCO? And can you give the performance on COCO val, which should be printed while training SM COCO?
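For step 2, a minimal sketch of how an mmdetection config would pick up the step-1 checkpoint (the work_dirs path below is a hypothetical example, not taken from this repo's scripts; ScaleMatch_TinyPerson.sh gives the actual paths):

```python
# Step 2 (train on TinyPerson): point load_from at the checkpoint that
# step 1 (train SM COCO) produced. Hypothetical path for illustration.
load_from = 'work_dirs/retinanet_r50_fpns4_1x_coco_sm_tinyperson/latest.pth'
```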

Vivek-23-Titan commented 2 years ago

Okay cool! Thanks @Hshuqin

Also, @yinglang: up till now I have only been able to replicate the results for Faster RCNN-FPN (exp 2.1), but not for the other configurations.

As given in the detector results, Faster RCNN-FPN SM achieves 50.85 (exp 4.0) and Adap Retinanet-c (exp 5.1) gets 51.78 mAP_{50}^{tiny}.

So what is the correct way to run the experiments to replicate the above mAP_{50}^{tiny} results?

yinglang commented 2 years ago

@Vivek-23-Titan Did you run with the same settings as the corresponding *.sh file gives? Can you provide the performance of every experiment you have run, tagged with its expx.x id, so we can give a detailed analysis? Thanks very much.

Vivek-23-Titan commented 2 years ago

@yinglang just to make sure: if I want to run exp4.0, can I directly do the 2nd step, i.e., train on TinyPerson with the COCO pretrain under the directory FPN_SM_tinyperson_b4 (instead of latest.pth from the 1st step as you mentioned above)?

yinglang commented 2 years ago

Both of the two steps should be run.

Is the FPN_SM_tinyperson_b4 pretrained weight from the old TinyBenchmark version? If it is, maybe that is the key point of the problem and why the results cannot be reproduced; I have not tried training with those weights.

Maybe I need to upload new pretrained weights for this mmdetection version if you need them.

Vivek-23-Titan commented 2 years ago

Yes, the FPN_SM_tinyperson_b4 is from the old TinyBenchmark version. It would be really helpful if you could provide the new pre-trained weights for exp4.0 and exp5.1.

I would also like to know the total time required to train these scale-matched pre-trained COCO (assuming COCO 2017) weights with 2 GPUs as given in the exp (or with more GPUs, if you have tried that).

Also, how do I accurately replicate the results if I use, say, 4 or 8 GPUs instead of the 2 GPUs given in the exps (e.g., by scaling the lr in proportion to the number of GPUs)?

Hshuqin commented 2 years ago

I use two GPUs with a batch size of 8 and a learning rate of 0.005; in fact, I tried 0.01 and 0.0025 and the results were not good.


yinglang commented 2 years ago

The links to the weights have been uploaded, as said here. But I am not completely sure they are all correct, because these experiments were run a long time ago, and for now I don't have enough time to re-run them. So if there is any problem, just let me know and I will try my best to fix it.

For settings with a different number of GPUs, it is normally the same as the paper "Bag of Tricks for Image Classification with Convolutional Neural Networks" says.
Linear scaling of the learning rate: learning_rate / (samples_per_gpu x num_gpus) should be kept fixed when comparing.
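For example, a minimal sketch of the rule (the baseline numbers are an assumption read off the "lr0.01_4b2g" checkpoint naming in this thread, i.e., lr=0.01 with samples_per_gpu=4 on 2 GPUs):

```python
# Linear LR scaling: keep lr / (samples_per_gpu * num_gpus) constant.
# Baseline assumed from the "lr0.01_4b2g" checkpoint name in this thread.
BASE_LR = 0.01
BASE_BATCH = 4 * 2  # samples_per_gpu * num_gpus

def scaled_lr(samples_per_gpu: int, num_gpus: int) -> float:
    """Return the lr that keeps the lr-to-total-batch ratio at the baseline."""
    return BASE_LR * (samples_per_gpu * num_gpus) / BASE_BATCH

print(scaled_lr(4, 2))  # 0.01 (baseline)
print(scaled_lr(4, 4))  # 0.02 (2x GPUs -> 2x lr)
print(scaled_lr(4, 8))  # 0.04
```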

Unfortunately, for RetinaNet, even when linear learning-rate scaling was applied, the performance still fluctuated in our experiments. We don't know why that happens.


Hshuqin commented 2 years ago

OK, thanks. By the way, I'd like to know whether there is code for your team's SM+?


Vivek-23-Titan commented 2 years ago

Thanks a lot for the info @yinglang! The new weight for faster_rcnn_r50_fpn_1x_coco_sm_tinyperson_lr0.01_8b2g_latest.pth worked like a charm!

However, I had tried running exp 5.1 with the old weights and it seemed to run fine; but with the new weight retinanet_r50_fpns4_1x_coco_sm_tinyperson_lr0.01_4b2g_latest.pth, it throws an error: issue

And when I searched for this error, this was the response:

> In my case, this error was caused by a corrupted saved file. So I switch to older checkpoints and the problem is gone.

(Reference: https://github.com/pytorch/pytorch/issues/31620#issuecomment-716950574)
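As a quick check (not from the thread; a minimal sketch assuming the checkpoint sits in the current directory), one can test whether the downloaded file deserializes at all:

```python
# Minimal integrity check for a downloaded checkpoint; a truncated or
# corrupted file raises inside torch.load. Path copied from above and
# assumed to be in the current working directory.
import torch

ckpt_path = 'retinanet_r50_fpns4_1x_coco_sm_tinyperson_lr0.01_4b2g_latest.pth'
try:
    ckpt = torch.load(ckpt_path, map_location='cpu')
    # mmdetection checkpoints typically carry 'state_dict' and 'meta' keys.
    print(list(ckpt.keys()))
except Exception as e:
    print('Failed to load checkpoint:', e)
```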

Can you please check whether there is some issue with the model weights, or perhaps with how they were saved/loaded?