yangsenius / TransPose

PyTorch Implementation for "TransPose: Keypoint localization via Transformer", ICCV 2021.
https://github.com/yangsenius/TransPose/releases/download/paper/transpose.pdf
MIT License

Unfair comparison with SimpleBaseline and SimpleBaseline-darkpose #1

Closed · HuangJunJie2017 closed this issue 3 years ago

HuangJunJie2017 commented 3 years ago

Nice pioneer~ But it is not fair to compare TransPose-R with the original SimpleBaseline-Res, as the original SimpleBaseline uses a weaker augmentation strategy and a shorter training schedule. This leads to a gap of about 1.5 AP. I look forward to a fair comparison between them for reference.

yangsenius commented 3 years ago

Thanks for your interest!

We use the same data augmentation strategy (following the HRNet code) for all our models. The long training schedule is a necessary setting for training our transformer-based models. Thank you for your suggestion. We will consider keeping the data augmentation strategy the same when comparing with SimpleBaseline.

HuangJunJie2017 commented 3 years ago

emmm, in that case, an ablation study is appealing and necessary to make the comparison fair. It also makes me curious: do the transformer-based models perform very poorly when using the augmentation/schedule configuration of the original SimpleBaseline (dropping more than 1.5 AP, maybe 2 or 3 AP)? The application of the transformer in DETR also requires a longer schedule~

yangsenius commented 3 years ago

Hi @HuangJunJie2017. We did not train our models using the augmentation/schedule configuration of the original SimpleBaseline.

We provide the convergence curves and the mAP log for TransPose-R-A4. The curves are shown below:

[Figure: mAP convergence curves for TransPose-R-A4 (screenshot from 2021-01-27)]

We also used the same data augmentation strategy as TransPose-R:

```yaml
DATASET:
  COLOR_RGB: true
  DATASET: 'coco'
  DATA_FORMAT: jpg
  FLIP: true
  NUM_JOINTS_HALF_BODY: 8
  PROB_HALF_BODY: 0.3
  ROOT: 'data/coco/'
  ROT_FACTOR: 45
  SCALE_FACTOR: 0.35
  TEST_SET: 'val2017'
  TRAIN_SET: 'train2017'
```
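
For readers unfamiliar with these fields, here is a minimal sketch of how HRNet-style training code typically consumes them. It mirrors the usual HRNet data pipeline and is not necessarily this repo's exact implementation:

```python
import random
import numpy as np

# Values from the DATASET config above (HRNet-style augmentation).
SCALE_FACTOR = 0.35        # SCALE_FACTOR
ROT_FACTOR = 45            # ROT_FACTOR
PROB_HALF_BODY = 0.3       # PROB_HALF_BODY
NUM_JOINTS_HALF_BODY = 8   # NUM_JOINTS_HALF_BODY

def sample_augmentation(center, scale, joints_vis):
    """Draw per-sample (center, scale, rotation, flip) augmentation params."""
    c, s = center.copy(), scale.copy()

    # Half-body transform: with probability 0.3, crop to only the upper or
    # lower body when enough joints are visible (re-estimation elided here).
    if np.sum(joints_vis) > NUM_JOINTS_HALF_BODY and random.random() < PROB_HALF_BODY:
        pass  # recompute c, s from the selected half-body joints

    # Random scaling: multiply scale by a factor clipped to [0.65, 1.35].
    s = s * np.clip(np.random.randn() * SCALE_FACTOR + 1,
                    1 - SCALE_FACTOR, 1 + SCALE_FACTOR)

    # Random rotation in roughly [-90, 90] degrees, applied 60% of the time.
    r = (np.clip(np.random.randn() * ROT_FACTOR, -ROT_FACTOR * 2, ROT_FACTOR * 2)
         if random.random() <= 0.6 else 0.0)

    # Random horizontal flip (FLIP: true).
    flip = random.random() <= 0.5
    return c, s, r, flip
```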

With this augmentation and the longer training schedule (203 epochs), training the original SimpleBaseline-Res50+DarkPose now brings a 0.4 AP gain (72.0 → 72.4) on the COCO val dataset with the same detected boxes (56.4 AP):

| Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| pose_resnet | 0.724 | 0.894 | 0.796 | 0.685 | 0.795 | 0.779 | 0.932 | 0.841 | 0.734 | 0.844 |
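
For context, the AP/AR columns above come from the standard OKS-based COCO keypoint evaluation. A minimal pycocotools sketch; the file paths are illustrative, not this repo's actual outputs:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Illustrative paths (assumptions, not the repo's actual result files).
coco_gt = COCO('data/coco/annotations/person_keypoints_val2017.json')
coco_dt = coco_gt.loadRes('results/keypoints_val2017_results.json')

# summarize() prints AP, AP .5, AP .75, AP (M), AP (L) and the
# corresponding AR columns shown in the table above.
evaluator = COCOeval(coco_gt, coco_dt, iouType='keypoints')
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()
```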
HuangJunJie2017 commented 3 years ago

emmm, I have not tried SimpleBaseline-Res50+DarkPose+HRNet-aug-schedule, but I have tried a similar configuration, SimpleBaseline-Res50+UDP+HRNet-aug-schedule. This configuration scores 73.4 AP on the COCO val dataset with the same detected boxes (56.4 AP). 72.0 for SimpleBaseline-Res50+DarkPose+HRNet-aug-schedule is too low. There may be some problem in your experiments that has a negative impact on both SimpleBaseline and TransPose; could TransPose perform even better?

yangsenius commented 3 years ago

We ran the official DarkPose code on our machine. Here are the historical log files:

orignal_config_baseline_res50_256x192_d256x3_adam_lr1e-3_2020-11-10-13-01_train.log

orignal_config_baseline_res50_256x192_d256x3_adam_lr1e-3_2020-11-11-20-52_valid.log

HuangJunJie2017 commented 3 years ago

emm, it seems that you did not train SimpleBaseline-Res50+DarkPose+HRNet-aug-schedule for 210 epochs, but for 140 epochs instead. This makes the configuration perform poorly (72.0)~ If you train it for 210 epochs, SimpleBaseline-Res50+DarkPose+HRNet-aug-schedule will score more than 73.0 AP on the COCO val dataset with the same detected boxes (56.4 AP).
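
For reference, both schedules under discussion are plain step decays. A minimal PyTorch sketch, assuming the milestones from the public SimpleBaseline (decay at epochs 90/120 over 140 epochs) and HRNet (decay at epochs 170/200 over 210 epochs) configs:

```python
import torch
from torch import optim

# The model below is a stand-in, not the actual pose network.
model = torch.nn.Linear(10, 10)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# HRNet-style long schedule: 210 epochs, LR decayed 10x at epochs 170 and 200.
# (The original SimpleBaseline schedule would instead use
# milestones=[90, 120] over 140 epochs.)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[170, 200], gamma=0.1)

for epoch in range(210):
    # train_one_epoch(model, optimizer)  # training loop elided
    scheduler.step()
```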

yangsenius commented 3 years ago

A longer schedule or stronger augmentation indeed increases the AP. But the original SimpleBaseline-Res50 and SimpleBaseline-Res50+DarkPose were trained with 140 epochs. Because we use the HRNet data augmentation for TransPose-R, we trained SimpleBaseline-Res50+DarkPose with the stronger (HRNet) data augmentation for a fair comparison. In addition, the result I currently report (204 epochs) is 72.4 AP. A better result might be achieved with more training steps, but we do not think it is necessary to use the same training schedule when comparing two such different models (FCN vs. CNN+Transformer).

HuangJunJie2017 commented 3 years ago

Point taken. The comparison between SimpleBaseline, SimpleBaseline-DarkPose, and TransPose-R is unfair, but the comparison between TransPose-R and TransPose-H is fair.