mks0601 / 3DMPPE_ROOTNET_RELEASE

Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019
MIT License

Learning rate decrease code problem #31

Closed. zwithz closed this issue 3 years ago.

zwithz commented 3 years ago

Hello, I have been going through your paper and code (RootNet & PoseNet) for several days. I think the learning-rate decay code is implemented incorrectly.

For instance, line 77 uses the loop variable e after the loop has finished. Shouldn't lines 78-84 be indented by 4 more spaces? https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE/blob/8bef0cd332c3423050a6f3b382d2a574623e1ffa/common/base.py#L77

Your code:

    def set_lr(self, epoch):
        for e in cfg.lr_dec_epoch:
            if epoch < e:
                break
        if epoch < cfg.lr_dec_epoch[-1]:
            idx = cfg.lr_dec_epoch.index(e)
            for g in self.optimizer.param_groups:
                g['lr'] = cfg.lr / (cfg.lr_dec_factor ** idx)
        else:
            for g in self.optimizer.param_groups:
                g['lr'] = cfg.lr / (cfg.lr_dec_factor ** len(cfg.lr_dec_epoch))
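To check what the quoted method actually computes, here is a standalone sketch of the same logic with the cfg fields passed in as arguments and the rate returned instead of written into the optimizer. The function names and the values lr=1e-3, lr_dec_epoch=[17, 21], lr_dec_factor=10 are hypothetical, not taken from the repository's config; step_lr is an equivalent bisect-based formulation added only for comparison and assumes the decay epochs are sorted and distinct.

```python
from bisect import bisect_right


def set_lr_original(epoch, lr, lr_dec_epoch, lr_dec_factor):
    """Hypothetical standalone copy of the quoted set_lr logic."""
    for e in lr_dec_epoch:
        if epoch < e:
            break  # e is now the first decay epoch still ahead of `epoch`
    if epoch < lr_dec_epoch[-1]:
        # Position of that upcoming decay epoch = number of decays already passed.
        idx = lr_dec_epoch.index(e)
        return lr / (lr_dec_factor ** idx)
    # At or past the last decay epoch: every decay has been applied.
    return lr / (lr_dec_factor ** len(lr_dec_epoch))


def step_lr(epoch, lr, lr_dec_epoch, lr_dec_factor):
    """Equivalent step schedule: divide once per decay epoch reached (sorted, distinct epochs)."""
    return lr / (lr_dec_factor ** bisect_right(lr_dec_epoch, epoch))


if __name__ == "__main__":
    for epoch in (0, 16, 17, 20, 21, 24):
        print(epoch, set_lr_original(epoch, 1e-3, [17, 21], 10), step_lr(epoch, 1e-3, [17, 21], 10))
```

Both functions divide the base rate by the factor once for every decay epoch that has been reached, which is an ordinary step (multi-step) schedule.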

Is this what you intended? 🤔

    def set_lr(self, epoch):
        for e in cfg.lr_dec_epoch:
            if epoch < e:
                break
            if epoch < cfg.lr_dec_epoch[-1]:
                idx = cfg.lr_dec_epoch.index(e)
                for g in self.optimizer.param_groups:
                    g['lr'] = cfg.lr / (cfg.lr_dec_factor ** idx)
            else:
                for g in self.optimizer.param_groups:
                    g['lr'] = cfg.lr / (cfg.lr_dec_factor ** len(cfg.lr_dec_epoch))
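Tracing the proposed indented variant as a standalone sketch shows it is not equivalent to the quoted original. The function name, the current_lr argument (a stand-in for whatever rate the optimizer already holds), and the values lr=1e-3, lr_dec_epoch=[17, 21], lr_dec_factor=10 are hypothetical. Before the first decay epoch the loop breaks immediately and never assigns a rate; between the first and last decay epochs, index(e) points at a decay epoch already passed, so the rate lags one step behind.

```python
def set_lr_proposed(epoch, current_lr, lr, lr_dec_epoch, lr_dec_factor):
    """Hypothetical standalone copy of the indented variant; returns current_lr if nothing is assigned."""
    new_lr = current_lr
    for e in lr_dec_epoch:
        if epoch < e:
            break  # before the first decay epoch this fires at once, so no rate is assigned
        if epoch < lr_dec_epoch[-1]:
            idx = lr_dec_epoch.index(e)  # index of a decay epoch already *passed*
            new_lr = lr / (lr_dec_factor ** idx)
        else:
            new_lr = lr / (lr_dec_factor ** len(lr_dec_epoch))
    return new_lr


for epoch in (0, 17, 21):
    print(epoch, set_lr_proposed(epoch, 1e-3, 1e-3, [17, 21], 10))
```

At epoch 17 this returns the undecayed 1e-3, whereas the quoted original returns 1e-3 / 10. With a single decay epoch the two versions happen to agree (provided the optimizer starts at cfg.lr), which makes the difference easy to miss.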

BTW, I have trained and tested several times with several protocols and datasets (Human3.6M Protocol 2 / MuCo / 3DPW), but I cannot reproduce the precision reported in the README or the paper. Could the problem above be the cause?

mks0601 commented 3 years ago

Hi,

You can check that the code below works:

    for e in range(10):
        print(e)
    print(e)  # still defined after the loop; prints 9 again

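This means we can refer to e outside the for loop: Python keeps a loop variable bound after the loop ends, holding its last value (or the value at which break exited the loop).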
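set_lr relies on exactly this: after break, e keeps the value from the iteration where the loop stopped, and if the loop runs to completion it keeps the last item. A small sketch with a hypothetical helper first_upcoming and a hypothetical decay list [17, 21]:

```python
def first_upcoming(epoch, lr_dec_epoch):
    """Hypothetical helper: return e exactly as the set_lr loop leaves it."""
    for e in lr_dec_epoch:
        if epoch < e:
            break  # stop here; e stays bound to the current item
    # e is still defined after the loop. With an empty list it was never
    # assigned, and reading it raises UnboundLocalError (a NameError subclass).
    return e


print(first_upcoming(5, [17, 21]))   # first decay epoch still ahead: 17
print(first_upcoming(30, [17, 21]))  # no break, so e is the last item: 21
```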

What precision are you getting now? Which datasets did you use for training?

zwithz commented 3 years ago

> This means we can refer e outside the for loop.

Okay, Python... 😳

Here you go. Did you train your model with a ResNet-101 backbone or something else? The following results are based on ResNet-50.

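| Dataset | Epoch | Time | Speed (it/s) | $AP^{BOX}$ (%) |
|---|---|---|---|---|
| Baseline | | | | 43.80 |
| MuPoTS | 0 | 4:07 | 1.52 | 19.28 |
| | 1 | 2:31 | 1.07 | 23.41 |
| | 2 | 2:36 | 1.07 | 24.97 |
| | 3 | 2:32 | 1.07 | 29.92 |
| | 4 | 2:33 | 1.06 | 31.72 |
| | 5 | 2:35 | 1.05 | 33.29 |
| | 6 | 2:31 | 1.08 | 31.80 |
| | 7 | 2:32 | 1.07 | 29.69 |
| | 8 | 2:27 | 1.11 | 34.75 |
| | 9 | 2:20 | 1.16 | 31.70 |
| | 10 | 2:18 | 1.18 | 31.87 |
| | 11 | 2:21 | 1.15 | 33.80 |
| | 12 | 2:23 | 1.14 | 35.45 |
| | 13 | 2:21 | 1.15 | 25.34 |
| | 14 | 2:22 | 1.15 | 34.55 |
| | 15 | 2:20 | 1.16 | 39.24 |
| | 16 | 2:23 | 1.14 | 36.85 |
| | 17 | 2:22 | 1.14 | 35.46 |
| | 18 | 2:18 | 1.17 | 33.22 |
| | 19 | 2:20 | 1.16 | 34.09 |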
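
| Dataset | Epoch | Time | Speed (it/s) | $AP^{root}_{25}$ (%) |
|---|---|---|---|---|
| Baseline | | | | 28.50 |
| MuPoTS | 0 | 6:54 | 1.30 | 14.61 |
| | 1 | 6:39 | 1.35 | 24.18 |
| | 2 | 6:45 | 1.34 | 24.86 |
| | 3 | 6:44 | 1.34 | 29.28 |
| | 4 | 6:45 | 1.33 | 26.47 |
| | 5 | 6:42 | 1.34 | 29.96 |
| | 6 | 6:37 | 1.36 | 30.34 |
| | 7 | 6:36 | 1.37 | 32.85 |
| | 8 | 6:47 | 1.33 | 33.15 |
| | 9 | 6:46 | 1.33 | 31.04 |
| | 10 | 6:43 | 1.34 | 29.09 |
| | 11 | 6:33 | 1.38 | 32.77 |
| | 12 | 6:40 | 1.34 | 33.70 |
| | 13 | 6:45 | 1.34 | 31.85 |
| | 14 | 6:44 | 1.34 | 34.97 |
| | 15 | 6:49 | 1.32 | 32.55 |
| | 16 | 6:57 | 1.29 | 33.42 |
| | 17 | 6:50 | 1.32 | 34.47 |
| | 18 | 6:25 | 1.31 | 31.85 |
| | 19 | 7:38 | 1.18 | 32.79 |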
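
| Dataset | Epoch | Time | Speed (it/s) | MRPE | MRPE_x | MRPE_y | MRPE_z |
|---|---|---|---|---|---|---|---|
| Baseline | | | | 0.386 | 0.045 | 0.094 | 0.353 |
| 3DPW | 0 | 2:53 | 1.60 | 0.563 | 0.070 | 0.143 | 0.504 |
| | 1 | 2:48 | 1.65 | 0.541 | 0.069 | 0.138 | 0.483 |
| | 2 | 2:48 | 1.65 | 0.491 | 0.061 | 0.116 | 0.448 |
| | 3 | 2:49 | 1.64 | 0.542 | 0.065 | 0.136 | 0.489 |
| | 4 | 2:48 | 1.65 | 0.519 | 0.059 | 0.122 | 0.475 |
| | 5 | 2:49 | 1.64 | 0.444 | 0.055 | 0.109 | 0.401 |
| | 6 | 2:48 | 1.65 | 0.448 | 0.057 | 0.106 | 0.407 |
| | 7 | 2:48 | 1.65 | 0.418 | 0.055 | 0.109 | 0.377 |
| | 8 | 2:50 | 1.63 | 0.478 | 0.055 | 0.112 | 0.438 |
| | 9 | 2:50 | 1.63 | 0.543 | 0.061 | 0.113 | 0.506 |
| | 10 | 2:46 | 1.67 | 0.491 | 0.060 | 0.114 | 0.451 |
| | 11 | 2:49 | 1.64 | 0.481 | 0.053 | 0.112 | 0.442 |
| | 12 | 2:49 | 1.64 | 0.495 | 0.056 | 0.107 | 0.459 |
| | 13 | 2:49 | 1.64 | 0.432 | 0.050 | 0.099 | 0.397 |
| | 14 | 2:48 | 1.65 | 0.503 | 0.055 | 0.098 | 0.470 |
| | 15 | 2:49 | 1.64 | 0.448 | 0.054 | 0.097 | 0.415 |
| | 16 | 2:48 | 1.65 | 0.440 | 0.055 | 0.096 | 0.407 |
| | 17 | 2:48 | 1.65 | 0.460 | 0.054 | 0.095 | 0.428 |
| | 18 | 2:50 | 1.63 | 0.462 | 0.054 | 0.097 | 0.429 |
| | 19 | 3:00 | 1.54 | 0.437 | 0.053 | 0.095 | 0.405 |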

| Dataset | Epoch | Time | Speed (it/s) | MRPE | MRPE_x | MRPE_y | MRPE_z | Directions | Discussion | Eating | Greeting | Phoning | Posing | Purchases | Sitting | Sitting Down | Smoking | Photo | Waiting | Walking | Walk Dog | Walk Together |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | | | | 120.00 | 23.3 | 23.0 | 108.1 | | | | | | | | | | | | | | | |
| Human3.6M | 0 | 0:41 | 1.66 | 128.30 | 33.83 | 47.78 | 98.24 | 75.68 | 103.26 | 118.83 | 106.74 | 129.65 | 80.36 | 100.74 | 204.66 | 283.18 | 132.22 | 121.89 | 104.13 | 90.10 | 152.61 | 96.89 |
| | 1 | 0:36 | 1.86 | 148.07 | 28.39 | 27.99 | 133.46 | 124.77 | 126.71 | 153.52 | 132.18 | 139.22 | 115.90 | 145.83 | 184.37 | 244.23 | 160.95 | 133.51 | 123.70 | 134.60 | 166.31 | 134.66 |
| | 2 | 0:36 | 1.86 | 141.87 | 26.95 | 29.71 | 127.02 | 130.51 | 119.52 | 154.68 | 128.15 | 132.46 | 118.38 | 138.51 | 180.12 | 251.67 | 145.65 | 130.00 | 115.84 | 108.41 | 162.61 | 114.06 |
| | 3 | 0:36 | 1.88 | 104.33 | 25.68 | 30.44 | 84.89 | 68.53 | 85.59 | 115.64 | 103.10 | 101.17 | 70.79 | 102.51 | 143.14 | 193.27 | 100.63 | 106.72 | 95.44 | 75.45 | 121.34 | 80.66 |
| | 4 | 0:36 | 1.88 | 177.32 | 27.48 | 30.24 | 166.32 | 166.79 | 160.97 | 186.02 | 157.22 | 171.35 | 155.79 | 176.81 | 208.59 | 270.88 | 189.83 | 172.30 | 142.99 | 146.76 | 199.91 | 148.10 |
| | 5 | 0:35 | 1.91 | 153.15 | 26.46 | 29.38 | 141.45 | 128.27 | 124.61 | 153.30 | 135.66 | 138.08 | 120.57 | 162.19 | 178.03 | 340.77 | 147.35 | 146.93 | 126.21 | 116.44 | 184.68 | 121.23 |
| | 6 | 0:35 | 1.90 | 124.33 | 24.84 | 26.26 | 110.89 | 125.75 | 114.95 | 111.07 | 128.06 | 103.07 | 117.14 | 135.63 | 127.50 | 197.39 | 116.56 | 128.53 | 119.17 | 115.38 | 141.77 | 117.53 |
| | 7 | 0:34 | 1.95 | 124.65 | 24.55 | 25.97 | 111.31 | 92.01 | 104.58 | 134.57 | 107.96 | 126.24 | 89.02 | 116.49 | 164.28 | 239.92 | 131.30 | 119.69 | 101.37 | 93.26 | 142.97 | 91.37 |
| | 8 | 0:35 | 1.91 | 127.66 | 24.77 | 26.11 | 114.54 | 131.45 | 114.40 | 122.83 | 132.40 | 117.81 | 122.17 | 127.14 | 133.71 | 189.32 | 133.62 | 122.78 | 120.70 | 110.27 | 133.39 | 112.70 |
| | 9 | 0:36 | 1.87 | 112.76 | 23.84 | 23.23 | 99.61 | 106.35 | 98.66 | 111.65 | 110.51 | 102.71 | 98.49 | 108.84 | 144.35 | 181.63 | 116.35 | 104.50 | 101.67 | 92.18 | 120.19 | 95.03 |
| | 10 | 0:35 | 1.92 | 97.15 | 23.15 | 22.76 | 82.93 | 81.34 | 88.57 | 89.45 | 99.72 | 83.58 | 78.10 | 100.22 | 105.95 | 195.36 | 92.19 | 94.14 | 96.28 | 75.48 | 108.98 | 80.65 |
| | 11 | 0:35 | 1.93 | 134.05 | 23.27 | 24.56 | 122.79 | 137.86 | 114.01 | 125.26 | 131.74 | 117.92 | 120.86 | 134.79 | 141.23 | 240.51 | 133.14 | 126.71 | 122.42 | 115.48 | 153.39 | 122.03 |
| | 12 | 0:35 | 1.90 | 136.22 | 23.07 | 23.29 | 126.31 | 130.01 | 116.07 | 134.56 | 128.38 | 129.03 | 116.16 | 138.07 | 155.80 | 235.10 | 142.73 | 126.19 | 116.39 | 113.95 | 151.77 | 117.99 |
| | 13 | 0:35 | 1.92 | 123.45 | 22.97 | 23.04 | 112.38 | 126.41 | 109.05 | 116.91 | 123.70 | 107.77 | 111.75 | 124.86 | 122.90 | 228.95 | 120.28 | 114.80 | 112.39 | 105.36 | 138.63 | 111.86 |
| | 14 | 0:36 | 1.87 | 136.61 | 24.06 | 23.36 | 126.42 | 143.47 | 116.17 | 130.49 | 139.71 | 123.66 | 131.57 | 133.79 | 140.94 | 223.92 | 134.35 | 126.87 | 124.56 | 126.09 | 152.46 | 128.56 |
| | 15 | 0:36 | 1.89 | 123.44 | 23.08 | 23.69 | 112.04 | 118.46 | 105.14 | 121.42 | 120.36 | 111.06 | 105.90 | 120.15 | 139.26 | 239.23 | 119.25 | 114.12 | 113.56 | 102.26 | 132.18 | 104.22 |
| | 16 | 0:36 | 1.85 | 135.81 | 23.50 | 23.18 | 125.48 | 124.58 | 113.57 | 136.85 | 123.99 | 127.35 | 114.28 | 141.05 | 159.83 | 248.83 | 139.19 | 125.14 | 117.31 | 108.58 | 155.80 | 112.53 |
| | 17 | 0:36 | 1.88 | 125.45 | 22.57 | 22.59 | 114.76 | 127.65 | 111.01 | 117.56 | 124.69 | 112.39 | 116.45 | 125.01 | 128.80 | 214.73 | 123.00 | 119.02 | 116.86 | 108.65 | 143.69 | 113.41 |
| | 18 | 0:34 | 1.95 | 122.62 | 22.18 | 23.05 | 111.71 | 125.34 | 111.14 | 114.35 | 124.14 | 109.78 | 114.81 | 122.42 | 123.42 | 203.16 | 120.58 | 117.28 | 115.30 | 107.16 | 137.71 | 111.64 |
| | 19 | 0:35 | 1.93 | 128.57 | 22.40 | 23.04 | 118.24 | 130.76 | 114.70 | 120.69 | 127.32 | 115.82 | 119.28 | 129.28 | 132.06 | 219.14 | 127.80 | 132.36 | 119.35 | 110.91 | 143.87 | 115.12 |

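To read the MRPE columns, here is a sketch of the metric. It assumes MRPE is the mean Euclidean distance between predicted and ground-truth root positions (the definition in the RootNet paper), and it further assumes that MRPE_x/y/z are the mean absolute errors along each axis. The helper name mrpe and its input format are hypothetical; this is not the repository's evaluation code.

```python
import math


def mrpe(pred_roots, gt_roots):
    """Hypothetical helper: mean root position error over N people.

    pred_roots, gt_roots: equal-length sequences of (x, y, z) root positions
    in the same units. Returns (MRPE, MRPE_x, MRPE_y, MRPE_z).
    """
    n = len(gt_roots)
    if len(pred_roots) != n or n == 0:
        raise ValueError("need the same, non-zero number of predicted and ground-truth roots")
    # Mean Euclidean distance between each predicted root and its ground truth.
    total = sum(math.dist(p, g) for p, g in zip(pred_roots, gt_roots)) / n
    # Mean absolute error along each axis separately (assumed meaning of MRPE_x/y/z).
    per_axis = [sum(abs(p[k] - g[k]) for p, g in zip(pred_roots, gt_roots)) / n for k in range(3)]
    return (total, *per_axis)
```

Under that assumption, each sample's Euclidean error is at least its error along any one axis and at most the sum of the three. So MRPE must lie between max(MRPE_x, MRPE_y, MRPE_z) and MRPE_x + MRPE_y + MRPE_z. The Human3.6M baseline row satisfies this (108.1 ≤ 120.00 ≤ 154.4), which makes it a quick sanity check for any row.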
mks0601 commented 3 years ago

Please don't paste all the raw data; just tell me your precision and training data. By the way, why is there an AP^box? I don't train a human detection model.

zwithz commented 3 years ago

Sorry for not stating it clearly: AP^{box} here means the result with use_gt_bbox=True.

| Train datasets | Test dataset | use_gt_bbox | Your precision | My best precision |
|---|---|---|---|---|
| MuCo + MSCOCO | MuPoTS | True | 43.80(AP) | 39.24 |
| MuCo + MSCOCO | MuPoTS | False | 28.5(AP) | 33.70 |
| MuCo + MSCOCO | 3DPW | False | 0.386(MRPE) | 0.418 |
| Human36M Protocol2 + MPII | Human36M | False | 120.0(MRPE) | 97.15 |

All results above come from RootNet models; the PoseNet results are even more inconsistent.

mks0601 commented 3 years ago

I see. You should test all the snapshots saved during training. RootNet's accuracy is not very stable because of the high depth ambiguity. However, I have confirmed that PoseNet's performance is stable. Could you share your PoseNet results?
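Checking all snapshots amounts to picking the best epoch for each metric. Note that AP is higher-is-better, while MRPE is lower-is-better. Here is a minimal sketch with a hypothetical helper best_snapshot; the sample values are the 3DPW MRPE column copied from the per-epoch results earlier in this thread.

```python
def best_snapshot(scores, higher_is_better):
    """Hypothetical helper: return (epoch, score) of the best entry in an {epoch: score} dict."""
    if not scores:
        raise ValueError("no snapshot scores given")
    pick = max if higher_is_better else min
    epoch = pick(scores, key=scores.get)  # ties resolve to the first epoch in dict order
    return epoch, scores[epoch]


# 3DPW MRPE per epoch from the results table (lower is better).
mrpe_3dpw = {
    0: 0.563, 1: 0.541, 2: 0.491, 3: 0.542, 4: 0.519,
    5: 0.444, 6: 0.448, 7: 0.418, 8: 0.478, 9: 0.543,
    10: 0.491, 11: 0.481, 12: 0.495, 13: 0.432, 14: 0.503,
    15: 0.448, 16: 0.440, 17: 0.460, 18: 0.462, 19: 0.437,
}
print(best_snapshot(mrpe_3dpw, higher_is_better=False))  # (7, 0.418)
```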

zwithz commented 3 years ago

RootNet

The test results for all the snapshots are in my earlier comment. I tested several times, and the results stayed almost the same: https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE/issues/31#issuecomment-795126548

PoseNet

| Train datasets | Test dataset | Eval metric | Your precision | My best precision |
|---|---|---|---|---|
| MuCo + MSCOCO | MuPoTS | Sequence-wise 3DPCK_{rel} & accuracy for all ground truths | 81.8(Avg) | 79.6 |
| MuCo + MSCOCO | MuPoTS | Sequence-wise 3DPCK_{rel} & accuracy only for matched ground truths | 82.5(Avg) | 80.91 |
| Human36M Protocol2 + MPII | Human36M | MPJPE | 53.3 | 54.34 |

Perhaps the differences between your numbers and mine are negligible? I should modify the evaluation program to compute the averages for me; otherwise there are too many CSV files to handle. Thanks for your prompt reply! Hope you have a great day~
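Instead of modifying the evaluation program, the CSV files could be aggregated afterwards. This is a minimal sketch under loud assumptions. It assumes each CSV has a header row and a numeric column. The glob pattern, the column name accuracy, and the helper average_column are hypothetical, because the thread does not describe the CSV layout.

```python
import csv
import glob


def average_column(pattern, column):
    """Hypothetical helper: mean of `column` over every row of every CSV matching `pattern`."""
    values = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):  # first line is taken as the header
                values.append(float(row[column]))
    if not values:
        raise ValueError(f"no values for column {column!r} in files matching {pattern!r}")
    return sum(values) / len(values)


# Example (hypothetical paths): average_column("results/*.csv", "accuracy")
```

This pools all rows from all files. If each file holds one sequence with several rows and a sequence-wise mean is wanted, average each file first and then average those per-file means.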

mks0601 commented 3 years ago

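So it seems your RootNet results are better than mine?

> | Train datasets | Test dataset | use_gt_bbox | Your precision | My best precision |
> |---|---|---|---|---|
> | MuCo + MSCOCO | MuPoTS | False | 28.5(AP) | 33.70 |
> | Human36M Protocol2 + MPII | Human36M | False | 120.0(MRPE) | 97.15 |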

I think the PoseNet differences are not that large, but I'm not sure. Please let me know if you still can't reach my results. Thanks!