idiap / potr


False experiments, label leaking #14

Closed mousecpn closed 1 year ago

mousecpn commented 2 years ago

It seems that there is something wrong with the code. In PoseTransformer.py, line 317, you add target_seq to the final prediction.

[screenshot: PoseTransformer.py, line 317]

It uses the whole sequence instead of only the X_T step. So if I change the code as follows:

[screenshot: modified code]

the performance becomes:

[screenshot: resulting evaluation numbers]

It seems the model does nothing but take the target as input and output the same target. Could you explain that?
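For illustration, here is a minimal hypothetical sketch of the kind of residual prediction being discussed (the names, shapes, and the helper `final_prediction` are assumptions, not the repository code): if the decoder inputs added at this step were the ground-truth target sequence, the residual addition alone would reproduce the targets.

```python
import torch

# Hypothetical sketch of a residual final prediction (not the repository code).
def final_prediction(decoder_out, decoder_inputs):
    # decoder_out:    [batch, tgt_len, pose_dim], residual predicted by the decoder
    # decoder_inputs: [batch, tgt_len, pose_dim], query poses fed to the decoder
    return decoder_out + decoder_inputs

batch, tgt_len, pose_dim = 2, 5, 63
x_T = torch.randn(batch, 1, pose_dim)               # last observed pose X_T
target_seq = torch.randn(batch, tgt_len, pose_dim)  # ground-truth future poses
decoder_out = torch.zeros(batch, tgt_len, pose_dim)

# If the ground-truth target_seq were used as decoder_inputs, the residual
# addition alone would reproduce the targets (the leakage described above).
leaky = final_prediction(decoder_out, target_seq)

# If instead every decoder input is a copy of X_T, nothing about the future
# is leaked into the prediction.
padded = final_prediction(decoder_out, x_T.repeat(1, tgt_len, 1))
```

Which of the two cases applies depends on what the decoder inputs actually contain, which the replies below address.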

ZhouYuxuanYX commented 1 year ago

I can confirm this issue too: the argument "--query_selection" is indeed set to False, i.e., the ground truth is directly added to the model prediction! When "--query_selection" is set to True, model performance is significantly worse than reported.

| milliseconds | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| directions | 0.394 | 0.594 | 0.789 | 0.887 | 1.020 | 1.495 |
| discussion | 0.311 | 0.667 | 0.942 | 1.037 | 1.414 | 1.876 |
| eating | 0.278 | 0.485 | 0.737 | 0.882 | 1.067 | 1.423 |
| greeting | 0.543 | 0.888 | 1.304 | 1.492 | 1.786 | 1.797 |
| phoning | 0.637 | 1.210 | 1.655 | 1.827 | 1.814 | 2.107 |
| posing | 0.697 | 0.934 | 1.452 | 1.673 | 2.113 | 3.063 |
| purchases | 0.617 | 0.883 | 1.195 | 1.270 | 1.641 | 2.452 |
| sitting | 0.398 | 0.633 | 1.029 | 1.198 | 1.310 | 1.652 |
| sittingdown | 0.393 | 0.739 | 1.071 | 1.191 | 1.357 | 1.810 |
| smoking | 0.278 | 0.504 | 0.987 | 1.037 | 1.117 | 1.760 |
| takingphoto | 0.247 | 0.507 | 0.788 | 0.917 | 1.028 | 1.271 |
| waiting | 0.341 | 0.666 | 1.218 | 1.474 | 1.888 | 2.634 |
| walking | 0.394 | 0.678 | 0.871 | 0.971 | 1.232 | 1.202 |
| walkingdog | 0.603 | 0.981 | 1.357 | 1.512 | 1.760 | 1.988 |
| walkingtogether | 0.334 | 0.661 | 0.913 | 0.951 | 0.939 | 1.457 |

Is it possible that I might have misunderstood the setup?

joyfang1106 commented 1 year ago

I also ran into this issue...

legan78 commented 1 year ago

> It seems that there is something wrong with the code. In PoseTransformer.py, line 317, you add target_seq to the final prediction.
>
> [screenshot: PoseTransformer.py, line 317]
>
> It uses the whole sequence instead of only the X_T step. So if I change the code as follows:
>
> [screenshot: modified code]
>
> the performance becomes:
>
> [screenshot: resulting evaluation numbers]
>
> It seems the model does nothing but take the target as input and output the same target. Could you explain that?

Hi,

target_pose is a tensor of shape [bs, tgt_seq_dim, pose_dim] that contains a copy of the X_T pose in all its elements. Check how the copy of X_T is made in the dataset loader at the following line when --pad_decoder_inputs is passed:

https://github.com/idiap/potr/blob/1e194130cd012d3c9a6137052f402e6f7fb5b71b/data/H36MDataset_v2.py#L298

end is computed from the number of dimensions of the body pose vector.
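For illustration only, a rough sketch of that padding behavior under assumed names and shapes (the actual implementation is at the line linked above):

```python
import numpy as np

# Rough sketch (assumed names/shapes, not the repository code) of what
# --pad_decoder_inputs does: every decoder input is a copy of the last
# observed pose X_T rather than a ground-truth future pose.
def pad_decoder_inputs(source_seq, target_seq_len, pose_dim):
    # source_seq: [src_seq_len, input_dim]; the body pose occupies the first pose_dim entries
    end = pose_dim                    # end derived from the body-pose dimensionality
    x_T = source_seq[-1:, 0:end]      # last observed pose X_T, shape [1, pose_dim]
    return np.repeat(x_T, target_seq_len, axis=0)  # [target_seq_len, pose_dim]
```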

Please check the provided training configuration parameters on the main page of the project: https://github.com/idiap/potr#training

legan78 commented 1 year ago

> I can confirm this issue too: the argument "--query_selection" is indeed set to False, i.e., the ground truth is directly added to the model prediction! When "--query_selection" is set to True, model performance is significantly worse than reported.
>
> [per-action results table, as above]
>
> Is it possible that I might have misunderstood the setup?

Hi,

The flag --query_selection introduces an additional loss that attempts to learn the best possible poses to query the decoder with, instead of always selecting X_T; see:

https://github.com/idiap/potr/blob/1e194130cd012d3c9a6137052f402e6f7fb5b71b/training/seq2seq_model_fn.py#L507

However, this feature is neither well tested nor well supported in the current version of the model.
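As a rough illustration of that idea only (a toy sketch with assumed tensor names and shapes, not the loss implemented at the linked line): the model could score each observed pose per target step and be penalized for not selecting the observed pose closest to the target it must predict.

```python
import torch
import torch.nn.functional as F

# Toy sketch of a query-selection objective (assumed design, not the
# repository's loss): reward selecting observed poses that are close to the
# target poses, instead of always querying the decoder with X_T.
def query_selection_loss(selection_logits, source_poses, target_poses):
    # selection_logits: [batch, tgt_len, src_len], scores over observed frames
    # source_poses:     [batch, src_len, pose_dim]
    # target_poses:     [batch, tgt_len, pose_dim]
    dists = torch.cdist(target_poses, source_poses)  # [batch, tgt_len, src_len]
    best_idx = dists.argmin(dim=-1)                  # nearest observed pose per target step
    return F.cross_entropy(
        selection_logits.reshape(-1, selection_logits.size(-1)),
        best_idx.reshape(-1),
    )
```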

Please use the same training configuration shown on the main page of the project: https://github.com/idiap/potr#training