Closed · mousecpn closed this issue 1 year ago
I can confirm this issue too: the argument `--query_selection` is indeed set to False by default, i.e., the ground truth is directly added to the model prediction! When `--query_selection` is set to True, model performance is significantly worse than reported:
| milliseconds | 80 | 160 | 320 | 400 | 560 | 1000 |
| --- | --- | --- | --- | --- | --- | --- |
| directions | 0.394 | 0.594 | 0.789 | 0.887 | 1.020 | 1.495 |
| discussion | 0.311 | 0.667 | 0.942 | 1.037 | 1.414 | 1.876 |
| eating | 0.278 | 0.485 | 0.737 | 0.882 | 1.067 | 1.423 |
| greeting | 0.543 | 0.888 | 1.304 | 1.492 | 1.786 | 1.797 |
| phoning | 0.637 | 1.210 | 1.655 | 1.827 | 1.814 | 2.107 |
| posing | 0.697 | 0.934 | 1.452 | 1.673 | 2.113 | 3.063 |
| purchases | 0.617 | 0.883 | 1.195 | 1.270 | 1.641 | 2.452 |
| sitting | 0.398 | 0.633 | 1.029 | 1.198 | 1.310 | 1.652 |
| sittingdown | 0.393 | 0.739 | 1.071 | 1.191 | 1.357 | 1.810 |
| smoking | 0.278 | 0.504 | 0.987 | 1.037 | 1.117 | 1.760 |
| takingphoto | 0.247 | 0.507 | 0.788 | 0.917 | 1.028 | 1.271 |
| waiting | 0.341 | 0.666 | 1.218 | 1.474 | 1.888 | 2.634 |
| walking | 0.394 | 0.678 | 0.871 | 0.971 | 1.232 | 1.202 |
| walkingdog | 0.603 | 0.981 | 1.357 | 1.512 | 1.760 | 1.988 |
| walkingtogether | 0.334 | 0.661 | 0.913 | 0.951 | 0.939 | 1.457 |
Is it possible that I have misunderstood the setup?
I ran into this issue as well...
It seems that there is something wrong with the code. In PoseTransformer.py, line 317, the target_seq is added to the final prediction.
It uses the whole target sequence instead of only the X_T step. So if I change the code as follows:
the performance becomes:
It seems the model does nothing but copy the target input to the output. Could you explain this?
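For context, the concern above is about a residual connection of roughly this form (a minimal sketch with hypothetical names and shapes, not the repository's exact code): if `target_seq` holds ground-truth future poses, adding it to the decoder output leaks the answer directly into the prediction.

```python
import numpy as np

# Minimal sketch of the residual step questioned above (hypothetical
# names/shapes; not the repository's exact code).
bs, tgt_len, pose_dim = 2, 4, 9

decoder_output = np.zeros((bs, tgt_len, pose_dim))  # untrained decoder: ~0
target_seq = np.random.rand(bs, tgt_len, pose_dim)  # ground-truth future poses

prediction = decoder_output + target_seq  # residual add, as on line 317

# Even with a decoder that outputs nothing, the "prediction" matches the
# ground truth exactly -- zero error without learning anything.
assert np.allclose(prediction, target_seq)
```

This is why the residual is only legitimate when the decoder input is a copy of the last *observed* pose X_T rather than the ground-truth target.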
Hi,

`target_pose` is a tensor of shape `[bs, tgt_seq_dim, pose_dim]` that contains a copy of the X_T pose in all of its elements. Check how the copy of X_T is done in the dataset loader in the following line when `--pad_decoder_inputs` is passed.

`end` is computed based on the number of dimensions of the body pose vector.
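The padding described above can be sketched as follows (a minimal illustration with hypothetical names, not the repository's loader code): when `--pad_decoder_inputs` is active, every decoder query is the last observed pose X_T, so the residual adds the last *seen* pose rather than the ground-truth target.

```python
import numpy as np

# Sketch of padding the decoder inputs with copies of the last observed
# pose X_T, mirroring the behaviour described for --pad_decoder_inputs
# (hypothetical function name; not the repository's exact code).
def pad_decoder_inputs(src_seq, tgt_len):
    """src_seq: [src_len, pose_dim] observed poses.
    Returns [tgt_len, pose_dim] where every row is X_T = src_seq[-1]."""
    x_T = src_seq[-1]                   # last observed pose
    return np.tile(x_T, (tgt_len, 1))  # repeat X_T for every query step

src = np.arange(12, dtype=float).reshape(3, 4)  # 3 observed poses, dim 4
dec_in = pad_decoder_inputs(src, tgt_len=5)

assert dec_in.shape == (5, 4)
assert np.allclose(dec_in, src[-1])  # every decoder query is a copy of X_T
```

Under this padding, no ground-truth future pose ever enters the decoder, so the residual connection is not a leak.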
Please check the provided training configuration parameters on the main page of the project: https://github.com/idiap/potr#training
Hi,

The flag `query_selection` introduces a new loss that attempts to learn the best possible pose selections to query the decoder with, instead of always selecting X_T; check:
However, this feature is not well tested or supported in the current version of the model.
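Conceptually, query selection scores each observed pose and queries the decoder with the best-scoring one instead of always repeating X_T. A rough sketch of that idea (names and logic are entirely hypothetical, since the feature is untested):

```python
import numpy as np

# Rough sketch of the query-selection idea: score each observed pose and
# build the decoder queries from the highest-scoring one, instead of
# always repeating X_T. Hypothetical names; not the repository's code.
def select_queries(src_seq, scores, tgt_len):
    """src_seq: [src_len, pose_dim]; scores: [src_len] learned logits.
    Returns [tgt_len, pose_dim] built from the best-scoring pose."""
    best = int(np.argmax(scores))              # index of the best query pose
    return np.tile(src_seq[best], (tgt_len, 1))

src = np.arange(8, dtype=float).reshape(4, 2)  # 4 observed poses, dim 2
scores = np.array([0.1, 2.0, 0.5, -1.0])       # pose 1 scores highest

queries = select_queries(src, scores, tgt_len=3)
assert queries.shape == (3, 2)
assert np.allclose(queries, src[1])
```

The extra loss mentioned above would supervise `scores`; with the flag off, the model falls back to the X_T padding behaviour.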
Please use the same training configuration shown on the main page of the project: https://github.com/idiap/potr#training