Closed. juxuan27 closed this issue 2 years ago.
Hi @juxuan27, sorry for the late reply; I just finished a deadline. The experiment is very insightful and inspiring, and I'm glad to discuss it with you.
In the results from AlphaPose, some joints are missing. Such results are not good supervision for network adaptation, especially for online adaptation; top-down methods are indeed more appropriate. I also think the Gaussian-noise assumption is not very appropriate for results from bottom-up methods.
Also, the annotation gap will influence the evaluation results.
The hyperparameters should also be tuned.
> I calculate the mean-variance of ground truth 2D and AlphaPose predicted 2D, and the result is 12.65.

Do you ignore a joint if it is missed?
Thank you for your reply! To calculate the mean-variance between the ground-truth 2D and the AlphaPose-predicted 2D, I filtered out the missed and mismatched joints. The result may not be completely accurate, though, since AlphaPose sometimes outputs more than one person's annotation for a single-person image; in that case I keep the detection with the minimum MPJPE. I also found that the mean of the difference between the ground-truth 2D and the AlphaPose-predicted 2D is about 0 (maybe -0.0xxx, I forget the exact number).

I agree with your idea that the results of AlphaPose should not be assumed to be Gaussian noise: if the noise were Gaussian, the mean-variance between the ground-truth 2D and the AlphaPose-predicted 2D should be between 1 and 1.5.

I wonder whether, when the 2D ground truth of the present image is given as input, the model tends to overfit to the 2D annotation instead of the temporal information. Note that the lower-level and upper-level optimization steps both have the 2D ground truth in their loss functions. What's more, it may be worth freezing the model after fine-tuning on the 3DPW train set and then directly running inference on the 3DPW train set; this may help us further understand how it works 😄
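For reference, the filtering I describe above can be sketched like this. It is a minimal NumPy sketch, not the actual evaluation code: the array shapes, the confidence threshold, and the function name are all assumptions.

```python
import numpy as np

def error_variance(gt_kp2d, det_people, conf_thresh=0.0):
    """Mean and variance of (detection - GT) over visible joints.

    gt_kp2d: (J, 2) ground-truth 2D joints.
    det_people: list of (J, 3) detections (x, y, confidence), one
        entry per person AlphaPose returned for the frame.
    Joints with confidence <= conf_thresh count as missed and are
    excluded. If several people are detected for a single-person
    image, the one with the minimum MPJPE against the GT is kept.
    """
    best = None
    for det in det_people:
        visible = det[:, 2] > conf_thresh
        if not visible.any():
            continue
        diff = det[visible, :2] - gt_kp2d[visible]
        mpjpe = np.linalg.norm(diff, axis=1).mean()
        if best is None or mpjpe < best[0]:
            best = (mpjpe, diff)
    if best is None:
        return None
    flat = best[1].ravel()
    return flat.mean(), flat.var()
```

With this kind of filtering, a joint that AlphaPose drops simply never contributes to the statistic, which is why the estimate can still be optimistic.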
> The hyperparameters should also be tuned.
>
> Also, the annotation gap will influence the evaluation results.
I agree hahhhh
> I wonder whether, when the 2D ground truth of the present image is given as input, the model tends to overfit to the 2D annotation instead of the temporal information. Note that the lower-level and upper-level optimization steps both have the 2D ground truth in their loss functions. What's more, it may be worth freezing the model after fine-tuning on the 3DPW train set and then directly running inference on the 3DPW train set; this may help us further understand how it works 😄
Hi, I just finished the Spring Festival holiday. For point 1, refer to Tab. 7 of our paper: with the temporal constraint, MPJPE and PVE improve more significantly, and these metrics are tightly related to temporal correlation. So I think that under the bilevel optimization, the temporal and single-frame (GT kp2d) terms are auxiliary constraints, though I agree the single-frame term is more important.
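To make the two-level structure concrete, here is a toy scalar sketch of how the single-frame (GT kp2d) term can appear in both levels while the temporal term enters only at the upper level. The scalar "model", the quadratic losses, and the learning rate are all made up for illustration; this is not the DynaBOA code.

```python
def l2d(w, target_2d):
    # single-frame loss: distance to the current frame's 2D evidence
    return (w - target_2d) ** 2

def ltemp(w, prev_w):
    # temporal loss: stay close to the previous frame's estimate
    return (w - prev_w) ** 2

def grad(f, w, eps=1e-6):
    # central-difference gradient, to keep the sketch dependency-free
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def bilevel_step(w, target_2d, prev_w, lr=0.1):
    # lower level: adapt on the 2D loss alone
    w_low = w - lr * grad(lambda v: l2d(v, target_2d), w)
    # upper level: the 2D loss again, plus the temporal constraint,
    # evaluated at the lower-level solution
    upper = lambda v: l2d(v, target_2d) + ltemp(v, prev_w)
    return w_low - lr * grad(upper, w_low)
```

Because `l2d` appears in both steps, the single-frame evidence dominates the update, which matches the intuition above that the temporal term acts as an auxiliary constraint.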
For point 2, if I fine-tune on the 3DPW training set, should the GT 3D mesh/joints be used? In Tab. 4, I fine-tuned SPIN on the 3DPW test set with GT 2D keypoints (termed *SPIN), and compared it with other baselines that are fine-tuned on the 3DPW training set. Please refer to that table for more details.
As for analyzing the noise distribution of each joint, I don't think using the results detected by AlphaPose is appropriate: it is hard to handle missed joints. A top-down method may be a more appropriate alternative, but this is only my guess. Maybe we can chat by email or WeChat (shuishiguanshanyan).
Thank you for your answer! I've added your WeChat!
Thank you so much for your excellent work! But I ran into some problems while testing the model on predicted 2D keypoints (using AlphaPose Fast Pose, the same backbone mentioned in the README) on the 3DPW dataset. This is how I tried:
The final results are as follows (together with the MPJPE on the X, Y, and Z axes):
I was quite confused about why the results were so bad, so I tried adding a Gaussian perturbation to the ground-truth 2D and running the 3DPW baseline. The code I changed is as follows:
https://github.com/syguan96/DynaBOA/blob/b8d2bbe9d8e827a36e72bb324a9a6e43f421ae31/boa_dataset/pw3d.py#L58
changed to (e.g. sigma=1):
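Roughly, the perturbation looks like this. This is only a sketch of the idea; the function and variable names are guesses, not the actual `pw3d.py` code.

```python
import numpy as np

def perturb_kp2d(gt_kp2d, sigma=1.0, seed=0):
    # Add zero-mean Gaussian noise with standard deviation `sigma`
    # to the ground-truth 2D keypoints. The layout assumption is
    # that x/y live in the last two columns of the array.
    rng = np.random.default_rng(seed)
    noisy = np.array(gt_kp2d, dtype=float, copy=True)
    noisy[..., :2] += rng.normal(0.0, sigma, size=noisy[..., :2].shape)
    return noisy
```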
And here is the result:
Furthermore, I calculated the mean-variance between the ground-truth 2D and the AlphaPose-predicted 2D, and the result is 12.65 (corresponding to sigma ≈ 3.6). Under the assumption that the detected 2D is the ground-truth 2D plus Gaussian noise, the results should therefore be even worse.
So does that mean DynaBOA cannot be combined with detected 2D keypoints? Or is it because of some improper operation on my side?
Thank you so much for your patience in reading my issue.