zxz267 / HaMuCo

[ICCV 2023] HaMuCo: Hand Pose Estimation via Multiview Collaborative Self-Supervised Learning
https://zxz267.github.io/HaMuCo/
MIT License

Does this method apply to human pose estimation? #1

Closed: SuperheadNick closed this issue 6 months ago

SuperheadNick commented 8 months ago

Thanks for your excellent work! I have one question: does this method apply to human pose estimation? I tried to train the model on the H36M dataset, but severe overfitting seems to occur.

zxz267 commented 8 months ago

Thanks for your interest in our work. We attempted to employ our approach for human pose estimation and achieved performance comparable to EpipolarPose and CanonPose. Have you experimented with integrating a body prior (e.g., VPoser) for regularization? As far as I recall, it's one of the main adjustments when applying HaMuCo to human pose estimation.

SuperheadNick commented 8 months ago

Thanks for your answer. I tried integrating VPoser for regularization as you suggested. However, after training the whole framework for 10 epochs, I got the following results (the first is the training result, the second is the validation result):

[Two images: training and validation results with VPoser]

It seems overfitting happens again (especially for pose_xyz_0). Have you ever experienced this? Could it be related to the weight of my regularization loss? I currently still set it to 0.01, as given in your code.

Additionally, I previously trained a version that used only SMPL for regularization, without VPoser. I trained it for 30 epochs, and the results were as follows:

[Two images: results of the SMPL-only version]

zxz267 commented 8 months ago

Due to the limited number of views in H36M and the differences between the body and hands, overfitting may occur without further adjustments, so I would advise experimenting with different weight settings. By the way, could you please elaborate on how you integrated VPoser into the framework? Also, which off-the-shelf 2D estimator are you using? If I remember correctly, AlphaPose yielded better performance than OpenPose.

SuperheadNick commented 8 months ago

My current approach is to feed the model-predicted pose to the encoder of a pre-trained VPoser model and use an L1 loss to constrain the distribution of the pose in the latent space to conform to a Gaussian distribution, but this doesn't seem to work as well as expected. May I ask if this is how you used VPoser? Also, I am currently using the 2D ground truth provided with the Human3.6M dataset directly and have not used any other 2D pose estimator yet.
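For readers following along, the encoder-side penalty described in this comment can be sketched roughly as below. This is only a toy illustration, not the code actually used: `make_frozen_encoder`, `encode`, and `encoder_l1_penalty` are hypothetical names, the encoder is a fixed random linear map standing in for the pretrained VPoser encoder, and the dimensions (63 pose values, 32 latent values) are assumptions. Only the 0.01 weight comes from the thread.

```python
import random

POSE_DIM = 63    # assumption: 21 body joints x 3 axis-angle values
LATENT_DIM = 32  # assumption: size of the VPoser latent code


def make_frozen_encoder(seed=0):
    """Hypothetical stand-in for a pretrained encoder: one fixed linear map."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(POSE_DIM)]
            for _ in range(LATENT_DIM)]


def encode(encoder, pose):
    """Map a flattened pose vector to a latent vector."""
    return [sum(w * p for w, p in zip(row, pose)) for row in encoder]


def encoder_l1_penalty(encoder, pose, weight=0.01):
    """Weighted mean absolute value of the encoded latent (pulls it toward 0)."""
    latent = encode(encoder, pose)
    return weight * sum(abs(z) for z in latent) / len(latent)


encoder = make_frozen_encoder()
rest_pose = [0.0] * POSE_DIM   # all-zero pose
bent_pose = [0.3] * POSE_DIM   # arbitrary non-zero pose

print(encoder_l1_penalty(encoder, rest_pose))  # 0.0 for the all-zero pose
print(encoder_l1_penalty(encoder, bent_pose))  # positive for this stand-in
```

In this setup the predicted pose itself is unconstrained, and the prior only acts through the extra loss term, so its effect depends heavily on the chosen weight.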

zxz267 commented 8 months ago

I used VPoser as the decoder (modifying the network to estimate the VPoser latent code), similar to [1]. I also employed the regularization described in [1].

[1] Moon, Gyeongsik, et al. "Three Recipes for Better 3D Pseudo-GTs of 3D Human Mesh Estimation in the Wild." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
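A minimal sketch of the decoder-side setup described above, under stated assumptions: the network head outputs a latent code, a frozen decoder turns it into a body pose, and a regularizer keeps the latent code small. The thread does not give the exact form of the regularizer, so a mean squared penalty is assumed here. `FrozenDecoder`, `latent_regularizer`, and `total_loss` are hypothetical names, the linear decoder is a toy stand-in for the pretrained VPoser decoder, and the 32/63 dimensions are assumptions.

```python
import random

LATENT_DIM = 32  # assumption: size of the VPoser latent code
POSE_DIM = 63    # assumption: 21 body joints x 3 axis-angle values


class FrozenDecoder:
    """Hypothetical stand-in for the pretrained VPoser decoder (fixed weights)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(LATENT_DIM)]
                        for _ in range(POSE_DIM)]

    def __call__(self, latent):
        # Real VPoser decoding is nonlinear; a linear map keeps the sketch short.
        return [sum(w * z for w, z in zip(row, latent)) for row in self.weights]


def latent_regularizer(latent, weight=0.01):
    """Weighted mean squared latent value (assumed form of the regularizer)."""
    return weight * sum(z * z for z in latent) / len(latent)


def total_loss(pose_loss, latent, weight=0.01):
    """Add the latent regularizer to whatever pose loss the framework computes."""
    return pose_loss + latent_regularizer(latent, weight)


decoder = FrozenDecoder()
predicted_latent = [0.5] * LATENT_DIM        # what the network head would output
body_pose = decoder(predicted_latent)        # pose used by the rest of the pipeline
print(len(body_pose))                        # 63
print(latent_regularizer(predicted_latent))  # 0.0025
```

The design difference from an encoder-side penalty is that every pose the network can produce passes through the prior's decoder, so the prior constrains the output directly instead of only through a loss weight.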