ouusan / some-papers


Multimodal Methods #25

Open ouusan opened 1 month ago

ouusan commented 1 month ago

1. Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild (2019). Among the annotations used for input and supervision, dense correspondence can be more effective, or reach the same results while keeping only 20 percent of the in-the-wild 3D annotations. The paper also compares other kinds of annotations. Code: https://github.com/penincillin/DCT_ICCV-2019
2. Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation (2021). Two-stream design: optical flow improves the smoothness of the recovered human motion. The temporal network uses a transformer instead of a GRU (with a GRU, temporal information is inevitably lost). Uses flow supervision. Code: not released.
3. LASOR: Learning Accurate 3D Human Pose and Shape Via Synthetic Occlusion-Aware Data and Neural Mesh Rendering (2021). A viewpoint augmentation strategy (modifying the orientation of the body relative to the camera) generates intermediate representations (synthetic data generation). Uses differentiable rendering and a silhouette loss; all losses are adaptively combined using homoscedastic uncertainty. Code: https://github.com/iGame-Lab/LASOR
4. CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation (2022). The input is the cropped image plus bounding-box information, which encodes the crop's location and size in the original full image (information that cropping discards). The 2D reprojection loss takes a broader view of the full frame: 3D joints are projected onto the full image, not the crop. Code: https://github.com/huawei-noah/noah-research/tree/master/CLIFF (MindSpore: https://www.mindspore.cn)
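The CLIFF bounding-box input and full-frame reprojection can be sketched in plain Python. Assumptions: the bbox center (cx, cy) and size b are in full-image pixels; the bbox feature is (cx - W/2, cy - H/2, b) divided by the focal length f, with f defaulting to the image diagonal when no calibration is known; the network predicts a weak-perspective crop camera (s, tx, ty) in normalized crop coordinates. The released code may normalize these values further, and all function names here are hypothetical, so treat this as a sketch rather than the official implementation.

```python
import math

def cliff_bbox_info(cx, cy, b, img_w, img_h, focal=None):
    """Bbox feature: location relative to the image center and size, divided by f."""
    if focal is None:
        focal = math.hypot(img_w, img_h)  # assumed default: image diagonal as focal length
    return ((cx - img_w / 2) / focal, (cy - img_h / 2) / focal, b / focal)

def crop_cam_to_full(s, tx, ty, cx, cy, b, img_w, img_h, focal):
    """Convert a weak-perspective crop camera (s, tx, ty) into a full-frame translation.

    The crop maps normalized units to pixels with scale b*s/2; a perspective
    camera at depth tz has scale focal/tz, so tz = 2*focal/(b*s). The bbox offset
    from the image center becomes an extra x/y translation at that depth.
    """
    bs = b * s
    return (tx + 2 * (cx - img_w / 2) / bs,
            ty + 2 * (cy - img_h / 2) / bs,
            2 * focal / bs)

def project_full(joints, trans, focal, img_w, img_h):
    """Perspective-project 3D joints onto the full image (principal point at center)."""
    tx, ty, tz = trans
    pts = []
    for x, y, z in joints:
        depth = z + tz
        pts.append((focal * (x + tx) / depth + img_w / 2,
                    focal * (y + ty) / depth + img_h / 2))
    return pts
```

For a crop centered at (700, 300) with b = 200 in a 1000×800 image and f = 1000, the crop camera (1, 0, 0) maps to the full-frame translation (2, -1, 10), and a root joint at the origin reprojects to the bbox center (700, 300). The reprojection loss is then computed on these full-image coordinates instead of crop coordinates.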

ouusan commented 1 month ago
  1. Optical flow is obtained with 2-21, SelFlow: Self-Supervised Learning of Optical Flow (2019), https://arxiv.org/pdf/1904.09117, code: https://github.com/ppliuboy/SelFlow. To see how the motion discriminator works in code, follow 2-17, VIBE: https://github.com/mkocabas/VIBE. Flow supervision is inspired by 2-11. The VCNN is pretrained on 2-18; the CCNN is pretrained on ImageNet.
  2. Homoscedastic uncertainty comes from 3-33, Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. Reference: https://blog.csdn.net/qq_43592352/article/details/124715562 (??). Related: Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild, https://arxiv.org/abs/2009.10013, code: https://github.com/akashsengupta1997/STRAPS-3DHumanShapePose
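The homoscedastic-uncertainty weighting from 3-33 (which LASOR uses to combine its losses) can be sketched for regression-style losses as L = Σ_i [exp(-s_i)/2 · L_i + s_i/2], where s_i = log σ_i² is learnable. This minimal sketch assumes the task losses are fixed numbers so that only the s_i are optimized; in real training they are updated jointly with the network weights, and LASOR's exact formulation may differ. Function names are hypothetical.

```python
import math

def uncertainty_weighted_loss(losses, log_vars):
    """Sum_i 0.5*exp(-s_i)*L_i + 0.5*s_i, with s_i = log(sigma_i^2)."""
    return sum(0.5 * math.exp(-s) * l + 0.5 * s for l, s in zip(losses, log_vars))

def grad_log_vars(losses, log_vars):
    """Derivative of the weighted loss w.r.t. each s_i: 0.5 - 0.5*exp(-s_i)*L_i."""
    return [0.5 - 0.5 * math.exp(-s) * l for l, s in zip(losses, log_vars)]

def fit_log_vars(losses, steps=2000, lr=0.1):
    """Toy: gradient descent on s_i alone, holding the task losses fixed."""
    s = [0.0] * len(losses)  # sigma_i^2 = 1 initially, i.e. equal weights
    for _ in range(steps):
        g = grad_log_vars(losses, s)
        s = [si - lr * gi for si, gi in zip(s, g)]
    return s
```

Setting the derivative to zero gives σ_i² = L_i, so `fit_log_vars([4.0, 0.25])` converges to roughly [log 4, log 0.25]. The effective weight exp(-s_i)/2 = 1/(2 L_i) therefore shrinks for the task with the larger loss, while the s_i/2 term keeps the weights from collapsing to zero.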

  3. Pseudo-GT annotators. Previous CNN-based pseudo-GT annotators: 4-17, 4-25, 4-37. The CLIFF-based annotator uses the SMPL parameter predictions of the pretrained CLIFF annotator as an effective explicit prior.
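Using predictions as an explicit prior can be illustrated with a generic fitting objective: match the 2D keypoints while penalizing deviation from the annotator's initial parameters, E(θ) = Σ‖proj(θ) - x_2D‖² + λ‖θ - θ_init‖². This is a toy under stated assumptions: `project` is a hypothetical stand-in for SMPL forward kinematics plus camera projection, gradients come from central finite differences, and the real CLIFF annotator's procedure may differ (for example, it may update network weights rather than raw parameters). All names are hypothetical.

```python
def fit_with_explicit_prior(project, keypoints_2d, theta_init, lam=1.0,
                            lr=0.01, steps=500, eps=1e-5):
    """Minimize sum((project(theta) - keypoints_2d)^2) + lam * ||theta - theta_init||^2."""
    def energy(theta):
        data = sum((p - k) ** 2 for p, k in zip(project(theta), keypoints_2d))
        prior = sum((t - t0) ** 2 for t, t0 in zip(theta, theta_init))
        return data + lam * prior

    theta = list(theta_init)  # start from the annotator's prediction
    for _ in range(steps):
        grad = []
        for i in range(len(theta)):
            up, down = list(theta), list(theta)
            up[i] += eps
            down[i] -= eps
            grad.append((energy(up) - energy(down)) / (2 * eps))  # central difference
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta
```

With a linear toy projection `lambda th: [2 * th[0], 2 * th[1]]`, targets [4, 4], and θ_init = [1, 1], the keypoints alone would pull each parameter to 2; with λ = 1 the prior holds it at 1.8, between the 2D evidence and the initial prediction.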

An extra model such as GMM [4], GAN [18,9] or VAE [43,22] is trained on a large motion capture dataset AMASS [33] to be an implicit prior. Other methods search for plausible SMPL parameters that may be close to the ground truth to be an explicit prior [39,13].
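The implicit-prior idea above can be illustrated with a GMM: the prior energy is the negative log-likelihood of a pose vector under a mixture fitted offline. A minimal sketch, assuming diagonal covariances and made-up low-dimensional parameters; a real pose prior would be fitted on motion-capture data and have many more dimensions, and published priors may approximate the mixture likelihood differently. The function name is hypothetical.

```python
import math

def gmm_neg_log_likelihood(x, weights, means, variances):
    """Exact negative log-likelihood of vector x under a diagonal-covariance GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)  # mixture weight of this component
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # log-sum-exp over components for numerical stability
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))
```

Added to a fitting objective with some weight, this term gives poses near a mixture mean low energy and implausible poses (far from every component) high energy, which is what lets the GMM act as an implicit prior rather than a target to copy.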