ouusan / some-papers


TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation #14

ouusan opened 4 months ago

  1. Threshold-Adaptive Loss Scaling (TALS) relaxes the 2D keypoint-matching constraint. This introduces a new problem: predicting 3D pose from 2D keypoints is fundamentally ambiguous. The authors therefore seek an unbiased prior that restricts the network to output only valid poses without biasing it toward any particular pose. To do this, they use a Vector Quantized-VAE (VQ-VAE) that learns a discrete encoding of 3D pose by pre-training on extensive motion-capture datasets. Since VQ-VAEs [1] are designed to represent a uniform prior, this reduces the biases introduced by previous pose priors. We noticed a similar use of VQ-VAE in [2].
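The core of the discrete (tokenized) representation is the VQ-VAE quantization step from [1]: each continuous encoder output is replaced by its nearest codebook entry, and the index of that entry is the token. The sketch below shows only this generic lookup in plain Python; the codebook size, latent dimension, and all values are hypothetical toy choices, not TokenHMR's actual configuration or code.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[Vector]]:
    """Map each continuous encoder output z_e to its nearest codebook entry.

    Returns the discrete token indices and the quantized vectors z_q.
    """
    tokens: List[int] = []
    quantized: List[Vector] = []
    for z in latents:
        # Nearest-neighbour lookup; min() keeps the lowest index on ties.
        k = min(range(len(codebook)), key=lambda i: squared_distance(z, codebook[i]))
        tokens.append(k)
        quantized.append(codebook[k])
    return tokens, quantized


# Hypothetical toy codebook with K = 3 entries of dimension D = 2.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
# Hypothetical encoder outputs for one pose, split into 4 latent slots.
latents = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.4], [0.7, 0.6]]
tokens, z_q = quantize(latents, codebook)
print(tokens)  # [0, 1, 2, 1]
```

Because every decoded pose must come from a combination of learned codebook entries, the decoder can only produce poses built from what the tokenizer saw during pre-training, which is how the discrete representation restricts outputs to valid poses.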

Loss for training VQ-VAE: [equation images]
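For reference, the generic three-term VQ-VAE objective from [1] (reconstruction, codebook, and commitment terms) can be sketched in plain Python. This is the standard loss from [1], not necessarily TokenHMR's exact tokenizer objective; the squared-error reconstruction term and the toy inputs are assumptions, and β = 0.25 is the value used in [1].

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared L2 norm of the difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vqvae_loss(
    x: Vector,
    x_hat: Vector,
    z_e: Sequence[Vector],
    z_q: Sequence[Vector],
    beta: float = 0.25,
) -> Tuple[float, float, float, float]:
    """Return (total, reconstruction, codebook, commitment) for one sample.

    Follows the three-term VQ-VAE objective of [1], with a squared-error
    reconstruction term standing in for -log p(x | z_q).
    """
    recon = squared_distance(x, x_hat)
    # ||sg[z_e] - z_q||^2: with autodiff, only the codebook would receive gradients.
    codebook = sum(squared_distance(e, q) for e, q in zip(z_e, z_q))
    # beta * ||z_e - sg[z_q]||^2: with autodiff, only the encoder would receive gradients.
    commitment = beta * sum(squared_distance(e, q) for e, q in zip(z_e, z_q))
    return recon + codebook + commitment, recon, codebook, commitment


# Hypothetical values: a 2-D "pose", its reconstruction, and one latent slot.
total, recon, codebook, commitment = vqvae_loss(
    x=[1.0, 2.0], x_hat=[1.5, 2.0], z_e=[[0.1, -0.1]], z_q=[[0.0, 0.0]]
)
print(round(total, 6))  # 0.275
```

Plain Python has no stop-gradient, so the codebook and commitment terms differ here only by β; in an autodiff framework the placement of sg[.] is what decides whether a term updates the codebook or the encoder.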

Loss for training TokenHMR: [equation images]
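The relaxation of the 2D keypoint constraint described in item 1 can be sketched as a threshold-based reweighting of the keypoint loss. This is an assumed form for illustration only: the per-keypoint L1 error, the threshold `tau`, the down-weighting factor `alpha`, and the function name `tals_keypoint_loss` are hypothetical, and the exact TALS formula is the one given in the paper.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def tals_keypoint_loss(
    pred: Sequence[Point],
    target: Sequence[Point],
    tau: float,
    alpha: float,
) -> float:
    """Hypothetical threshold-adaptive 2D keypoint loss.

    Per-keypoint L1 errors below the threshold tau are scaled by alpha < 1,
    so small residual misalignments are penalised less than large ones.
    This is an assumed form, not the exact TALS formula from the paper.
    """
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        err = abs(px - tx) + abs(py - ty)  # L1 error of one keypoint
        weight = alpha if err < tau else 1.0
        total += weight * err
    return total / len(pred)


pred = [(0.0, 0.0), (1.0, 1.0)]
target = [(0.05, 0.0), (0.0, 1.0)]
loss = tals_keypoint_loss(pred, target, tau=0.1, alpha=0.1)
print(round(loss, 6))  # 0.5025
```

With `alpha=1.0` the same call reduces to a plain mean L1 loss (0.525 here), so the relaxation only changes keypoints that are already close to their targets; large errors are still penalised in full.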

Paper: https://arxiv.org/abs/2404.16752

[1] Neural Discrete Representation Learning (VQ-VAE). https://arxiv.org/abs/1711.00937
[2] T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations. https://arxiv.org/abs/2301.06052

Code: https://github.com/saidwivedi/TokenHMR (the full code will be released by the end of May).