TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

Motivation
1. Observation: the more accurately a method fits 2D keypoints, the less accurately it predicts 3D pose. The cause is the common weak-perspective camera assumption: both the 3D-to-2D keypoint reprojection and the pseudo-ground-truth (p-GT) training data suffer from this inaccurate camera model.
2. Estimating the camera from a single image is highly ill-posed and remains a challenging, unsolved problem--->instead of solving it, reduce the impact of using the wrong camera model, in a way that can be applied to any HPS (human pose and shape) regression method.
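To make the weak-perspective problem concrete, here is a minimal sketch (plain Python, not TokenHMR code) comparing a full pinhole projection with a weak-perspective one, in which all joints share a single scale s = f / mean depth. Real HMR networks predict the scale directly; deriving it from the mean depth is an assumption made for illustration, and the joint coordinates, focal lengths, and function names are hypothetical. For a fronto-parallel pose the two models agree; once a hand reaches toward the camera they disagree, so a network that matches 2D keypoints exactly under weak perspective has to distort the 3D pose to compensate.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def perspective(points: List[Point3], focal: float) -> List[Point2]:
    """Full pinhole projection: each point is divided by its own depth z."""
    return [(focal * x / z, focal * y / z) for x, y, z in points]


def weak_perspective(points: List[Point3], focal: float) -> List[Point2]:
    """Weak perspective: one shared scale s = focal / mean depth for all points."""
    mean_z = sum(z for _, _, z in points) / len(points)
    s = focal / mean_z
    return [(s * x, s * y) for x, y, _ in points]


def max_discrepancy(a: List[Point2], b: List[Point2]) -> float:
    """Largest 2D distance (pixels) between corresponding projected points."""
    return max(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
               for (ax, ay), (bx, by) in zip(a, b))


def discrepancy_at(offsets: List[Point3], distance: float) -> float:
    """Place body-relative offsets `distance` metres in front of the camera.

    The focal length grows with distance (like a zoom lens), so the person keeps
    roughly the same image size and only the perspective effect changes.
    """
    focal = 1000.0 * distance  # hypothetical focal length in pixels
    pts = [(x, y, z + distance) for x, y, z in offsets]
    return max_discrepancy(perspective(pts, focal), weak_perspective(pts, focal))


# Hypothetical joints (metres, relative to the pelvis): pelvis plus a hand
# reaching 0.6 m toward the camera, and the same hand in a fronto-parallel pose.
reaching = [(0.0, 0.0, 0.0), (0.5, 0.0, -0.6)]
flat = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]

print(f"flat pose, 3 m:      {discrepancy_at(flat, 3.0):.2f} px")
print(f"reaching pose, 3 m:  {discrepancy_at(reaching, 3.0):.2f} px")
print(f"reaching pose, 30 m: {discrepancy_at(reaching, 30.0):.2f} px")
```

Running it gives 0 px for the flat pose, about 69 px for the reaching pose at 3 m, and about 5 px at 30 m: the weak-perspective error is largest for close-up images with strong perspective.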
Key Ideas
1. 2D keypoints are valuable for preventing grossly incorrect predictions. However, excessive reliance on 2D cues introduces bias (from the inaccurate camera)--->once the 2D keypoint loss falls below an effective threshold, minimize its impact to prevent overfitting to the camera/pose bias (Threshold-Adaptive Loss Scaling, TALS).
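The TALS idea can be sketched as a piecewise weight on the 2D keypoint loss. This is a minimal illustration under stated assumptions: the notes do not give the paper's exact rule, so the per-keypoint application, the names `threshold` and `low_scale`, and their values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]


def keypoint_errors(pred: Sequence[Point2], gt: Sequence[Point2]) -> List[float]:
    """Per-keypoint L2 reprojection error (e.g. in normalized image coordinates)."""
    return [((px - gx) ** 2 + (py - gy) ** 2) ** 0.5
            for (px, py), (gx, gy) in zip(pred, gt)]


def tals_loss(errors: Sequence[float], threshold: float, low_scale: float) -> float:
    """Threshold-adaptive scaling of a 2D keypoint loss (hypothetical form).

    Errors above `threshold` keep full weight, so grossly wrong predictions are
    still corrected. Errors at or below it are multiplied by `low_scale`, so the
    small residual that mostly reflects the camera-model mismatch barely counts.
    """
    scaled = [e if e > threshold else low_scale * e for e in errors]
    return sum(scaled) / len(scaled)


# Hypothetical predictions vs. detected 2D keypoints: one large error, two small.
pred = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
gt = [(0.0, 0.3), (1.0, 1.01), (0.5, 0.52)]
errs = keypoint_errors(pred, gt)
plain = sum(errs) / len(errs)
scaled = tals_loss(errs, threshold=0.05, low_scale=0.1)
print(f"plain mean: {plain:.4f}, TALS: {scaled:.4f}")
```

The large error keeps pulling the prediction, while the two small ones contribute ten times less. In an autodiff framework the same rule would be an element-wise selection on the loss tensor (e.g. `torch.where`).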

Threshold-Adaptive Loss Scaling relaxes the keypoint-matching constraint--->this introduces a new problem: predicting 3D pose from 2D keypoints is fundamentally ambiguous--->seek an unbiased prior that restricts the network to output only valid poses but does not bias it toward any particular pose. TokenHMR uses a Vector Quantized VAE (VQ-VAE) [1], pre-trained on extensive motion capture datasets, to learn a discrete encoding of 3D pose. Since VQ-VAEs are designed to represent a uniform prior, this reduces the biases caused by previous pose priors. We noticed a similar VQ-VAE-based method in [2].

Overview

The encoder E maps a pose θ to the latent feature z = E(θ), whose elements z_i are quantized individually.

Each latent feature z_i is quantized with the codebook CB by replacing it with the most similar (nearest) code element:
z_i^q = argmin_{c_k ∈ CB} ||z_i − c_k||_2
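The nearest-code lookup can be sketched in a few lines. The encoder E is not implemented here; the toy latents stand in for E(θ), and the two-dimensional codebook with K = 3 entries is hypothetical (real codebooks are much larger and higher-dimensional).

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[int], List[Vector]]:
    """Replace each latent z_i by its nearest codebook entry.

    Returns the token indices k* = argmin_k ||z_i - c_k||_2 and the quantized
    vectors c_{k*}. Squared distance is used because it has the same argmin.
    """
    tokens = [min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
              for z in latents]
    return tokens, [codebook[k] for k in tokens]


# Toy codebook CB and latents standing in for E(θ).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
latents = [(0.9, 0.2), (0.1, 0.8), (0.1, 0.1)]
tokens, codes = quantize(latents, codebook)
print(tokens)
```

The list of indices is the discrete (tokenized) pose; in a VQ-VAE, a decoder then maps the selected codes back to a pose, so any output is assembled from codes learned on real motion capture data.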

Loss for training VQ-VAE:
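TokenHMR's exact tokenizer loss is not reproduced in these notes. For reference, the standard objective from the original VQ-VAE paper [1], rewritten with this note's symbols (pose θ, encoder E, a decoder D, quantized latent z^q made of the nearest codebook entries, stop-gradient sg[·]) and with the reconstruction term written as a squared error, is:

```latex
\mathcal{L}_{\text{VQ-VAE}} =
  \underbrace{\lVert \theta - D(z^{q}) \rVert_2^2}_{\text{reconstruction}}
  + \underbrace{\lVert \operatorname{sg}[E(\theta)] - z^{q} \rVert_2^2}_{\text{codebook}}
  + \beta \, \underbrace{\lVert E(\theta) - \operatorname{sg}[z^{q}] \rVert_2^2}_{\text{commitment}}
```

The codebook term moves the codes toward the encoder outputs, and the commitment term, weighted by β (0.25 in [1]), keeps the encoder outputs close to their chosen codes. Gradients bypass the non-differentiable argmin with a straight-through estimator. T2M-GPT [2] instead updates the codebook with exponential moving averages and resets unused codes.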

Loss for training TokenHMR:

Paper: https://arxiv.org/abs/2404.16752
Code: https://github.com/saidwivedi/TokenHMR (the full code will be released by end of May)
[1] (VQ-VAE) Neural Discrete Representation Learning. https://arxiv.org/abs/1711.00937
[2] T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations. https://arxiv.org/abs/2301.06052

ouusan / some-papers