DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

Motivations
1. Recent studies have verified that pretrained diffusion models can provide structure-aware knowledge for 3D generation and perception tasks, but it remains challenging to leverage these capabilities effectively for complex regression tasks such as occluded human mesh recovery (HMR).
2. In occluded human mesh recovery, mainstream work typically applies off-the-shelf 2D key-point detectors to obtain coarse human joints as hints, but noise and occlusion disturb the 2D detector and significantly degrade accuracy. ---> Enhance the estimation robustness against noisy 2D key-points.
Key ideas

Challenges persist in the occluded HMR task ---> there are currently two main approaches: (1) feature extractor + regression head [1]; (2) diffusion-based methods that run multiple denoising steps to progressively refine the pose parameters, starting from random noise or from SMPL parameters regressed by an off-the-shelf model (e.g., [2] uses SMPL parameters regressed by 4D-Humans as the initialization and continues to improve them) ---> DPMesh directly employs the pre-trained denoising U-Net with conditions as the backbone and runs one-step inference.
Disturbances arise from 2D detectors ---> refine the spatial information from an off-the-shelf detector and inject it into the diffusion model as guidance conditions, plus a noisy key-point reasoning approach to improve the robustness of the model.

1. Employ the pre-trained denoising U-Net with conditions (condition injection follows Adding Conditional Control [3]):

(1) Obtain fine-grained conditions and perform condition injection with the heatmap: C = (ct, cj)

(2) Feature extraction with the diffusion model (conditions are fused via ControlNet)

(3) SMPL Mesh Regressor

Use the learned codebook and feed the corresponding pose embedding to the decoder D of the VQ-VAE (Vector Quantized VAE) to obtain the pose parameters Θ.
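
The codebook lookup described above can be sketched roughly as follows. This is only an illustration under assumptions, not the DPMesh implementation: these notes do not say how the regressor selects codebook entries, so the sketch assumes it predicts one score vector per pose token and takes the arg-max. The names `lookup_pose_embeddings` and `toy_decoder`, the toy sizes, and the single linear map standing in for the decoder D are all hypothetical.

```python
def lookup_pose_embeddings(token_logits, codebook):
    """Select, for each pose token, the codebook entry with the highest score."""
    embeddings = []
    for logits in token_logits:
        best = max(range(len(logits)), key=lambda k: logits[k])
        embeddings.append(codebook[best])
    return embeddings


def toy_decoder(embeddings, weights):
    """Hypothetical stand-in for the VQ-VAE decoder D: a single linear map."""
    flat = [value for emb in embeddings for value in emb]
    return [sum(w * x for w, x in zip(row, flat)) for row in weights]


# Toy sizes (assumed): 3 codebook entries of dimension 2, and 2 pose tokens.
codebook = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
token_logits = [[0.1, 2.0, 0.3], [3.0, 0.0, 1.0]]
# Identity weights so the "decoded" output is just the selected embeddings.
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

theta = toy_decoder(lookup_pose_embeddings(token_logits, codebook), identity)
```

In a real model the decoder would be the trained VQ-VAE decoder and Θ would be the SMPL pose parameters; the point here is only that the regressor outputs indices into a learned, discrete pose prior instead of regressing Θ directly.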

2. Noisy Key-point Reasoning (NKR): a self-supervised distillation approach that targets 2D detection errors, including missing key-points, jitter, and mismatches. A teacher model is first trained with precise ground-truth key-points so that it encodes accurate feature maps. The teacher's feature maps FT are then used to guide and supervise the student's feature maps FS. See Overview.
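
A minimal sketch of the two ingredients of this idea, under stated assumptions: the notes give neither the exact corruption scheme nor the form of the distillation loss, so the probabilities, the jitter scale, and the mean-squared-error loss below are illustrative choices, and the function names `corrupt_keypoints` and `feature_distillation_loss` are hypothetical.

```python
import random


def corrupt_keypoints(keypoints, rng, p_missing=0.1, jitter_std=2.0, p_swap=0.1):
    """Simulate detector errors on a list of (x, y) key-points.

    Missing joints become None, surviving joints get Gaussian jitter, and
    with probability p_swap two joints trade places (a mismatch).
    """
    noisy = []
    for x, y in keypoints:
        if rng.random() < p_missing:
            noisy.append(None)
        else:
            noisy.append((x + rng.gauss(0.0, jitter_std),
                          y + rng.gauss(0.0, jitter_std)))
    if len(noisy) >= 2 and rng.random() < p_swap:
        i, j = rng.sample(range(len(noisy)), 2)
        noisy[i], noisy[j] = noisy[j], noisy[i]
    return noisy


def feature_distillation_loss(teacher_feat, student_feat):
    """Mean squared error between flattened teacher (FT) and student (FS) features."""
    if len(teacher_feat) != len(student_feat):
        raise ValueError("feature maps must have the same number of elements")
    total = sum((t - s) ** 2 for t, s in zip(teacher_feat, student_feat))
    return total / len(teacher_feat)


rng = random.Random(0)
clean = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
noisy = corrupt_keypoints(clean, rng)  # input for the student
loss = feature_distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

During training, the teacher would see `clean` and the student would see `noisy`; minimizing the loss pushes the student's features toward what the teacher produces from accurate key-points.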
Overview
