ouusan / some-papers


Transformer-based HMR #4

Open ouusan opened 3 months ago

ouusan commented 3 months ago
  1. Mesh Graphormer: Transformer-based approaches are effective at modeling non-local interactions among 3D mesh vertices and body joints, whereas GCNNs are good at exploiting neighborhood vertex interactions based on a prespecified mesh topology. The paper studies how to combine graph convolutions and self-attention in a transformer to model both local and global interactions. (In short: mix graph convolutions and self-attention within a transformer to capture connections between both adjacent and distant mesh vertices and body joints in human pose estimation.)

  2. THUNDR (Transformer-based 3D HUmaN Reconstruction with Markers): regresses an intermediate 3D representation in the form of surface landmarks (markers) and regularizes it during training using a statistical body model. It preserves the spatial structure of high-level image features by avoiding pooling operations, relying instead on self-attention to enrich the representation.

  3. One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer: addresses resolution issues when estimating hands and face.

    Performs a feature-level upsample-crop scheme to extract high-resolution part-specific features and adopts keypoint-guided deformable attention (borrowed from Deformable DETR) to estimate hands and face precisely.

    Proposes a differentiable feature-level upsample-crop strategy, inspired by ViTDet, to improve hand and face regression: the feature tokens Tf are reshaped into a feature map and upsampled into multiple higher-resolution feature maps via deconvolution layers.

    Leverages 2D keypoint positions as prior knowledge to obtain better component tokens Tc than random initialization would give.

  4. MotionBERT: designs a Dual-stream Spatio-temporal Transformer (DSTformer) as the motion encoder to capture long-range relationships among skeleton keypoints. Its spatial and temporal MHSA (multi-head self-attention) blocks capture intra-frame and inter-frame body joint interactions, respectively.
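
To make the idea in item 1 (Mesh Graphormer) concrete, here is a minimal NumPy sketch of one block that applies self-attention (global interactions) and then a graph convolution over a fixed adjacency matrix (local interactions). This is only an illustration under stated assumptions: the single attention head, the ordering of the two steps, the random weights, the toy chain topology, and the names `self_attention`, `graph_conv`, and `hybrid_block` are all hypothetical and not taken from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head self-attention: every token can attend to every other token."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a fixed topology."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(x, a_norm, w):
    """One graph convolution with ReLU: aggregates features from neighbors only."""
    return np.maximum(a_norm @ x @ w, 0.0)

def hybrid_block(x, a_norm, wq, wk, wv, wg):
    """Assumed ordering: global attention first, then local graph convolution."""
    x = x + self_attention(x, wq, wk, wv)  # global (non-local) interactions
    x = x + graph_conv(x, a_norm, wg)      # local (neighborhood) interactions
    return x

rng = np.random.default_rng(0)
n_tokens, dim = 5, 8

# Toy topology (assumption): vertices connected in a chain 0-1-2-3-4.
adj = np.zeros((n_tokens, n_tokens))
for i in range(n_tokens - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
a_norm = normalized_adjacency(adj)

x = rng.normal(size=(n_tokens, dim))
wq, wk, wv, wg = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(4))
out = hybrid_block(x, a_norm, wq, wk, wv, wg)
print(out.shape)
```

The attention step lets vertex 0 exchange information with vertex 4 directly, while the graph convolution only mixes features along edges of the prespecified topology.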
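
The upsample-crop scheme in item 3 can be sketched roughly as follows: reshape the feature tokens into a 2D feature map, upsample it, crop the region around a component's 2D keypoints, and read features at the keypoint locations as a starting point for component tokens. This is a simplified illustration with several assumptions: nearest-neighbour upsampling replaces the learned deconvolution layers, the integer crop here is not differentiable, keypoints are assumed to be normalized (x, y) in [0, 1], and all function names, the `margin` parameter, and the keypoint values are hypothetical.

```python
import numpy as np

def tokens_to_map(tokens, height, width):
    """Reshape (height * width, channels) tokens into a (height, width, channels) map."""
    return tokens.reshape(height, width, tokens.shape[-1])

def upsample_nearest(fmap, scale):
    """Nearest-neighbour upsampling; a stand-in for learned deconvolution layers."""
    return fmap.repeat(scale, axis=0).repeat(scale, axis=1)

def keypoints_to_cells(fmap, keypoints_xy):
    """Map normalized (x, y) keypoints in [0, 1] to integer (row, col) cells."""
    h, w = fmap.shape[:2]
    cols = np.clip((keypoints_xy[:, 0] * w).astype(int), 0, w - 1)
    rows = np.clip((keypoints_xy[:, 1] * h).astype(int), 0, h - 1)
    return rows, cols

def crop_component(fmap, keypoints_xy, margin=1):
    """Crop the bounding box of the keypoints, padded by `margin` cells."""
    h, w = fmap.shape[:2]
    rows, cols = keypoints_to_cells(fmap, keypoints_xy)
    r0, r1 = max(rows.min() - margin, 0), min(rows.max() + margin + 1, h)
    c0, c1 = max(cols.min() - margin, 0), min(cols.max() + margin + 1, w)
    return fmap[r0:r1, c0:c1]

def init_component_tokens(fmap, keypoints_xy):
    """Assumed initialization: one token per keypoint, read from the feature map."""
    rows, cols = keypoints_to_cells(fmap, keypoints_xy)
    return fmap[rows, cols]

# Toy feature tokens Tf: a 4 x 3 grid with 2 channels.
tokens = np.arange(4 * 3 * 2, dtype=float).reshape(12, 2)
fmap = tokens_to_map(tokens, 4, 3)
hi_res = upsample_nearest(fmap, 4)

# Hypothetical 2D hand keypoints in normalized image coordinates.
hand_kps = np.array([[0.70, 0.50], [0.80, 0.60]])
hand_feat = crop_component(hi_res, hand_kps)
hand_tokens = init_component_tokens(hi_res, hand_kps)
print(hi_res.shape, hand_feat.shape, hand_tokens.shape)
```

Cropping at the feature level rather than the image level means the backbone runs only once; the crop just selects a higher-resolution window for the small body parts.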
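
For item 4 (MotionBERT), the difference between spatial and temporal self-attention comes down to which axis the attention runs over. The sketch below is a minimal illustration, assuming an input of shape (frames, joints, channels), a single attention head, random weights, weights shared between the two streams, and a plain average to fuse them; these choices and the names `spatial_attention`, `temporal_attention`, and `dual_stream` are assumptions for illustration, not the DSTformer implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, wq, wk, wv):
    """Single-head attention over the second-to-last axis; leading axes are batch."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def spatial_attention(x, w):
    """x: (T, J, C). Joints attend to each other within the same frame."""
    return x + attention(x, *w)

def temporal_attention(x, w):
    """x: (T, J, C). Each joint attends to itself across frames."""
    xt = np.swapaxes(x, 0, 1)  # (J, T, C): frames become the token axis
    return x + np.swapaxes(attention(xt, *w), 0, 1)

def dual_stream(x, ws, wt):
    """Two streams with opposite ordering, fused here by a simple average."""
    st = temporal_attention(spatial_attention(x, ws), wt)  # spatial then temporal
    ts = spatial_attention(temporal_attention(x, wt), ws)  # temporal then spatial
    return 0.5 * (st + ts)

rng = np.random.default_rng(0)
n_frames, n_joints, dim = 4, 3, 6
x = rng.normal(size=(n_frames, n_joints, dim))
ws = tuple(rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
wt = tuple(rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))

out = dual_stream(x, ws, wt)
print(out.shape)
```

Because spatial attention treats frames as a batch axis, changing one frame leaves the spatial output of the other frames untouched; temporal attention behaves the same way with respect to joints.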