Exploiting Temporal Information

ouusan commented 1 month ago

1. (HMMR) Learning 3D Human Dynamics from Video (2019): the temporal encoder uses 1D temporal convolutional layers over image features precomputed on each frame, and predicts the current frame plus the frames at ±∆t. Code (TensorFlow): https://github.com/akanazawa/human_dynamics/tree/master

2. VIBE: Video Inference for Human Body Pose and Shape Estimation (2020): GRU-based attention to amplify the contribution of the most important frames, motion discrimination, trains a motion prior (MPoser) (no MPoser-related code??). Code: https://github.com/mkocabas/VIBE

3. (TCMR) Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video (2021) (based on 2.): one bi-directional GRU (1st <--> T-th frame) + two uni-directional GRUs (1st -> (T/2 - 1)-th and T-th -> (T/2 + 1)-th); their final hidden states become the past/future temporal features. Occlusion augmentation; static features are precomputed to save training time and memory (like 1.). Code: https://github.com/hongsukchoi/TCMR_RELEASE

4. (MAED) Encoder-Decoder with Multi-level Attention for 3D Human Shape and Pose Estimation (2021): Spatial-Temporal Encoder (STE) (a different order of input dimensions changes the meaning of the learned attention) + Kinematic Topology Decoder (KTD) (implicitly models attention at the joint level and generates the pose parameters for each joint hierarchically). Code: https://github.com/ziniuwan/maed

5. (TePose) Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation (2022): bi-GRU + uni-GRU, GCN-based motion discriminator. In each GCN block, graph features are processed by a multi-scale graph convolutional network (MSGCN) [7] and a multi-scale graph 3D convolutional network (MS-G3D) [33]. Sequential data loading strategy. Code: https://github.com/ostadabbas/TePose
6. (GLoT) Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation (2023). Global Motion Modeling: Global Transformer --> Masked Pose and Shape Estimation --> Human Prior Padding --> Iterative Regressor --> Global Mesh Sequence. Local Parameter Correction: Local Transformer for nearby frames --> Hierarchical Spatial Correlation Regressor (same as KTD in 4.) --> Mid-frame Prediction. Code: https://github.com/sxl142/GLoT
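The frame split described for TCMR in item 3 can be sketched with plain Python lists. This is only an illustration of the indexing: the name `split_window` is hypothetical, the middle frame is assumed to sit at 0-based index `T // 2`, and the real model feeds these slices to GRUs instead of returning them.

```python
def split_window(frames):
    """Split a window of per-frame features into past / middle / future parts.

    Assumption: the middle frame sits at 0-based index len(frames) // 2.
    """
    mid = len(frames) // 2
    past = frames[:mid]              # read forward in time by the "past" GRU
    future = frames[mid + 1:][::-1]  # reversed, so the "future" GRU runs from the last frame back
    return past, frames[mid], future


# Toy window of 5 frames, identified by their indices.
past, current, future = split_window([0, 1, 2, 3, 4])
print(past, current, future)  # [0, 1] 2 [4, 3]
```

Reversing the future slice means both uni-directional GRUs end next to the middle frame, so their final hidden states summarize the past and the future of that frame.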

ouusan commented 1 month ago

1. Temporal conv 1D (inter-frame relations): https://github.com/akanazawa/human_dynamics/blob/master/src/models.py#L209. Precomputing the image features on each frame is similar to https://arxiv.org/pdf/1806.06053 (??)

2. Related works: GANs for sequence modeling; RNN-based attention: Neural Machine Translation by Jointly Learning to Align and Translate, https://arxiv.org/pdf/1409.0473. MPoser: [screenshot]

3. Follows 3-32, How Robust is 3D Human Pose Estimation to Occlusion? https://arxiv.org/pdf/1808.09316 (data augmentation with synthetic occlusions during training to improve robustness). TemporalEncoder: https://github.com/hongsukchoi/TCMR_RELEASE/blob/master/lib/models/tcmr.py#L82-L105
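To make the "temporal conv 1D over precomputed per-frame features" idea concrete, here is a minimal dependency-free sketch. It is not the HMMR code: `temporal_conv1d` is a hypothetical helper, and it shares one kernel across all channels, whereas a learned 1D convolution layer also mixes channels.

```python
def temporal_conv1d(features, kernel):
    """Convolve every feature channel along the time axis with zero padding.

    features: list of T frames, each a list of C floats (precomputed per frame).
    kernel:   list of K weights (K odd), shared across channels for simplicity.
    """
    T, C = len(features), len(features[0])
    half = len(kernel) // 2
    out = []
    for t in range(T):
        frame = [0.0] * C
        for k, w in enumerate(kernel):
            src = t + k - half      # index of the neighbouring frame
            if 0 <= src < T:        # frames outside the clip count as zeros
                for c in range(C):
                    frame[c] += w * features[src][c]
        out.append(frame)
    return out


# T=5 frames, C=2 channels: channel 0 grows with time, channel 1 is constant.
feats = [[float(t), 1.0] for t in range(5)]
smoothed = temporal_conv1d(feats, [0.25, 0.5, 0.25])
print(smoothed[2])  # [2.0, 1.0]
```

Each output frame depends on its temporal neighbours, which is how the encoder picks up inter-frame relations without re-running the image backbone.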

Horizontal flipping, random cropping, random erasing and color jittering are employed to augment the training samples.
5-7 Spatio-temporal graph convolution for skeleton based action recognition: https://arxiv.org/abs/1802.09834
5-33 Disentangling and unifying graph convolutions for skeleton-based action recognition: https://arxiv.org/pdf/2003.14111, code: https://github.com/kenziyuliu/ms-g3d
6-13 iterative regressor in HMR: https://github.com/MandyMo/pytorch_HMR/blob/master/src/model.py#L37-L45
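The iterative regressor idea (start from an initial parameter estimate and repeatedly add a predicted correction) can be sketched as below. This is a toy under stated assumptions, not the linked implementation: `iterative_regress` and `toy_delta` are hypothetical names, `toy_delta` stands in for the learned network, and 3 iterations is only a default chosen for the example.

```python
def iterative_regress(feature, theta_init, regress_delta, n_iter=3):
    """Refine parameters theta by repeatedly adding a predicted correction."""
    theta = list(theta_init)
    for _ in range(n_iter):
        # The regressor sees the concatenation [feature, current theta].
        delta = regress_delta(feature + theta)
        theta = [t + d for t, d in zip(theta, delta)]
    return theta


def toy_delta(x):
    """Stand-in for the learned regressor: step halfway toward a target
    that, in this toy, is simply stored in the first two inputs."""
    target, theta = x[:2], x[2:]
    return [0.5 * (g - t) for g, t in zip(target, theta)]


refined = iterative_regress([1.0, -2.0], [0.0, 0.0], toy_delta)
print(refined)  # [0.875, -1.75]
```

The estimate moves closer to the target on every pass, which is the point of feeding the current parameters back into the regressor.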

Attention visualization for the Masked Pose and Shape Estimation strategy: [screenshot]

ouusan commented 1 month ago

Motion Discriminator: GRU + attention: https://github.com/mkocabas/VIBE/blob/master/lib/models/motion_discriminator.py#L65-L77; the attention module itself: https://github.com/mkocabas/VIBE/blob/master/lib/models/attention.py#L65-L77
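A minimal sketch of attention pooling over per-frame hidden states (score each frame, softmax over time, weighted sum). This is not the VIBE code: `attention_pool` is a hypothetical name and `score_fn` stands in for the small learned scoring network.

```python
import math


def attention_pool(hidden_states, score_fn):
    """Pool T hidden states into one vector using softmax attention over time."""
    scores = [score_fn(h) for h in hidden_states]
    top = max(scores)                               # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]             # one weight per frame, summing to 1
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights


# Two frames with equal scores: the result is the plain average.
pooled, weights = attention_pool([[1.0, 2.0], [3.0, 4.0]], lambda h: 0.0)
print(pooled, weights)  # [2.0, 3.0] [0.5, 0.5]
```

With a learned `score_fn`, frames that score higher get larger weights, so the most informative frames dominate the pooled feature.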

3. Multi-frame (3 frames) prediction: reshape operation https://github.com/hongsukchoi/TCMR_RELEASE/blob/master/lib/models/tcmr.py#L159

4. KTD: https://github.com/ziniuwan/maed/blob/master/lib/models/ktd.py. J_regressor is a matrix that maps the 3D mesh vertices of the body to the corresponding 3D joint positions.
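The J_regressor mapping is a matrix product: each joint is a weighted sum of mesh vertices. A tiny sketch with made-up numbers (3 vertices, 2 joints; `regress_joints` is a hypothetical helper, and real body models use thousands of vertices):

```python
def regress_joints(j_regressor, vertices):
    """joints = J_regressor @ vertices, with shapes (J x V) @ (V x 3) -> (J x 3)."""
    return [
        [sum(w * v[axis] for w, v in zip(row, vertices)) for axis in range(3)]
        for row in j_regressor
    ]


vertices = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
j_regressor = [
    [0.5, 0.5, 0.0],  # joint 0: midpoint of vertices 0 and 1
    [0.0, 0.0, 1.0],  # joint 1: exactly vertex 2
]
joints = regress_joints(j_regressor, vertices)
print(joints)  # [[1.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
```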

GCN-based discriminator: multi-scale 3D GCN (do not understand the temporal window???) https://github.com/ostadabbas/TePose/blob/master/lib/models/motion_discriminator_gcn.py#L60, MS-GCN: https://github.com/ostadabbas/TePose/blob/master/lib/models/ms_gcn.py#L44

MS 3D GCN: Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition (ms-tcn, ms-gcn, ms-gtcn, ms-g3d): https://arxiv.org/pdf/2003.14111, code: https://github.com/kenziyuliu/ms-g3d. Core code: build_spatial_temporal_graph https://github.com/kenziyuliu/MS-G3D/blob/master/model/ms_gtcn.py#L92
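On the temporal window question: as far as I understand the MS-G3D paper, a window of τ frames is treated as one large graph with V·τ nodes, where the skeleton adjacency with self-loops is tiled into a τ × τ grid of blocks, so each joint is linked to itself and to its spatial neighbours in every frame of the window. A dependency-free sketch of that construction follows; `build_window_graph` is a hypothetical name and this is my reading, not a copy of the linked function.

```python
def build_window_graph(adj, window_size):
    """Tile (A + I) into a (V*window_size) x (V*window_size) adjacency matrix."""
    V = len(adj)
    # Add self-loops so a joint is also linked to itself in other frames.
    with_self = [[1 if i == j else adj[i][j] for j in range(V)] for i in range(V)]
    size = V * window_size
    # Row r is joint r % V in frame r // V; the same block repeats for every frame pair.
    return [[with_self[r % V][c % V] for c in range(size)] for r in range(size)]


# Toy 3-joint chain 0 - 1 - 2, window of 2 frames.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
big = build_window_graph(chain, 2)
print(big[0])  # [1, 1, 0, 1, 1, 0]
```

Row 0 shows joint 0 of frame 0 connected to joints 0 and 1 in both frames, which is what lets one graph convolution mix information across space and time inside the window.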

ouusan commented 1 month ago

KTD

ouusan commented 1 month ago

Related GCNs and TCNs: Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos (2021) https://arxiv.org/pdf/2012.11806: human-joint GCN (heatmap-based adjacency matrix A) + human-bone GCN (part affinity field-based adjacency matrix B); joint-TCN, velocity-TCN, root-TCN. Code (no training code): https://github.com/3dpose/GnTCN
