CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation #3

Open gwleeee opened 1 year ago

paper: https://arxiv.org/abs/2203.13387, code: not available, CVPR 2023

Abstract

Introduction

3. Method

image image

Notation

3.1. Transformer

step 1. Patch embedding

image

step 2. Self-attention

image image

step 3.

3.2. Spatial Interaction

Cross-Joints Interaction (CJI) Module

image

In that case, I suspect the ordering of the joints J matters quite a bit.
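A toy check of the joint-ordering concern, assuming (purely for illustration, since the module figure is not reproduced here) that joints are mixed with their index neighbors along the joint axis. `local_mix`, the kernel weights, and the feature values are hypothetical and are not taken from the paper.

```python
def local_mix(values, kernel=(0.25, 0.5, 0.25)):
    """Mix each joint with its index neighbors (zero padding at both ends)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):
                acc += w * values[j]
        out.append(acc)
    return out


def permute(values, order):
    """Renumber the joints: position p now holds original joint order[p]."""
    return [values[i] for i in order]


def unpermute(values, order):
    """Undo permute() so results can be compared joint by joint."""
    out = [0.0] * len(values)
    for new_pos, old_pos in enumerate(order):
        out[old_pos] = values[new_pos]
    return out


joints = [1.0, 2.0, 4.0, 8.0, 16.0]  # one hypothetical feature per joint
order = [0, 2, 4, 1, 3]              # an alternative joint numbering

baseline = local_mix(joints)
reordered = unpermute(local_mix(permute(joints, order)), order)

# With local mixing, the same joints get different outputs under a new numbering.
order_matters = baseline != reordered
```

Under this assumption the result depends on which joints sit next to each other in the index order, so a numbering that follows the skeleton (e.g. along each limb) would mix physically related joints, while an arbitrary numbering would not.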

3.3 Temporal Interaction

image

Cross-Frames Interaction (CFI) Module

The shape of Z seems to be stated incorrectly; $\mathbb{R}^{F\times{(J\times D)}}$ looks like the right one.
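To make the shape remark concrete, a small sketch of what one token of size J·D per frame looks like, assuming the spatial stage yields one D-dimensional feature per joint per frame (this note's reading, since the figure is not reproduced). `flatten_joints` and the toy sizes are hypothetical.

```python
def flatten_joints(z):
    """Turn a nested list of shape (F, J, D) into shape (F, J * D)."""
    return [[value for joint in frame for value in joint] for frame in z]


F, J, D = 3, 2, 4  # toy sizes, not the paper's settings

# Encode (frame, joint, channel) in each value so the layout is easy to read.
z = [[[f * 100 + j * 10 + d for d in range(D)] for j in range(J)] for f in range(F)]

tokens = flatten_joints(z)               # one token per frame
shape = (len(tokens), len(tokens[0]))    # (F, J * D)
```

Each row of `tokens` is a single frame with all joint features concatenated, which is the $F\times(J\times D)$ layout suggested above.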

3.4 Regression Head

$\mathbb{R}^{F\times{(J\times D)}}$ -> Linear -> $\mathbb{R}^{1\times{(J\times 3)}}$

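If the final computation were done in a way that preserves the F dimension, sequence-level inference (one pose per frame) should also be possible.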
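A rough sketch of the two head variants discussed here. How the F frames are collapsed to 1 is not spelled out in these notes, so the weighted sum over frames before the linear layer is an assumption, as are the sizes, the random weights, and the helper names `linear` and `pool_frames`.

```python
import random


def linear(x, weight, bias):
    """Apply y = x @ weight + bias to every row of x (weight is in x out)."""
    columns = list(zip(*weight))
    return [
        [sum(xi * w for xi, w in zip(row, col)) + b for col, b in zip(columns, bias)]
        for row in x
    ]


def pool_frames(x, frame_weights):
    """Collapse (F, C) to (1, C) with a weighted sum over frames (assumed)."""
    channels = len(x[0])
    return [[sum(w * row[c] for w, row in zip(frame_weights, x)) for c in range(channels)]]


F, J, D = 9, 17, 8  # hypothetical sizes: 9 frames, 17 joints, 8 channels
rng = random.Random(0)

z = [[rng.uniform(-1, 1) for _ in range(J * D)] for _ in range(F)]           # (F, J*D)
weight = [[rng.uniform(-0.1, 0.1) for _ in range(J * 3)] for _ in range(J * D)]
bias = [0.0] * (J * 3)
frame_weights = [1.0 / F] * F  # uniform here; would be learned in practice

# Variant A: collapse F first, then regress a single pose of shape (1, J*3).
center_pose = linear(pool_frames(z, frame_weights), weight, bias)

# Variant B: apply the same linear map per frame, keeping F, shape (F, J*3).
sequence_poses = linear(z, weight, bias)
```

Variant B only changes where the frame axis is reduced, which is why sequence-level output looks feasible; whether it trains as well as the single-frame target is a separate question these notes do not settle.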

3.5 Loss Function

4. Experiments

image image image