zhangboshen / A2J

Code for paper "A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation from a Single Depth Image". ICCV2019
MIT License

the meaning of the shape of tensor #46

Closed wangzheallen closed 3 years ago

wangzheallen commented 3 years ago

Hi @zhangboshen, thanks for the good work!

May I ask what the function 'shift' does? See https://github.com/zhangboshen/A2J/blob/master/src_train/anchor.py#L26

What do N, A and P mean in N*(w*h*A)*P? See https://github.com/zhangboshen/A2J/blob/master/src_train/anchor.py#L67

Why is the softmax applied to the classification head output instead of the anchors? (https://github.com/zhangboshen/A2J/blob/master/src_train/anchor.py#L121)

What is the meaning and use of the function 'post_process'? See https://github.com/zhangboshen/A2J/blob/master/src_train/anchor.py#L44

Thanks!

zhangboshen commented 3 years ago

@wangzheallen Hi. In 'anchor.py': 1) the function 'shift' is used to generate all anchor point coordinates, see line 48; 2) N = batch size, A = number of anchor points, P = number of keypoints; 3) the classification head predicts a weight for each anchor point, which is part of our model design (the anchor proposal branch); 4) as the name suggests, the function 'post_process' post-processes the model predictions into standard keypoint coordinates. Hope it helps.
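To make point 1 concrete, here is a minimal NumPy sketch of how a 'shift'-style function could tile a small set of in-cell anchor offsets over an output grid. It is an illustration under stated assumptions, not the code in 'anchor.py': the 176x176 input, the 11*11 output grid and the 4-pixel anchor spacing are the figures discussed in this thread, while the stride of 16 (176 / 11), the in-cell offsets (2, 6, 10, 14) and the helper names `make_cell_offsets` and `shift_anchors` are assumptions chosen for the example.

```python
import numpy as np

def make_cell_offsets(offsets=(2, 6, 10, 14)):
    """Return (A, 2) anchor offsets inside one grid cell.

    The default offsets are an assumption: 4 positions per axis,
    4 pixels apart, inside a 16x16 cell.
    """
    o = np.asarray(offsets)
    hh, ww = np.meshgrid(o, o, indexing="ij")
    return np.stack([hh.ravel(), ww.ravel()], axis=1)

def shift_anchors(grid_shape, stride, cell_offsets):
    """Replicate the in-cell offsets at every cell of the output grid.

    grid_shape: (rows, cols) of the output grid, e.g. (11, 11)
    stride: input pixels covered by one output-grid cell
    Returns an array of shape (rows * cols * A, 2) in input-image pixels.
    """
    cell_h = np.arange(grid_shape[0]) * stride
    cell_w = np.arange(grid_shape[1]) * stride
    hh, ww = np.meshgrid(cell_h, cell_w, indexing="ij")
    cell_origins = np.stack([hh.ravel(), ww.ravel()], axis=1)  # (K, 2)
    # Broadcast (K, 1, 2) + (1, A, 2) -> (K, A, 2), then flatten.
    all_anchors = cell_origins[:, None, :] + cell_offsets[None, :, :]
    return all_anchors.reshape(-1, 2)

all_anchors = shift_anchors((11, 11), 16, make_cell_offsets())
print(all_anchors.shape)
```

Under these assumptions the result has 11 * 11 * 16 = 1936 rows, i.e. one anchor every 4 pixels along each axis of the 176x176 input.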

wangzheallen commented 3 years ago

@zhangboshen Thanks for the clarification!

For 1: At https://github.com/zhangboshen/A2J/blob/master/src_train/anchor.py#L45, are P_h and P_w the indices that map the output grid (11*11) to positions in the input grid (176*176)? Is 'shape' the output grid shape (11*11), and is 'stride' the step size in the input grid that corresponds to 1 pixel in the output grid?

For 2: Why is the softmax applied over N (batch_size) and not over A (number of anchor points)? See https://github.com/zhangboshen/A2J/blob/master/src_train/anchor.py#L121

Do you specify the number of anchor points to be selected, as in Fig. 1 or Fig. 6? Or do you treat every grid point on the depth map as an anchor point and just select based on the threshold 0.02? Thanks!

zhangboshen commented 3 years ago

@wangzheallen,

1) Yes, you are right. 2) The softmax is not over the N dimension but over the anchor dimension, because N has already been removed by line 116. 3) Actually, we specify the input resolution (176x176) and the anchor stride (4 pixels); once the resolution and stride are specified, the number of anchor points is fixed. All of the anchor points are used for weight assignment and offset prediction; the threshold 0.02 is just for visualization.
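Points 2 and 3 can be illustrated with a small NumPy sketch of anchor-weighted aggregation. This is a hedged illustration and not the repository's implementation: the function names, the random inputs, and the choice of P = 14 keypoints are assumptions made for the example. It shows that a softmax over the anchor axis of a batched tensor (axis 1) gives the same weights as a softmax over axis 0 once the batch dimension has been indexed away, and that every anchor contributes to the final keypoint estimate.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_keypoints(cls_logits, offsets, anchors):
    """Weighted sum of (anchor + predicted offset) over all anchors.

    cls_logits: (N, M, P)    per-anchor, per-keypoint scores
    offsets:    (N, M, P, 2) per-anchor, per-keypoint in-plane offsets
    anchors:    (M, 2)       anchor coordinates in input pixels
    Returns (N, P, 2) keypoint coordinates.
    """
    weights = softmax(cls_logits, axis=1)             # normalize over anchors
    candidates = anchors[None, :, None, :] + offsets  # (N, M, P, 2)
    return (weights[..., None] * candidates).sum(axis=1)

# Hypothetical sizes: 176 / 4 = 44 anchors per axis, 44 * 44 = 1936 in total.
N, M, P = 2, (176 // 4) ** 2, 14
rng = np.random.default_rng(0)
cls_logits = rng.normal(size=(N, M, P))
offsets = rng.normal(size=(N, M, P, 2))
anchors = rng.uniform(0, 176, size=(M, 2))

keypoints = aggregate_keypoints(cls_logits, offsets, anchors)

# With the batch dimension removed, the anchor axis becomes axis 0.
batched_weights = softmax(cls_logits, axis=1)
per_sample_weights = softmax(cls_logits[0], axis=0)
print(keypoints.shape, np.allclose(per_sample_weights, batched_weights[0]))
```

In this sketch no anchor is discarded; a cutoff such as 0.02 on the weights would only matter when choosing which anchors to draw.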

wangzheallen commented 3 years ago

Thanks for the clarification :+1: