hongsukchoi / 3DCrowdNet_RELEASE

Official Pytorch implementation of "Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes", CVPR 2022
MIT License

Question about Table 1 and experiments #23

Closed MooreManor closed 1 year ago

MooreManor commented 1 year ago

@hongsukchoi

Hello! 3DCrowdNet is a nice work!

I have a few questions about Table 1 and the network.

hongsukchoi commented 1 year ago

Hi

The first two questions are answered in the paper. TL;DR: it is tested on 3DPW-Crowd and trained on a mixed dataset.

> I don't quite understand this operation. The img_feat is a 2D image-space feature. To sample on it, the 3D joint could be projected with a perspective projection to get the 2D image-space point. Why use only the x, y of the 3D joint?

First, I tried your suggestion, but it did not bring any gain empirically. Second, the z estimation is itself highly ambiguous, especially in occluded scenarios, so I think it is better to use only the x, y values.
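The sampling described above can be sketched as bilinear interpolation of the feature map at the joints' x, y coordinates. This is a minimal illustration, not the repository's actual code; the function name and the assumption that joint coordinates are already normalized to [-1, 1] are mine:

```python
import torch
import torch.nn.functional as F

def sample_joint_features(img_feat, joint_xy):
    """Bilinearly sample a 2D feature map at per-joint (x, y) locations.

    img_feat: (B, C, H, W) image-space feature map
    joint_xy: (B, J, 2) joint coordinates normalized to [-1, 1]
              (only x, y of the 3D joints are used; z is ignored)
    returns:  (B, J, C) per-joint feature vectors
    """
    # grid_sample expects a (B, H_out, W_out, 2) grid; use (B, J, 1, 2)
    grid = joint_xy[:, :, None, :]
    sampled = F.grid_sample(img_feat, grid, align_corners=False)  # (B, C, J, 1)
    return sampled.squeeze(-1).permute(0, 2, 1)  # (B, J, C)

# toy usage: 15 joints placed at the feature-map center
feat = torch.randn(1, 32, 8, 8)
xy = torch.zeros(1, 15, 2)
per_joint = sample_joint_features(feat, xy)  # shape (1, 15, 32)
```

Dropping z here sidesteps the depth ambiguity the reply mentions: the feature lookup depends only on where the joint lands in the image plane.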

> When testing your idea of crowded-scene robustness, do you evaluate on 3DPW-Crowd instead of the whole 3DPW test set, and only move to the whole 3DPW test set once the 3DPW-Crowd results are good? Is the research procedure I describe correct?

No. While developing 3DCrowdNet, I mainly tested on 3DPW-Crowd. Results on the other datasets were obtained after the development of 3DCrowdNet was complete.