zju3dv / GVHMR

Code for "GVHMR: World-Grounded Human Motion Recovery via Gravity-View Coordinates", Siggraph Asia 2024
https://zju3dv.github.io/gvhmr

Fail to put multiple persons in the same world frame #29

Open hongsukchoi opened 1 month ago

hongsukchoi commented 1 month ago

Hi @zehongs @pengsida ,

Thank you for your continued great work! I have a question about GVHMR.

How can I visualize the global camera trajectories (sequential 6D camera poses) when there are multiple persons?

If I am right, GVHMR's outputs (camera pose estimates and the humans' global trajectories) have no explicit relation to SLAM camera pose predictions. Also, GVHMR estimates its own camera poses from the cropped image of each single person. I think the gravity coordinate frame is also defined per person. So I had to stitch the camera trajectories with some heuristics to deal with multiple people appearing and disappearing in the video.
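To make the stitching idea concrete, here is a minimal sketch of one possible heuristic. It is not GVHMR's API and not necessarily what my script does. It assumes that, for each person and frame, the root pose is available as a 4x4 rigid transform both in that person's own world frame and in the camera frame, and that both describe the same root joint. All function and variable names here are hypothetical.

```python
import numpy as np


def make_pose(R, t):
    """Build a 4x4 rigid transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R, dtype=float)
    T[:3, 3] = np.asarray(t, dtype=float)
    return T


def invert_pose(T):
    """Invert a rigid transform using R^T instead of a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def camera_in_person_world(T_world_root, T_cam_root):
    """Camera-to-world pose implied by one person's root at one frame.

    Assumption: T_world_root maps root coordinates to that person's world
    frame, and T_cam_root maps root coordinates to the camera frame, so
    T_world_cam = T_world_root @ inv(T_cam_root).
    """
    return T_world_root @ invert_pose(T_cam_root)


def align_world_b_to_a(T_worldA_cam, T_worldB_cam):
    """Transform that maps person B's world frame into person A's.

    Assumption: both inputs are poses of the same physical camera at the
    same video frame, each expressed in a different per-person world frame.
    """
    return T_worldA_cam @ invert_pose(T_worldB_cam)
```

With this, at any frame where persons A and B are both visible, `align_world_b_to_a` gives a single transform that can be applied to all of B's world-frame poses. Because each person's estimate carries its own error and drift, the transform will differ from frame to frame, which is one reason a heuristic like this can produce inconsistent camera trajectories.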

As a result, I get this kind of weird result from a PoseTrack multi-person video. I want to know whether I am doing something wrong, or whether there is a better way to visualize the camera trajectory.

[Image: Screenshot 2024-10-09 at 5 02 40 PM] [Image: Screenshot 2024-10-09 at 5 02 47 PM]

Caution: These images are not exactly time synchronized. Full video: https://youtu.be/qwUeOn0HieI https://youtu.be/vN45gpskOuI

I used Viser for 3D visualization, since everything always looks good in 2D rendering. Here is my code: https://github.com/hongsukchoi/GVHMR_vis/blob/hongsuk/mp_gloval_viser_vis.py

Again, thank you for your great work!!

zehongs commented 1 month ago

Hi @hongsukchoi, sorry for the late reply. As you mentioned, GVHMR doesn't explicitly model the camera transformation from world to camera, which is also the case for WHAM. To solve this problem, an extra global optimization of human motion and camera motion is required. By the way, I think TRAM (ECCV24) might be suitable for your project.

hongsukchoi commented 1 month ago

Thanks for the reply!

I found this paper too. If these methods are fast and easy to use, that would be great. https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Synergistic_Global-space_Camera_and_Human_Reconstruction_from_Videos_CVPR_2024_paper.pdf

I have one more request. Could you clarify how your method defines the gravity coordinate frame? Earlier I wrote:

"I think the gravity coordinate frame is also defined per person."

zehongs commented 1 month ago

Yes, that's correct.