hongsukchoi opened this issue 1 month ago
Hi @hongsukchoi, sorry for the late reply. As you mentioned, GVHMR doesn't explicitly model the camera transformation from world to camera, which is also the case for WHAM. To solve this problem, an extra global optimization of human motion and camera motion is required. By the way, I think TRAM (ECCV24) might be suitable for your project.
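Not GVHMR code, just a minimal sketch of one way to do that global alignment, assuming you also run a SLAM method (e.g., DPVO) on the full frames: fit a similarity transform between each person's GVHMR camera centers and the SLAM camera centers over that person's tracked frames, then apply the same transform to that person's global human motion. All variable names below are illustrative.

```python
import numpy as np

def umeyama_alignment(src: np.ndarray, dst: np.ndarray, with_scale: bool = True):
    """Least-squares similarity transform (s, R, t) with dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    """
    mu_src, mu_dst = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / src.shape[0]
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # avoid a reflection
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / src.shape[0]
    s = np.trace(np.diag(D) @ S) / var_src if with_scale else 1.0
    t = mu_dst - s * R @ mu_src
    return s, R, t

# Hypothetical inputs: camera centers at the person's tracked frames, expressed
# in the person's gravity frame (from GVHMR) and in the SLAM world frame.
# cam_centers_gvhmr, cam_centers_slam: (N, 3); human_transl: (N, 3) root trajectory.
# s, R, t = umeyama_alignment(cam_centers_gvhmr, cam_centers_slam)
# human_transl_slam = (s * (R @ human_transl.T)).T + t  # person in the SLAM world
```

Repeating this per person puts everyone (and one shared camera trajectory) into a single world frame; a joint optimization can then refine this initialization.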
Thanks for the reply!
I found this paper too. If these methods are fast and easy to use, that would be great. https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_Synergistic_Global-space_Camera_and_Human_Reconstruction_from_Videos_CVPR_2024_paper.pdf
I have one more question: could you clarify how your method defines the gravity coordinate frame?
I think the gravity coordinate frame is also defined per person.
Yes, that's correct.
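Since all persons in a frame are observed by the same physical camera, two per-person gravity frames can be related by chaining through the camera at a frame where both persons are visible. A minimal sketch, assuming (4, 4) homogeneous world-to-camera matrices (names are illustrative):

```python
import numpy as np

def relative_world_transform(T_w2c_ref: np.ndarray, T_w2c_other: np.ndarray) -> np.ndarray:
    """Map points from the other person's gravity frame into the reference
    person's gravity frame. Both inputs are (4, 4) world-to-camera matrices
    of the same physical camera at the same frame.
    """
    # other-world -> camera -> reference-world
    return np.linalg.inv(T_w2c_ref) @ T_w2c_other

# Usage sketch: pick a frame t0 where persons 0 and 1 are both visible.
# T_1_to_0 = relative_world_transform(T_w2c[0][t0], T_w2c[1][t0])
# verts_in_frame0 = (T_1_to_0 @ verts_homog_person1.T).T
```

Because the per-person camera estimates are noisy, averaging this relative transform over several shared frames tends to be more stable than using a single frame.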
Hi @zehongs @pengsida,
Thank you for your continued great work! I have a question about GVHMR.
How can I visualize the global camera trajectories (sequential 6D camera poses) when there are multiple persons?
If I am right, GVHMR's outputs (camera pose estimates and global human trajectories) have no explicit relation to SLAM camera pose predictions. Also, GVHMR estimates its own camera poses from the cropped image of each single person, and I think the gravity coordinate frame is also defined per person. So I had to stitch the camera trajectories with some heuristics to deal with multiple people appearing and disappearing in the video.
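To make the heuristic concrete, here is a simplified sketch of the kind of stitching I mean (not the actual code linked below; the track structure and names are made up). The idea is to anchor the earliest track's gravity frame as the reference, then chain each remaining track through the shared physical camera at a frame it has in common with an already-anchored track:

```python
import numpy as np

def stitch_tracks(tracks):
    """Anchor every per-person gravity frame to one reference frame.

    tracks: dict person_id -> {"frames": sorted frame indices,
                               "T_w2c": {frame: (4, 4) world-to-camera matrix}}.
    Returns: dict person_id -> (4, 4) transform into the reference frame.
    """
    ref_id = min(tracks, key=lambda p: tracks[p]["frames"][0])
    anchored = {ref_id: np.eye(4)}
    pending = set(tracks) - {ref_id}
    while pending:
        progress = False
        for pid in sorted(pending):
            for aid in list(anchored):
                shared = set(tracks[pid]["frames"]) & set(tracks[aid]["frames"])
                if not shared:
                    continue
                t0 = min(shared)
                # pid-world -> camera -> aid-world -> reference frame
                anchored[pid] = (anchored[aid]
                                 @ np.linalg.inv(tracks[aid]["T_w2c"][t0])
                                 @ tracks[pid]["T_w2c"][t0])
                pending.discard(pid)
                progress = True
                break
        if not progress:
            break  # leftover tracks never co-occur with an anchored one
    return anchored
```

Tracks that never co-occur with an already-anchored one cannot be stitched this way, which is exactly where people appearing and disappearing breaks the chain.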
As a result, I get this kind of weird result on a PoseTrack multi-person video. I want to know whether I am doing something wrong, or whether there is a better way to visualize the camera trajectory.
Caution: these images are not exactly time-synchronized. Full videos: https://youtu.be/qwUeOn0HieI https://youtu.be/vN45gpskOuI
I used Viser for 3D visualization, since everything always looks good in 2D rendering. Here is my code: https://github.com/hongsukchoi/GVHMR_vis/blob/hongsuk/mp_gloval_viser_vis.py
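In case it is useful, a minimal self-contained sketch of drawing a camera trajectory as frustums with Viser (using the `server.scene` namespace of recent viser versions; the placeholder trajectory and all names are illustrative):

```python
import time
import numpy as np
import viser
import viser.transforms as vtf

# Placeholder input: (N, 4, 4) camera-to-world poses; replace with real data.
T_c2w = np.tile(np.eye(4), (100, 1, 1))
T_c2w[:, 0, 3] = np.linspace(0.0, 2.0, 100)  # camera moving along +x

server = viser.ViserServer()
for t, T in enumerate(T_c2w):
    so3 = vtf.SO3.from_matrix(T[:3, :3])
    server.scene.add_camera_frustum(
        f"/cameras/frame_{t:04d}",
        fov=np.deg2rad(60.0),  # assumed vertical field of view
        aspect=16 / 9,
        scale=0.05,
        wxyz=so3.wxyz,
        position=T[:3, 3],
    )

while True:  # keep the server alive for browser clients
    time.sleep(1.0)
```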
Again, thank you for your great work!!