zju3dv / SA-HMR

Code for "Learning Human Mesh Recovery in 3D Scenes" CVPR 2023
MIT License

About the input #1

Closed TnoobT closed 1 year ago

TnoobT commented 1 year ago

Hi, I'd like to ask: is it the case that your model cannot make predictions from a single video alone? Does it also need the corresponding scene point cloud as input?

zehongs commented 1 year ago

Hello. Our method is currently not suited to predicting on a single video; the final results may jitter. This is because SA-HMR is a single-frame method, with no extra design for consecutive frames. The scene point cloud is required.
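If you do run it frame by frame on a video anyway, a generic post-hoc filter over the per-frame outputs can reduce the jitter. A minimal sketch (not part of SA-HMR; verts_seq is a hypothetical (T, V, 3) array of the per-frame predicted vertices):

```python
import numpy as np

def ema_smooth(verts_seq, alpha=0.7):
    """Exponential moving average over per-frame predictions.

    verts_seq: (T, V, 3) array of predicted vertices, one frame per row.
    alpha: weight of the current frame; lower values smooth more but lag more.
    """
    out = np.empty_like(verts_seq)
    out[0] = verts_seq[0]
    for t in range(1, len(verts_seq)):
        out[t] = alpha * verts_seq[t] + (1.0 - alpha) * out[t - 1]
    return out
```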

TnoobT commented 1 year ago

OK, thanks for the reply.

luoww1992 commented 1 year ago

@zehongs If we use it to run on a video, what do we need to do? Can you give me some advice?

zehongs commented 1 year ago

Hi luoww1992, you'll need scene scans and camera poses to run the model.

luoww1992 commented 1 year ago

> Hi luoww1992, you'll need scene scans and camera poses to run the model.

Yes, I want to run it on a new 3D scene created from depth images, and I noticed: https://github.com/zju3dv/SA-HMR/issues/1#issuecomment-1586475791

So what can we do when testing on a video to reduce errors?

luoww1992 commented 1 year ago

@zehongs Q1: What do dump_results.py and eval_results.py do? I see that the dump script writes the SMPL poses in the 3D scene, and the eval script computes the error against the ground truth. Is that right?

Q2: You say a single image takes 170 ms, but I only get about 3 fps on a 2080 Ti + Win10 + CUDA 11.3 + torch 1.12, which is too slow. Are there ways to get a higher frame rate?

luoww1992 commented 1 year ago

@zehongs I see this in the PDF: SA-HMR runs at 170 ms with a peak memory cost of 1852 MB for a 224×224 image and a scene point cloud of 2 cm resolution on a V100 GPU. Specifically, the root and contact module takes 92 ms (CNN 50 ms, SPVCNN 42 ms), the mesh recovery module takes 75 ms (CNN 49 ms, Transformer 26 ms), and the intermediate processing takes 3 ms.

About the root and contact module (CNN 50 ms, SPVCNN 42 ms) and the mesh recovery module (CNN 49 ms, Transformer 26 ms): are these performed sequentially, with the output of each step being the input to the next? Or can the four steps be performed at the same time?

luoww1992 commented 1 year ago

@zehongs In dump_results.py, about the result args:

1. pred_c_verts: is this the human mesh placed in the 3D scene, or just the human mesh without displacement?
2. pred_c_pelvis: is this the human pose or the human location in the 3D scene? If it is the location, how does it differ from the position obtained from the human mesh in no. 1?
3. pred_c_pelvis_refined: is this a refinement of no. 2? Same question: is it the human pose or the human location in the 3D scene? If it is the location, how does it differ from the position obtained from the human mesh in no. 1?

zehongs commented 1 year ago

Hi, sorry for the late reply. The four parts you mentioned are performed sequentially.
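Schematically, the data flow looks like this (the function names are illustrative stubs, not the actual SA-HMR API; the timings are the ones quoted from the paper):

```python
import numpy as np

# Illustrative stubs only; the real modules are neural networks.
def root_cnn(image):                  return np.zeros(128)                # ~50 ms
def spvcnn(scene, img_feat):          return np.zeros(3), np.zeros(6890)  # ~42 ms
def mesh_cnn(image, root):            return np.zeros(256)                # ~49 ms
def mesh_transformer(feat, contact):  return np.zeros((6890, 3))          # ~26 ms

image = np.zeros((224, 224, 3))
scene = np.zeros((100000, 3))

img_feat = root_cnn(image)               # step 1
root, contact = spvcnn(scene, img_feat)  # step 2 needs step 1's features
feat = mesh_cnn(image, root)             # step 3 needs the predicted root
verts = mesh_transformer(feat, contact)  # step 4 needs steps 2 and 3
```

So for a single image the four networks cannot run concurrently. For a video you could still pipeline different frames through different stages to raise throughput.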

And _c_ means camera coordinates. For example, pred_c_verts means the predicted human vertices in camera coordinates. You should be able to transform them to world coordinates, since we assume the camera pose is known in this paper. pred_c_pelvis and pred_c_pelvis_refined are actually intermediate results of the Root & Contact module; I save these results for evaluation only. If you want to get the position of the human mesh in no. 1, you still need to compute the joints from pred_c_verts.
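A minimal numpy sketch of both points, assuming a known 4×4 camera-to-world pose T_c2w and the SMPL joint regressor (the (24, 6890) J_regressor matrix that ships with the SMPL model files); the zero arrays are placeholders for real data:

```python
import numpy as np

pred_c_verts = np.zeros((6890, 3))  # from dump_results.py, camera coordinates
T_c2w = np.eye(4)                   # known camera-to-world pose (placeholder)
J_regressor = np.zeros((24, 6890))  # SMPL joint regressor (placeholder)

# Camera -> world: x_w = R @ x_c + t
R, t = T_c2w[:3, :3], T_c2w[:3, 3]
pred_w_verts = pred_c_verts @ R.T + t

# Joints are regressed linearly from the vertices; row 0 is the pelvis,
# which is how to recover the mesh position from pred_c_verts.
joints_w = J_regressor @ pred_w_verts  # (24, 3)
pelvis_w = joints_w[0]
```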

luoww1992 commented 1 year ago

@zehongs I understand what you said. If I want to get the verts or keypoints in the 3D point-cloud scene, how do I do that? Because we get the contact and pose in the 3D point-cloud scene,

like this: https://github.com/dluvizon/scene-aware-3d-multi-human

luoww1992 commented 1 year ago

@zehongs About the result batch args (pred): what are the main meanings of these args?