Closed TnoobT closed 1 year ago
Hi. Our method is currently not suited to predicting on a single video; the final results may jitter. This is because SA-HMR is a single-frame method with no extra design for consecutive frames. It also requires the scene point cloud.
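One common post-hoc mitigation for the per-frame jitter mentioned above is temporal smoothing of the predicted vertices. A minimal sketch (not part of SA-HMR; it assumes you stack the per-frame `pred_c_verts` into one NumPy array) using an exponential moving average:

```python
import numpy as np

def ema_smooth(verts_seq, alpha=0.7):
    """Exponential moving average over per-frame mesh vertices.

    verts_seq: (T, N, 3) array of per-frame predicted vertices.
    alpha: weight of the current frame; smaller values smooth more.
    """
    smoothed = np.empty_like(verts_seq)
    smoothed[0] = verts_seq[0]
    for t in range(1, len(verts_seq)):
        smoothed[t] = alpha * verts_seq[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed

# Toy sequence: a constant mesh plus per-frame noise (stand-in for jitter).
rng = np.random.default_rng(0)
seq = np.ones((10, 6890, 3)) + 0.01 * rng.standard_normal((10, 6890, 3))
out = ema_smooth(seq)
print(out.shape)  # (10, 6890, 3)
```

This only hides jitter; it does not add any true temporal reasoning to the model.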
OK, thanks for the reply!
@zehongs If we use it to run on a video, what do we need to do? Can you give me some advice?
Hi luoww1992, you'll need scene scans and camera poses to run the model.
Yes, I want to run it on a new 3D scene created from depth images, and I noticed: https://github.com/zju3dv/SA-HMR/issues/1#issuecomment-1586475791
So what can we do to reduce errors when testing on a video?
@zehongs Q1: What do dump_results.py and eval_results.py do? I see that the dump file gets the SMPL poses in the 3D scene, and the eval file computes the loss against the GT. Is that right?
Q2: You say it takes 170 ms per image, but I only get 3 fps on a 2080 Ti + Win10 + CUDA 11.3 + PyTorch 1.12, which is too slow. Are there ways to get more fps?
@zehongs I see in the PDF: SA-HMR runs at 170 ms with a peak memory cost of 1852 MB for a 224×224 image and a scene point cloud of 2 cm resolution on a V100 GPU. Specifically, the root and contact module takes 92 ms (CNN 50 ms, SPVCNN 42 ms), the mesh recovery module takes 75 ms (CNN 49 ms, Transformer 26 ms), and the intermediate processing takes 3 ms.
The root and contact module (CNN 50 ms, SPVCNN 42 ms) and the mesh recovery module (CNN 49 ms, Transformer 26 ms): are these performed sequentially, with the output of the previous step being the input of the next? Or can we perform these 4 steps at the same time?
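For reference, the per-module timings quoted from the paper sum exactly to the 170 ms total when the stages run back-to-back:

```python
# Latencies (ms) as quoted in the SA-HMR paper for a V100 GPU.
root_contact = {"CNN": 50, "SPVCNN": 42}        # 92 ms total
mesh_recovery = {"CNN": 49, "Transformer": 26}  # 75 ms total
intermediate = 3                                # intermediate processing

total = sum(root_contact.values()) + sum(mesh_recovery.values()) + intermediate
print(total)  # 170 -- latencies add up because the stages run sequentially
```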
@zehongs In dump_result.py, about the result args:
1. pred_c_verts -- is it the human mesh placed in the 3D scene, or just the human mesh without displacement?
2. pred_c_pelvis -- is it the human pose or the human location in the 3D scene? If it is the location, how does it differ from the position obtained from the human mesh in no. 1?
3. pred_c_pelvis_refined -- is it a refinement of no. 2? Same question: human pose or human location in the 3D scene? If it is the location, how does it differ from the position obtained from the human mesh in no. 1?
Hi, sorry for the late reply. The four parts you mentioned are performed sequentially.
And _c_ means camera coordinates. For example, pred_c_verts means the predicted human vertices in camera coordinates. You should be able to transform them to world coordinates, since we assume the camera pose is known in this paper. The pred_c_pelvis and pred_c_pelvis_refined are actually intermediate results of the Root & Contact module; I save these results for evaluation only. If you want to get the position of the human mesh in no. 1, you still need to compute the joints from pred_c_verts.
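The two steps described above (regressing joints from the vertices, then moving to world coordinates) can be sketched as follows. This is a minimal illustration, not code from this repo: `J_regressor`, `R`, and `t` are assumed to be the SMPL joint regressor and a known camera-to-world rotation/translation, here filled with toy values.

```python
import numpy as np

# Hypothetical inputs: dumped vertices plus a known camera pose.
pred_c_verts = np.zeros((6890, 3))   # SMPL vertices in camera coordinates
J_regressor = np.zeros((24, 6890))   # SMPL joint regressor (assumed available)
J_regressor[0, :3] = 1.0 / 3.0       # toy weights for the pelvis (joint 0) row
R = np.eye(3)                        # camera-to-world rotation (known pose)
t = np.array([0.0, 0.0, 0.0])        # camera-to-world translation

# Joints in camera coordinates: each joint is a weighted sum of vertices.
c_joints = J_regressor @ pred_c_verts  # (24, 3)
c_pelvis = c_joints[0]                 # pelvis position in camera coordinates

# Transform camera-coordinate points into the world frame.
w_verts = pred_c_verts @ R.T + t
w_pelvis = c_pelvis @ R.T + t
print(w_verts.shape, w_pelvis.shape)  # (6890, 3) (3,)
```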
like this: https://github.com/dluvizon/scene-aware-3d-multi-human
@zehongs About the result batch args: what do the main args mean?
Hello, I'd like to ask: is it the case that your model cannot predict on a single video alone, and the corresponding scene point cloud must be provided as input as well?