hongsukchoi / Pose2Mesh_RELEASE

Official Pytorch implementation of "Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose", ECCV 2020
MIT License

Multiple 3D poses from different angles as input, and painting the texture onto the generated model #67


felixshing commented 1 year ago

Hello, I am currently working on building a live interactive volumetric video streaming system. The setup is that, on both sides, we have multiple depth cameras capturing the user's movements, and each camera is attached to a Jetson edge device that processes the data. Each user wears a HoloLens 2. What we want to do is stream each user's movements to the other user's headset, on both sides, so that it becomes a live interactive volumetric video system.

An example of volumetric video can be found here: https://www.youtube.com/watch?v=i4a6fxqP1nM; the corresponding paper is https://arxiv.org/pdf/2007.13988.pdf

In order to reduce the size of the data transmission, we now have an idea: for the first frame, we transmit the whole 3D model, while for the remaining frames the sender only transmits the skeleton information. On the receiver side, we use the updated skeleton information, plus the whole 3D model transmitted earlier, to reconstruct the 3D model for the current frame. It is okay if we need to transmit the whole 3D model from time to time, but most frames should only carry skeleton information.

I have done an extensive literature survey and found that your work is the most relevant one we can leverage. I now have some questions I would like to discuss with you.

1. In our setup, we use multiple depth cameras to capture movements from different angles. Thus, our input can be multiple 3D poses, instead of a single one. How can we feed these multiple poses into Pose2Mesh at the same time?
2. It seems that the generated 3D model from Pose2Mesh contains only the geometry, not the texture. But we also need the texture in order to have a photo-realistic model. Is there any way to paint the texture onto the generated model based on the whole 3D model transmitted in the previous frame?
3. Do you know any related papers similar to our idea, especially in the CV/CG community? I work in the systems/networking community, so I am not very familiar with papers in CV/CG. Given your expertise in this area, I am wondering whether you know other papers we could learn from.

Thank you for your time. Any suggestions and comments are truly appreciated.

hongsukchoi commented 1 year ago

Hi, Thank you for your interest.

> 1. In our setup, we use multiple depth cameras to capture movements from different angles. Thus, our input can be multiple 3D poses, instead of a single one. How can we feed these multiple poses into Pose2Mesh at the same time?

You can use bundle adjustment to find the single 3D pose that has the least error with respect to the 3D poses from the different cameras, and then feed that 3D pose into Pose2Mesh's MeshNet.
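To make that concrete, here is a rough sketch of the simplest version of this fusion, assuming you already have one 3D pose per camera and known camera-to-world extrinsics (the function name and the commented MeshNet usage are placeholders, not the actual repo API):

```python
import numpy as np

def fuse_multiview_poses(poses_cam, extrinsics):
    """Fuse per-camera 3D poses into a single world-frame pose.

    poses_cam:  list of (J, 3) arrays, one 3D pose per camera, in that
                camera's coordinate frame
    extrinsics: list of (R, t) camera-to-world transforms (R: 3x3, t: (3,))

    With fixed, known extrinsics the pose that minimizes the summed squared
    error to all views is just the per-joint mean of the world-frame
    estimates; a full bundle adjustment would also refine the extrinsics.
    """
    world = [pose @ R.T + t for pose, (R, t) in zip(poses_cam, extrinsics)]
    return np.mean(np.stack(world, axis=0), axis=0)  # (J, 3)

# Hypothetical usage (names are placeholders, not the actual repo API):
# fused = fuse_multiview_poses(per_camera_poses, camera_extrinsics)
# fused = fused - fused[root_idx]                  # make it root-relative
# mesh_vertices = mesh_net(torch.from_numpy(fused).float()[None])
```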

> 2. It seems that the generated 3D model from Pose2Mesh contains only the geometry, not the texture. But we also need the texture in order to have a photo-realistic model. Is there any way to paint the texture onto the generated model based on the whole 3D model transmitted in the previous frame?

To be honest, I don't think Pose2Mesh is appropriate for this task. Pose2Mesh cannot recover the texture and only has the geometry of the naked body, which might not be suitable if you want dressed human reconstruction.

My suggestion is to use a personalized Human-NeRF method like NeuralBody. Reconstruct the textured geometry of the dressed human with a NeRF representation from the first initial frame(s). You can then animate (= change the pose of) this NeRF representation with the given pose parameters (only 72 dimensions). The pose parameters (= joint angles of the 24 SMPL joints) can be estimated from a 3D pose, either with a fitting approach (SMPLify-X) or with a learning approach, which will be much faster.

3. See above.

NeuralBody is just one of many Human-NeRF papers, and I think there are lighter and more accurate methods by now. By Human-NeRF I mean an approach that combines a human mesh model (e.g. SMPL) with a NeRF representation.
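For the fitting approach to the pose parameters mentioned above, a minimal (and unoptimized) sketch with the smplx library could look like the following; SMPLify-X adds pose/shape priors and better initialization on top of this, and a learned regressor would replace the optimization loop entirely. Mapping your camera's skeleton format to the SMPL joint order is assumed to be done beforehand:

```python
import torch
import smplx  # pip install smplx; SMPL model files are downloaded separately

def fit_smpl_pose(target_joints, smpl_model_path, iters=300, lr=0.01):
    """Recover the 72-D SMPL pose (global orientation + 23 body joints,
    axis-angle) from a target 3D skeleton by simple gradient descent.

    target_joints: (24, 3) tensor in SMPL joint order, in meters.
    """
    model = smplx.create(smpl_model_path, model_type='smpl')
    global_orient = torch.zeros(1, 3, requires_grad=True)
    body_pose = torch.zeros(1, 69, requires_grad=True)
    transl = torch.zeros(1, 3, requires_grad=True)
    optim = torch.optim.Adam([global_orient, body_pose, transl], lr=lr)

    for _ in range(iters):
        out = model(global_orient=global_orient, body_pose=body_pose,
                    transl=transl)
        pred = out.joints[0, :24]      # first 24 output joints = SMPL joints
        loss = ((pred - target_joints) ** 2).sum()
        optim.zero_grad()
        loss.backward()
        optim.step()

    return torch.cat([global_orient, body_pose], dim=1).detach()  # (1, 72)
```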

felixshing commented 1 year ago

Thank you for your reply! I would definitely check NeuralBody and its related work later.

I understand that Pose2Mesh has only the geometry of the naked body. But given that we also have the whole 3D model, can I just paint the texture on the naked body based on it? If so, how could I do that?

hongsukchoi commented 1 year ago

You can project the produced 3D human mesh onto the images, rasterize the mesh surfaces, and obtain a texture map from the rasterized surfaces.

Alternatively, get DensePose (projected and rasterized mesh surfaces) and obtain the texture map from that.

Then try this kind of approach. [image attachment]
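As a very rough illustration of the projection step only (ignoring occlusion and without building a proper UV texture map), per-vertex colors can be sampled like this, assuming known intrinsics and extrinsics for the camera frame:

```python
import numpy as np

def per_vertex_colors(vertices, image, K, R, t):
    """Project mesh vertices into one camera image and sample a color per
    vertex. This ignores occlusion and lighting; a real pipeline would
    rasterize the mesh to find visible faces and bake the samples into the
    SMPL UV texture map instead.

    vertices: (V, 3) world-frame mesh vertices
    image:    (H, W, 3) RGB frame from the camera
    K:        (3, 3) intrinsics; R, t: world-to-camera extrinsics
    """
    cam = vertices @ R.T + t                      # world -> camera frame
    proj = cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]               # perspective divide
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, image.shape[1] - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, image.shape[0] - 1)
    return image[v, u]                            # (V, 3) color per vertex
```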

I think what you are trying to do is a very challenging problem in the real world, so my answers may not be enough, but I hope they help you.

felixshing commented 1 year ago

Thank you very much! I read some Human-NeRF papers and found that they may indeed be what I want! I notice that they all use the SMPL skeleton, which contains only 24 joints. However, in order to make our reconstructed human models more photo-realistic, we may want to extract more keypoints. For example, we are currently using the Zed 2i depth camera, and the Zed SDK provides a 70-keypoint body model: https://www.stereolabs.com/docs/body-tracking/. Moreover, Google's MediaPipe provides many keypoints on the human face.

If we would like to use our own joints, do you know how to apply them to the SMPL model or NeuralBody?