wenbin-lin / OcclusionFusion

OcclusionFusion: realtime dynamic 3D reconstruction based on single-view RGB-D
https://wenbin-lin.github.io/OcclusionFusion/

Complete node graph creation in DeepDeform and Live Demo #1

Closed: shubhMaheshwari closed this issue 2 years ago

shubhMaheshwari commented 2 years ago

Thanks for sharing this amazing work! I have a question regarding the complete node graph used as input to the Occlusion-aware Motion Estimation Network module.

Unlike DeformingThings4D, where the complete object surface is known, datasets like DeepDeform and the live demo only provide the front view. In these cases, what is the input to the module?

  1. Is the complete object surface precomputed (maybe by DynamicFusion)?
  2. Or is only the graph extracted from the front-view RGB-D image at frame t_0 used, with all the confidence and visibility scores computed on this graph and no graph update made during the motion estimation step?
wenbin-lin commented 2 years ago

The complete object surface is incrementally fused over time. When a new surface region appears, the deformation graph is extended and new nodes are inserted, just as in DynamicFusion.
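
For illustration, here is a minimal sketch of the DynamicFusion-style insertion rule described above: newly fused surface vertices that are not covered by any existing node become new nodes. The coverage radius and function name are illustrative, not taken from the repository.

```python
# Illustrative sketch (not the authors' code): extend the deformation graph with
# new nodes wherever newly fused surface vertices are not covered by an existing
# node, following the DynamicFusion-style rule described above.
import numpy as np

def extend_graph(nodes, new_surface_vertices, node_radius=0.025):
    """Insert a node for every new vertex farther than node_radius from all nodes."""
    nodes = [np.asarray(n) for n in nodes]
    for v in np.asarray(new_surface_vertices):
        if not nodes:
            nodes.append(v)
            continue
        dists = np.linalg.norm(np.stack(nodes) - v, axis=1)
        if dists.min() > node_radius:   # vertex not yet covered -> becomes a node
            nodes.append(v)
    return np.stack(nodes)
```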

shubhMaheshwari commented 2 years ago

I see. Does that mean the hidden state for each node is initialized and updated separately?

wenbin-lin commented 2 years ago

Yes, we maintain the hidden state of each node separately.
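
To illustrate what maintaining per-node hidden states separately might look like, here is a rough sketch that keeps one recurrent state per node, initializes it when the node is inserted, and updates it every frame. The GRU cell and the feature/hidden sizes are assumptions for the example, not the paper's exact temporal module.

```python
# Illustrative only: one recurrent hidden state per graph node, created when the
# node is inserted and updated every frame. Cell type and sizes are assumptions.
import torch
import torch.nn as nn

feat_dim, hidden_dim = 6, 64
cell = nn.GRUCell(feat_dim, hidden_dim)
hidden = {}  # node_id -> hidden state tensor of shape (1, hidden_dim)

def update_node_states(node_features):
    """node_features: dict mapping node_id -> per-frame feature tensor (1, feat_dim)."""
    for node_id, feat in node_features.items():
        if node_id not in hidden:               # node was inserted this frame
            hidden[node_id] = torch.zeros(1, hidden_dim)
        hidden[node_id] = cell(feat, hidden[node_id])
```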

shubhMaheshwari commented 2 years ago

I see, so each node can be present for a variable number of timesteps. Great idea! Does this also mean that the complete node graph is skinned to the next RGB-D frame after each update? In NeuralTracking, pixel anchors and weights are computed for each keyframe; in your work this would be performed at each timestep.

wenbin-lin commented 2 years ago

The transformation parameters of the whole deformation graph are updated at each frame by optimization. However, I'm not sure what the pixel anchors and weights stand for; could you describe them in more detail?
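
(For context, the "skinning" in question is the standard embedded-deformation warp, where each surface point is deformed by a weighted blend of the transformations of its nearby graph nodes. A minimal sketch follows; this is illustrative, not code from this repository.)

```python
# Standard embedded-deformation warp (illustrative): a point v is deformed by a
# weighted blend of its K nearby nodes' rotations R_k and translations t_k,
# with skinning weights w_k.
import numpy as np

def warp_point(v, node_pos, node_R, node_t, weights):
    """v: (3,); node_pos, node_t: (K, 3); node_R: (K, 3, 3); weights: (K,)."""
    warped = np.zeros(3)
    for g, R, t, w in zip(node_pos, node_R, node_t, weights):
        warped += w * (R @ (v - g) + g + t)
    return warped
```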

shubhMaheshwari commented 2 years ago

In NeuralTracking, correspondence is established between the source image and the node graph (similar to Embedded Deformation, but for each pixel instead of a point cloud). In their paper, pixel anchors are the nodes closest to each pixel and pixel weights are the corresponding skinning weights. After predicting optical flow, they use this information to estimate the transformation parameters.
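
(For illustration, the anchors and weights could be computed roughly as in the sketch below. This is not NeuralTracking's actual implementation; the nearest-neighbor count, Gaussian fall-off, and parameters are assumptions.)

```python
# Sketch of the pixel-anchors / pixel-weights idea described above (illustrative,
# not NeuralTracking's implementation): for each pixel's backprojected 3D point,
# take the K nearest graph nodes as anchors and derive skinning weights that
# fall off with distance.
import numpy as np

def pixel_anchors_and_weights(points, node_pos, k=4, sigma=0.05):
    """points: (N, 3) backprojected pixels; node_pos: (M, 3) graph node positions."""
    dists = np.linalg.norm(points[:, None, :] - node_pos[None, :, :], axis=-1)  # (N, M)
    anchors = np.argsort(dists, axis=1)[:, :k]                                  # (N, k)
    anchor_d = np.take_along_axis(dists, anchors, axis=1)
    weights = np.exp(-(anchor_d ** 2) / (2 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True) + 1e-8
    return anchors, weights
```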

Is a similar setup used in your paper? Otherwise, after running optical flow, how do you estimate the visibility or motion of each graph node? At each timestep, no correspondence information is present between the new source RGB-D image (I_s) and the complete node graph.

Or is the deformed node graph projected into the pixel space of I_s, with visibility calculated from the depth values and motion estimated using optical flow?

wenbin-lin commented 2 years ago

As we describe in the implementation details of our paper, we project the 3D graph nodes into 2D image space and estimate their visibility based on the depth values. Then, based on the depth values and the optical flow, we compute the 3D motion of the visible nodes by backprojecting the 2D coordinates to 3D space.
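
For illustration, the visibility test and per-node motion described above could look roughly like the sketch below. This is not the repository's code; the camera intrinsics (fx, fy, cx, cy) and the depth-test threshold are assumed values, and the flow is taken to map the previous frame to the current one.

```python
# Sketch of the projection-based visibility and motion computation described in
# the reply (illustrative, not the repository's code).
import numpy as np

def node_visibility_and_motion(nodes, depth_prev, depth_cur, flow,
                               fx, fy, cx, cy, thresh=0.02):
    """nodes: (N, 3) deformed node positions in the previous frame's camera space;
    depth_prev, depth_cur: (H, W) depth maps; flow: (H, W, 2) flow prev -> cur."""
    H, W = depth_prev.shape
    vis = np.zeros(len(nodes), dtype=bool)
    motion = np.zeros((len(nodes), 3))
    for i, (x, y, z) in enumerate(nodes):
        # Project the node into the previous frame's image plane.
        u, v = int(round(fx * x / z + cx)), int(round(fy * y / z + cy))
        if not (0 <= u < W and 0 <= v < H):
            continue
        # Depth test: the node is visible if it roughly agrees with the depth map.
        if abs(depth_prev[v, u] - z) < thresh:
            vis[i] = True
            du, dv = flow[v, u]                          # where the pixel moves to
            ui, vi = int(round(u + du)), int(round(v + dv))
            if 0 <= ui < W and 0 <= vi < H and depth_cur[vi, ui] > 0:
                z2 = depth_cur[vi, ui]
                # Backproject the flowed 2D coordinate to 3D using the new depth.
                target = np.array([(u + du - cx) * z2 / fx,
                                   (v + dv - cy) * z2 / fy, z2])
                motion[i] = target - np.array([x, y, z])  # 3D motion of the node
    return vis, motion
```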

shubhMaheshwari commented 2 years ago

Thank you for clearing up the confusion. Closing this issue, since most of the questions were already answered in the paper.