Closed shubhMaheshwari closed 2 years ago
The complete object surface is incrementally fused over time. When a new surface appears, the deformation graph will be extended, and new nodes will be inserted, just like DynamicFusion.
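The DynamicFusion-style node insertion mentioned here can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name and the coverage radius are assumptions, and the real system would also update node connectivity and per-node state.

```python
import numpy as np

def extend_graph(nodes, new_points, coverage_radius=0.05):
    """Insert new deformation-graph nodes for newly observed surface points
    that are not yet covered by any existing node (DynamicFusion-style).
    `coverage_radius` is an illustrative value, not from the paper."""
    for p in new_points:
        if nodes.shape[0] == 0:
            nodes = p[None, :]
            continue
        d = np.linalg.norm(nodes - p, axis=1)
        if d.min() > coverage_radius:  # point unsupported by the graph -> new node
            nodes = np.vstack([nodes, p])
    return nodes
```

In the discussion below, each such newly inserted node would also get its own freshly initialized hidden state.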
I see. Does that mean the hidden state for each node is initialized and updated separately?
Yes, we maintain the hidden state of each node separately.
I see. Hence each node can be present for a variable number of timesteps. Great idea! Does this also mean that the complete node graph is skinned to the next RGBD frame after each update, like in NeuralTracking, where pixel anchors and weights are computed for each keyframe? In your work this would be performed at every timestep.
The transformation parameters of the whole deformation graph are updated at each frame by optimization. However, I'm not sure what the pixel anchors and weights stand for; could you describe them in detail?
In NeuralTracking, correspondence is established between the source image and the node graph (similar to Embedded Deformation, but per pixel rather than per point-cloud point). In their paper, pixel anchors are the nodes closest to each pixel, and pixel weights are the corresponding skinning weights. After predicting optical flow, they use this information to estimate the transformation parameters.
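For concreteness, the pixel anchors and weights described above might be computed along these lines. This is only a sketch of the general idea, not NeuralTracking's actual code: `k`, `sigma`, and the Gaussian weighting are assumptions.

```python
import numpy as np

def compute_pixel_anchors(depth, K, nodes, k=4, sigma=0.05):
    """For each pixel, backproject its depth to a 3D point, find the k
    nearest graph nodes (pixel anchors), and compute normalized Gaussian
    skinning weights (pixel weights). k and sigma are illustrative values."""
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    z = depth
    # Backproject every pixel to camera space: (H, W, 3)
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)
    # Pairwise distances pixel -> node: (H, W, N)
    d = np.linalg.norm(pts[:, :, None, :] - nodes[None, None, :, :], axis=-1)
    anchors = np.argsort(d, axis=-1)[..., :k]          # indices of k closest nodes
    dk = np.take_along_axis(d, anchors, axis=-1)
    w = np.exp(-dk**2 / (2 * sigma**2))                # Gaussian skinning weights
    w /= w.sum(axis=-1, keepdims=True) + 1e-8
    return anchors, w
```

After optical flow is predicted, these anchors and weights let each pixel's 2D motion constrain the transformations of its nearby nodes.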
Is a similar setup used in your paper? Otherwise, after running optical flow, how do you estimate the visibility or motion of each graph node? At each timestep, no correspondence information is available between the new source RGBD image (I_s) and the complete node graph.
Or is the deformed node graph projected into the pixel space of I_s, with visibility then computed from the depth values and motion estimated via optical flow?
As we describe in the implementation details of our paper, we project the 3D graph nodes into 2D image space to estimate their visibility based on the depth values. Then, based on the depth values and the optical flow, we compute the 3D motion of the visible nodes by backprojecting the 2D coordinates to 3D space.
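The project/visibility-test/backproject pipeline described above can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the function name, nearest-pixel rounding, and the depth tolerance are assumptions.

```python
import numpy as np

def node_motion(nodes, depth_s, depth_t, flow, K, depth_tol=0.05):
    """Project 3D graph nodes into the source frame; a node is visible when
    its projected depth matches the observed depth. For visible nodes, follow
    the optical flow to the target pixel and backproject with the target depth
    to get the node's 3D motion. `depth_tol` is an illustrative value."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u = np.round(nodes[:, 0] * fx / nodes[:, 2] + cx).astype(int)
    v = np.round(nodes[:, 1] * fy / nodes[:, 2] + cy).astype(int)
    H, W = depth_s.shape
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    visible = np.zeros(len(nodes), dtype=bool)
    motion = np.zeros_like(nodes)
    for i in np.where(inside)[0]:
        if abs(depth_s[v[i], u[i]] - nodes[i, 2]) < depth_tol:  # depth/visibility test
            visible[i] = True
            u2 = int(round(u[i] + flow[v[i], u[i], 0]))         # flow-advected pixel
            v2 = int(round(v[i] + flow[v[i], u[i], 1]))
            if 0 <= u2 < W and 0 <= v2 < H:
                z2 = depth_t[v2, u2]
                p2 = np.array([(u2 - cx) * z2 / fx, (v2 - cy) * z2 / fy, z2])
                motion[i] = p2 - nodes[i]                       # 3D displacement
    return visible, motion
```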
Thank you for clearing up the confusion. Closing this issue since most questions were already answered in the paper.
Thanks for sharing this amazing work! I had a question regarding the complete node graph used as input to the Occlusion-aware Motion Estimation Network module.
Unlike DeformingThings4D, where the complete object surface is known, datasets like DeepDeform (or a live demo) provide only a front view. In these cases, what is the input to the module?