zhengqili / Neural-Scene-Flow-Fields

PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes"

replicating the CVD comparison #16

Open yaseryacoob opened 3 years ago

yaseryacoob commented 3 years ago

I would like to replicate the CVD experiment you have on the project page.

  1. Since CVD estimates depth, I wonder how you did the texturing.
  2. Is it possible to plug different depth maps into your software? For example, if I have a depth map for each image (like RGBD)? Thanks.
zhengqili commented 3 years ago

Hi,

In terms of using a depth map to generate novel views, I used the idea from https://arxiv.org/pdf/2004.01294.pdf: warp the contents of static regions from neighboring frames, and warp the contents of dynamic regions from the reference view, into the novel view through point-cloud splatting, using the code from https://github.com/sniklaus/3d-ken-burns.
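Roughly, the splatting step looks like this (a minimal NumPy sketch of depth-based forward warping, not my actual implementation nor the 3d-ken-burns code; the function and parameter names are illustrative):

```python
import numpy as np

def splat_novel_view(rgb, depth, K, T_src_to_novel):
    """Forward-warp an RGBD image into a novel view by point-cloud splatting.

    rgb:   (H, W, 3) image
    depth: (H, W) per-pixel depth in the source camera
    K:     (3, 3) intrinsics (assumed shared by source and novel views)
    T_src_to_novel: (4, 4) rigid transform from source to novel camera
    """
    H, W = depth.shape

    # Unproject every pixel into a 3D point in the source camera frame.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).astype(np.float64)
    pts_src = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)        # (3, H*W)

    # Move the point cloud into the novel camera frame.
    pts_h = np.vstack([pts_src, np.ones((1, pts_src.shape[1]))])        # (4, H*W)
    pts_novel = (T_src_to_novel @ pts_h)[:3]

    # Project back to pixels and splat colors; the nearest point wins (z-buffer).
    proj = K @ pts_novel
    z = proj[2]
    x = np.round(proj[0] / z).astype(int)
    y = np.round(proj[1] / z).astype(int)

    out = np.zeros_like(rgb)
    zbuf = np.full((H, W), np.inf)
    colors = rgb.reshape(-1, 3)
    valid = (z > 0) & (x >= 0) & (x < W) & (y >= 0) & (y < H)
    for i in np.flatnonzero(valid):   # naive one-pixel splat; real code uses soft splatting
        if z[i] < zbuf[y[i], x[i]]:
            zbuf[y[i], x[i]] = z[i]
            out[y[i], x[i]] = colors[i]
    return out
```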

Another way to achieve this, which usually gives me a better rendering result, is to create a textured mesh from the point cloud and the input pixels, and then rasterize it to render novel views. I used the implementation from https://github.com/vt-vl-lab/3d-photo-inpainting as the 3D-photo baseline. This approach usually produces a much better result (especially around disocclusions) from RGBD images.
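The mesh construction step is roughly the following (again a minimal sketch, not the 3d-photo-inpainting code; names and the discontinuity threshold are illustrative). The resulting mesh can then be rasterized with any standard renderer:

```python
import numpy as np

def rgbd_to_mesh(rgb, depth, K, depth_edge_thresh=0.1):
    """Build a simple textured triangle mesh from an RGBD image.

    Returns per-vertex positions/colors and triangle indices. Triangles that
    straddle a large relative depth discontinuity are dropped so foreground
    and background are not connected by stretched "rubber sheet" faces.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).astype(np.float64)

    # One vertex per pixel, unprojected with its depth.
    verts = ((np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)).T       # (H*W, 3)
    colors = rgb.reshape(-1, 3)

    # Two triangles per 2x2 pixel block.
    idx = np.arange(H * W).reshape(H, W)
    tl, tr = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    bl, br = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    faces = np.concatenate([np.stack([tl, bl, tr], -1),
                            np.stack([tr, bl, br], -1)], axis=0)

    # Drop faces spanning a large relative depth jump (disocclusion edges).
    fz = depth.reshape(-1)[faces]
    keep = (fz.max(1) - fz.min(1)) / np.maximum(fz.min(1), 1e-6) < depth_edge_thresh
    return verts, colors, faces[keep]
```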

yaseryacoob commented 3 years ago

Thanks for the details, I will look into them. Let me clarify

  1. I am trying to test a hypothesis within your framework: how the depth estimate affects the highest-quality rendering. When you compared against CVD, I assume you swapped their depth maps in for yours (though you also seem not to do inpainting, if I interpret the videos correctly).
  2. Maybe your comparison with CVD was done differently, but I can't tell from the project page. If you are willing to share the code that took the CVD depth and generated the video, it would save me from guessing how the comparison was done.
zhengqili commented 3 years ago

Hi,

For the comparison with CVD, I am not using CVD depth inside our framework; instead, I use CVD depth to perform traditional depth-based image-based rendering, following the method described in Sec. 3.2 of https://arxiv.org/pdf/2004.01294.pdf. (Unfortunately, that implementation lives in a private Adobe Research repo, and I am no longer at Adobe.)

I don't think CVD would work for a general dynamic scene: it enforces epipolar consistency over the entire scene without accounting for object motion or masking out moving objects, which, in my previous experiments, produced incorrect depth for moving objects. So I think single-view depth is still the best initialization strategy in our case.