hustvl / 4DGaussians

[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
https://guanjunwu.github.io/4dgs/

The quality on multiple datasets is significantly lower than shown on the project page #27

Closed ch1998 closed 11 months ago

ch1998 commented 11 months ago

I retrained on both the cook_spinach and chickchicken data. The pictures below are my training results. The quality is clearly not as good as what is shown on the project page. Are there any parameters that need to be adjusted?

[images: rendered frames 00000 and 00010]

ch1998 commented 11 months ago

I used the imgs2poses.py script from the LLFF code to re-register the cameras, and the final rendering is a mess. Which step did I get wrong? The initial point cloud is a random point cloud.
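For anyone trying to reproduce this step, here is a minimal sketch of invoking LLFF's imgs2poses.py (the LLFF checkout location and scene layout below are assumptions for illustration, not paths from this repo):

```python
# Minimal sketch of re-registering cameras with LLFF's imgs2poses.py.
# Assumes LLFF is cloned at ./LLFF and the extracted frames live in
# <scene_dir>/images; the scene path is hypothetical.
import subprocess
from pathlib import Path

scene_dir = Path("data/dynerf/cook_spinach")  # hypothetical scene layout
assert (scene_dir / "images").is_dir(), "imgs2poses.py expects an images/ subfolder"

# imgs2poses.py runs COLMAP on the images and writes poses_bounds.npy
subprocess.run(
    ["python", "LLFF/imgs2poses.py", str(scene_dir)],
    check=True,
)
```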

guanjunwu commented 11 months ago

What?? I don't think that should happen... Maybe you can set a larger batch size and use SfM point clouds to initialize training?
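As a reference for the SfM initialization idea, here is a minimal sketch of parsing COLMAP's text export (sparse/0/points3D.txt) into position and color arrays; how these points are fed into this repo's training loop is not shown and may differ:

```python
# Parse the standard COLMAP points3D.txt format into numpy arrays,
# to use as an SfM initialization instead of a random point cloud.
import numpy as np

def load_colmap_points3d(path):
    xyz, rgb = [], []
    with open(path) as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue
            # Format: POINT3D_ID X Y Z R G B ERROR TRACK[...]
            elems = line.split()
            xyz.append([float(v) for v in elems[1:4]])
            rgb.append([int(v) for v in elems[4:7]])
    return np.array(xyz, dtype=np.float32), np.array(rgb, dtype=np.uint8)

points, colors = load_colmap_points3d("sparse/0/points3D.txt")
print(points.shape, colors.shape)  # (N, 3) positions and (N, 3) RGB colors
```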

ch1998 commented 11 months ago

> What?? I don't think that should happen... Maybe you can set a larger batch size and use SfM point clouds to initialize training?

Thanks for the reply. For cook_spinach, the problem may have been that my sampling frame rate was too low. After I increased the sampling frame rate, the quality improved significantly. As for chickchicken, I didn't change anything, but the quality is clearly not as good as what is shown on your project page.

Advocate99 commented 11 months ago

> What?? I don't think that should happen... Maybe you can set a larger batch size and use SfM point clouds to initialize training?

> Thanks for the reply. For cook_spinach, the problem may have been that my sampling frame rate was too low. After I increased the sampling frame rate, the quality improved significantly. As for chickchicken, I didn't change anything, but the quality is clearly not as good as what is shown on your project page.

I find similar results in my experiments. I think this may be due to the change in the number of frames. As you increase the sampling frame rate, there are more frames in the video and the timesteps become denser. If that happens, it may be quite difficult for the model to learn the deformation. I noticed that if there are more than 150 or 200 frames, performance drops quickly. It is also difficult for NeRF to learn long dynamic scenes.
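The arithmetic behind the denser-timesteps point: with frame times normalized to [0, 1], the gap between adjacent supervised timesteps shrinks as the frame count grows, so the deformation field has to resolve ever finer temporal changes:

```python
# With times normalized to [0, 1], more frames means a smaller gap
# between adjacent supervised timesteps.
for n_frames in (50, 150, 300):
    dt = 1.0 / (n_frames - 1)
    print(f"{n_frames:4d} frames -> adjacent timestep gap {dt:.4f}")
# 50 frames  -> 0.0204
# 150 frames -> 0.0067
# 300 frames -> 0.0033
```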

MasterBin-IIAU commented 10 months ago

@ch1998 @Advocate99 @guanjunwu Dear colleagues, I am a beginner in 3D Gaussian Splatting and dynamic novel view synthesis. May I ask about the standard practice for using the DyNeRF dataset? According to the data preparation instructions, extracting frames from the videos can save memory. Besides, I find that the original videos in the DyNeRF dataset are 2704x2028 at 30 FPS. If I extract frames at 30 FPS, the PNG frames occupy much more space than the original video (for example, 1200 MB vs 50 MB). So should I extract frames at a lower frame rate? Which frame rate do you suggest? Is there a common standard?

In addition, as shown in Table 3 of the paper "4D Gaussian Splatting for Real-Time Dynamic Scene Rendering", the rendering resolution is set to 1386×1014. So when extracting frames, should I also downsample them by 2x? This would also help reduce the storage cost.

Looking forward to your reply. Thanks in advance!

ch1998 commented 10 months ago

Extract frames from the DyNeRF video data, and use the first frame of each video as the input to COLMAP. Of course, you can also use the camera data provided by DyNeRF. For the COLMAP step, you can use the imgs2poses.py script provided by LLFF. I recommend extracting frames at 30 FPS; lowering the FPS leads to worse results. I tried extracting only 20 frames from each video, but the motion of the people was not reconstructed well. Forgive me for not paying attention to the space issue. And no downsampling is needed.
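A minimal sketch of that preprocessing with OpenCV (file paths and layout are illustrative; the optional downsample argument covers the 2x question above, even though no downsampling is needed here):

```python
# Extract every frame (i.e. keep the native 30 FPS) from a DyNeRF video
# with OpenCV, with an optional integer downsample factor.
import cv2
from pathlib import Path

def extract_frames(video_path, out_dir, downsample=1):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if downsample > 1:
            h, w = frame.shape[:2]
            frame = cv2.resize(frame, (w // downsample, h // downsample))
        cv2.imwrite(str(out_dir / f"{idx:05d}.png"), frame)
        idx += 1
    cap.release()
    return idx

# Hypothetical paths; one call per camera video.
n = extract_frames("cook_spinach/cam00.mp4", "cook_spinach/cam00/images")
print(f"wrote {n} frames")
```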