yihua7 / SC-GS

[CVPR 2024] Code for SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
https://yihua7.github.io/SC-GS-web/
MIT License

train for a static scene #22

Closed HyoKong closed 3 months ago

HyoKong commented 3 months ago

Hi, thanks for the great work!

After I train a static scene, there are many floaters (floating Gaussians) near the camera.

[Screenshot from 2024-03-25 15-01-21]

The PSNRs of the training and testing views, however, are very high: 32 and 31, respectively.

Here is my training script:

```shell
CUDA_VISIBLE_DEVICES=0 python train_gui.py --source_path data/in2n-data/face/ --model_path outputs/face --deform_type node --node_num 512 --hyper_dim 8 --eval --gt_alpha_mask_as_scene_mask --local_frame --resolution 2 --W 800 --H 800 --is_scene_static --gui
```

Could you please help me figure it out? Is there any issue with my implementation?

Thank you so much in advance!

yihua7 commented 3 months ago

Hello,

The scene setup here is forward-facing (all cameras oriented directly toward the subject), which poses a significant challenge for Gaussian Splatting when rendering from a viewpoint that deviates from the training cameras, as in your image. To check this, I suggest replacing --deform_type node with --deform_type static, so the scene is reconstructed with vanilla static Gaussian Splatting. That way you can tell whether the deformation code is responsible for the unexpected results.
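Concretely, your command with only --deform_type swapped (all other flags left as-is) would be:

```shell
CUDA_VISIBLE_DEVICES=0 python train_gui.py --source_path data/in2n-data/face/ --model_path outputs/face --deform_type static --node_num 512 --hyper_dim 8 --eval --gt_alpha_mask_as_scene_mask --local_frame --resolution 2 --W 800 --H 800 --is_scene_static --gui
```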

I suspect such forward-facing scenes, which lack 360-degree photometric information, are simply too challenging for vanilla Gaussian Splatting.

yihua7 commented 3 months ago

To be clear, Gaussian Splatting remains effective in forward-facing scenes for view interpolation, which is why the PSNRs at the test views are still high. Its performance is compromised, however, when used for extrapolated view synthesis, as shown in your image.

HyoKong commented 3 months ago

Hi yihua, thanks for the explanation. After I change deform_type to static, there are no more floaters. I suspect the cause is the node sampling strategy: when training reaches the opt.iterations_node_sampling iteration, the 512 nodes are re-sampled from the Gaussians using the farthest-point sampling strategy. Some 'noise nodes' near the camera may get selected by farthest-point sampling, and more floating Gaussians are then densified around those 'noise nodes'.
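For intuition, here is a minimal sketch of greedy farthest-point sampling (illustrative only, not the actual code in this repo; the names and data are mine) showing why an isolated outlier near the camera is almost guaranteed to be picked as a node:

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy farthest-point sampling: repeatedly pick the point that is
    farthest from the set of points selected so far."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(points)))]
    # Distance from each point to its nearest already-selected point.
    dists = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dists))  # the farthest remaining point wins
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return np.array(selected)

# A dense cluster of Gaussian centers in front of the cameras, plus one
# isolated 'noise' Gaussian floating near the camera origin.
rng = np.random.default_rng(0)
cluster = rng.normal(loc=[0.0, 0.0, 5.0], scale=0.5, size=(1000, 3))
floater = np.array([[0.0, 0.0, 0.1]])
points = np.vstack([cluster, floater])

nodes = farthest_point_sampling(points, k=8)
print(1000 in nodes)  # True: the floater's distance to the cluster stays
                      # large, so argmax selects it within a few rounds.
```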

I'm not sure if this suspicion is correct. Please feel free to correct me if anything is wrong or unconvincing.

Thank you!

yihua7 commented 3 months ago

Hi, thank you for your feedback! I recommend using the script provided in this repository to mask out the background and keep only the object you wish to edit (e.g., the human in your scene). By masking the multi-view images with a transparent background, you can potentially eliminate unwanted floaters and achieve more coherent editing on the human target. During training you can use --gt_alpha_mask_as_dynamic_mask --gs_with_motion_mask to decompose the static and dynamic (not necessarily dynamic, but editable) parts and render them both together. This is how I made the demos of the T&T scenes Family and Horse. Hope this information helps! : )
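For example, your earlier command adapted with those flags might look something like this (how they interact with --gt_alpha_mask_as_scene_mask and --is_scene_static may need adjusting for your scene):

```shell
CUDA_VISIBLE_DEVICES=0 python train_gui.py --source_path data/in2n-data/face/ --model_path outputs/face --deform_type node --node_num 512 --hyper_dim 8 --eval --gt_alpha_mask_as_dynamic_mask --gs_with_motion_mask --local_frame --resolution 2 --W 800 --H 800 --gui
```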

HyoKong commented 3 months ago

I'll have a try. Thank you so much for your advice and guidance!