oppo-us-research / SpacetimeGaussians

[CVPR 2024] Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
https://oppo-us-research.github.io/SpacetimeGaussians-website/

About Initialization #66

Open yzxQH opened 3 months ago

yzxQH commented 3 months ago

Great work! Thanks for the released code! But I still wonder: why do you use sparse point clouds from all available timestamps for initialization, rather than only the timestamp=0 point cloud, as most 4DGS-related works do? I have also looked at dataset_readers, and if I understood correctly, you simply concatenate the sparse point clouds from all available timestamps. Doesn't this make the initial point cloud overly redundant?

lizhan17 commented 3 months ago

1) Our method initializes from different timestamps. You can think of it like video compression: a video codec does better with multiple frames than single-image compression does.

2) The initial point cloud is indeed redundant if we only consider space across all timestamps. That is the reason we have a temporal RBF (a 1D Gaussian) giving each point a temporal opacity in spacetime. We also have options to partially sample points from the input videos, e.g. only use points from every N frames, or use a nearest-neighbour distance filter (analogous to different group-of-pictures settings in a video codec).
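To make the temporal-opacity idea concrete, here is a minimal sketch assuming the paper's 1D Gaussian form, where each point's opacity peaks at its own timestamp and decays away from it; the variable names are illustrative, not the repository's:

```python
import numpy as np

def temporal_opacity(t, spatial_opacity, trbf_center, trbf_scale):
    """1D Gaussian temporal RBF: a point is most visible near its own timestamp.
    A larger trbf_scale (decay rate) means a narrower temporal footprint."""
    return spatial_opacity * np.exp(-(trbf_scale ** 2) * (t - trbf_center) ** 2)

# A point initialized from the middle of a clip (times normalized to [0, 1]):
print(temporal_opacity(t=0.6, spatial_opacity=1.0, trbf_center=0.6, trbf_scale=4.0))  # 1.0
print(temporal_opacity(t=0.1, spatial_opacity=1.0, trbf_center=0.6, trbf_scale=4.0))  # ~0.018
```

This is why points initialized from all timestamps need not be spatially redundant at render time: a static point duplicated across frames contributes mostly within its own temporal window.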

yzxQH commented 2 months ago

Thanks a lot! I roughly understand now. But I have another question: can you explain in detail the meaning of the parameters trbfslinit, preprocesspoints, densify, and desicnt in the techni_lite config files? Under what circumstances is it better to use which values? If I want to train on data exceeding 300 frames (without the every-50-frames training strategy), which parameter adjustments might help?

lizhan17 commented 2 months ago

trbfslinit: controls the shape of the temporal RBF.
preprocesspoints: reduces the initial number of points (keep every N frames' points, or a spatial portion of the points from each frame).
densify: densification strategy (the main goal is to add points first and then prune points; the strategies differ in how points are added and removed).
desicnt: number of densification rounds (6 or 12 in most cases; I suggest 6).
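If it helps, the four knobs might look like this together; the keys match the cfg dump later in this thread, but the values are illustrative only (a sketch, not a recommended config):

```python
# Illustrative values only; defaults vary per scene config.
lite_overrides = {
    "trbfslinit": 2.0,      # initial temporal RBF scale; larger -> narrower initial temporal footprint
    "preprocesspoints": 3,  # how the per-frame SfM points are subsampled before training
    "densify": 1,           # which add-then-prune densification strategy to use
    "desicnt": 6,           # how many densification rounds to run
}
```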

For 300 frames: 1) the most important part is to set trbfslinit to a large value such as 4 (so the initial temporal affecting range of each point is small). 2) You can also use only every 2nd or every 4th frame's points by setting preprocesspoints=14 or 15, to avoid too many duplicated static points across time:

```python
elif self.preprocesspoints == 14:
    pcd = interpolate_partuse(pcd, 2)
elif self.preprocesspoints == 15:
    pcd = interpolate_partuse(pcd, 4)
```
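To picture what that subsampling does, here is a hypothetical sketch; it is NOT the repository's actual interpolate_partuse (whose real logic lives in the dataset readers), just the idea behind preprocesspoints=14/15:

```python
import numpy as np

def subsample_every_n_frames(xyz, rgb, frame_idx, n):
    """Keep only the SfM points whose source frame index is a multiple of n,
    so static geometry is not duplicated at every single timestamp."""
    mask = (frame_idx % n) == 0
    return xyz[mask], rgb[mask], frame_idx[mask]

# e.g. keep every 2nd frame's points (the preprocesspoints=14 branch above)
xyz = np.random.rand(1000, 3)
rgb = np.random.rand(1000, 3)
frame_idx = np.random.randint(0, 50, size=1000)
xyz2, rgb2, idx2 = subsample_every_n_frames(xyz, rgb, frame_idx, 2)
```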

3) I suggest that six 50-frame short sequences should achieve the best results for 300 frames. We didn't optimize the training pipeline for 300 frames (we optimize all the points together during training), and duplicated points will cause artifacts across time.

yzxQH commented 2 months ago

Thanks for your prompt response! But I am concerned about two issues when training a model every 50 frames:

  1. Currently, the viewer is unable to continuously play six results. (may revise later)

  2. Will the transition between one 50-frame scene and the next not be smooth enough (compared to overall optimization of 300 frames, will it be more prone to point flicker)? I always consider the initialization an important step for consistent and better performance, so if I split the scene into six 50-frame sequences, should the initialization of each short sequence be the same (e.g. all use point_cloudtotal300.ply)?

lizhan17 commented 2 months ago
  1. "Currently, the viewer is unable to continuously play six results." Yes, we will revise the viewer later.

  2. There will be some inconsistency between the 50-frame sequences, but such inconsistency exists between any multiple sequences, no matter whether the length is 50 or 300.

The initialization of the six 50-frame sequences should be 0-50.ply, 50-100.ply, 100-150.ply, and so on. This aligns the points with the videos, for training efficiency and temporal stability within each 50-frame sequence.
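For example, the six trainings could be launched back to back as below. This is a hedged sketch: the flag names mirror the cfg dump later in this thread (configpath, source_path, model_path), and the colmap_<k> folders are assumed to hold each segment's per-frame SfM points; check train.py's actual CLI before use.

```python
import subprocess

# Train frames [0,50), [50,100), ..., [250,300) as six separate models.
for k in range(6):
    subprocess.run([
        "python", "train.py",
        "--configpath", "configs/n3d_lite/cook_spinach.json",
        "--source_path", f"Neural3D/cook_spinach/colmap_{k}",
        "--model_path", f"log/cook_spinach/colmap_{k}",
    ], check=True)
```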

yzxQH commented 3 weeks ago

Hi, thanks for your reply. I have tested STG on the N3V dataset several times, but I found that the metrics are slightly different from the values in the paper. Take cook_spinach as an example: the PSNR was about 30.95 on test views (lite model, training 50 frames), but the paper reports 31.5+. I used all default parameters. Is this difference due to careful tuning for different scenes? Or have some improvement strategies not mentioned in the paper been added? Or perhaps I overlooked some details?

lizhan17 commented 3 weeks ago

Hi, did you use the config for cook_spinach? Could you share the rendered image?

yzxQH commented 3 weeks ago

Sure, here are my results, with configpath='configs/n3d_lite/cook_spinach.json'. [GT and rendered images attached] The metrics on this view are: PSNR 30.6890, SSIM 0.9355, LPIPS 0.0811. The test iteration is 25000 by default.

lizhan17 commented 3 weeks ago

1) Did you use the provided code to generate the point clouds? 2) Is the guided sampling triggered in this scene? The two sides seem to be of poor quality.

yzxQH commented 3 weeks ago

1. Yes, I strictly followed the README instructions. Here are all the cfg parameters: addsphpointsscale=0.8, basicfunction='gaussian', batch=2, checkpoint_iterations=[], compute_cov3D_python=False, configpath='configs/n3d_lite/cook_spinach.json', convert_SHs_python=False, data_device='cuda', debug=False, debug_from=-2, densification_interval=100, densify=1, densify_from_iter=500, densify_grad_threshold=0.0002, densify_until_iter=9000, desicnt=6, detect_anomaly=False, duration=50, emsstart=1600, emsthr=0.6, emstype=0, eval=True, farray=2, feature_lr=0.0025, featuret_lr=0.001, fzrotit=8001, gnumlimit=330000, gtisint8=0, gtmask=0, images='images', iterations=30000, lambda_dssim=0.2, loadall=0, loader='colmap', losstart=200, model='ours_lite', model_path='log/cook_spinach/colmap_0', movelr=3.5, omega_lr=0.0001, opacity_lr=0.05, opacity_reset_at=10000, opacity_reset_interval=3000, opthr=0.005, percent_dense=0.01, port=6029, position_lr_delay_mult=0.01, position_lr_final=1.6e-06, position_lr_init=0.00016, position_lr_max_steps=30000, preprocesspoints=3, prevpath='1', prunebysize=0, quiet=True, radials=10.0, randomfeature=0, rayends=7.5, raystart=0.7, rdpip='train_ours_lite', reg=0, regl=0.0001, removescale=5, resolution=2, rgb_lr=0.0001, rgbfunction='rgbv1', rotation_lr=0.001, save_iterations=[7000, 10000, 12000, 25000, 30000, 30000], saveemppoints=0, scaling_lr=0.0015, selectiveview=0, sh_degree=3, shrinkscale=2.0, shuffleems=1, source_path='Neural3D/cook_spinach/colmap_0', start_checkpoint=None, test_iterations=-1, trbfc_lr=0.0001, trbfs_lr=0.03, trbfslinit=0.0, veryrify_llff=0, white_background=False

2. I also found that the results for frames 50-100 are better than those for frames 1-50, so could this be the reason? Is the 300-frame result in the paper the average over every 50 frames?

3. The default test iteration is 25000, but I found the results at 30000 are better than at 25000, so is the test iteration in the paper 30000?

lizhan17 commented 3 weeks ago

The reported results are the average over every 50 frames. The test iteration we used should be 25000, but I know the number of iterations can affect the result a lot; perhaps 30000 works well for you. You can also try deleting the pycache files and retraining the model with "PYTHONDONTWRITEBYTECODE=1".
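In other words, the paper-style number is the mean of the six per-segment scores; a trivial sketch with placeholder values, not results from any actual run:

```python
# Hypothetical per-segment PSNRs for six 50-frame trainings of one scene.
segment_psnr = [30.9, 31.6, 31.4, 31.8, 31.2, 31.5]
print(sum(segment_psnr) / len(segment_psnr))  # 31.4 -> the reported scene PSNR
```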