oppo-us-research / SpacetimeGaussians

[CVPR 2024] Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
https://oppo-us-research.github.io/SpacetimeGaussians-website/

About Initialization #66

Open yzxQH opened 1 month ago

yzxQH commented 1 month ago

Great work! Thanks for releasing the code! I still wonder why you use sparse point clouds from all available timestamps for initialization, rather than only timestamp=0 as most 4DGS-related works do. Also, I looked at dataset_readers and, if I understood correctly, you simply concatenate the sparse point clouds from all available timestamps. Doesn't this make the initial point cloud overly redundant?

lizhan17 commented 4 weeks ago

1) Our method initializes from multiple timestamps. You can think of it like video compression: a video codec works better with multiple frames than single-image compression does.

2) The initial point cloud is redundant if we only consider space across all timestamps. That is the reason we have a temporal RBF (a 1D Gaussian), which gives each point a temporal opacity in spacetime. We also have options to sample only part of the points from the input videos, such as using points from every N frames only, or using a nearest-neighbor distance criterion (similar to different group-of-pictures settings in a video codec).
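To make the temporal RBF idea concrete, here is a minimal sketch of a 1D Gaussian opacity in time. It assumes the common form opacity(t) = spatial_opacity * exp(-scale * (t - center)^2); the function and argument names are illustrative, and the exact parameterization used in the repository is not shown in this thread and is not assumed here.

```python
import math

def temporal_opacity(t, center, scale, spatial_opacity=1.0):
    """1D Gaussian RBF in time: peaks at `center` and decays as |t - center| grows.

    All names here are hypothetical; `scale` is a generic temporal scaling factor.
    """
    return spatial_opacity * math.exp(-scale * (t - center) ** 2)

# A point initialized from the frame at t = 0.5 (time normalized to [0, 1]).
for scale in (1.0, 100.0):
    at_center = temporal_opacity(0.5, center=0.5, scale=scale)
    far_away = temporal_opacity(0.9, center=0.5, scale=scale)
    print(f"scale={scale}: at center={at_center:.3f}, at t=0.9={far_away:.6f}")
```

With a larger scaling factor the point fades out quickly away from its own timestamp, so points from different frames that overlap in space need not all be visible at once.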

yzxQH commented 2 weeks ago

Thanks a lot! I roughly understand now. I have another question: can you explain in detail the meaning of the parameters trbfslinit, preprocesspoints, densify, and desicnt in the techni_lite config files? Under what circumstances is each setting preferable? If I want to train on data exceeding 300 frames (without the train-every-50-frames strategy), which parameter adjustments might help?

lizhan17 commented 2 weeks ago

trbfslinit: controls the shape of the temporal RBF.
preprocesspoints: reduces the number of initial points (points from every N frames, or a spatial portion of the points from each frame).
densify: the densification strategy. The main goal is to add points first and then reduce them; the strategies differ in how points are added and removed.
desicnt: the number of densifications (6 or 12 in most cases; I suggest 6).

For 300 frames: 1) The most important part should be to use trbfslinit=4 with a large value, so that the initial temporal range each point affects is small. 2) You can also use the points from every 2 frames or every 4 frames by setting preprocesspoints=14 or 15, respectively, to avoid too many duplicated static points across time.

elif self.preprocesspoints == 14:
        pcd = interpolate_partuse(pcd, 2) 
elif self.preprocesspoints == 15:
        pcd = interpolate_partuse(pcd, 4) 

3) I suggest that six 50-frame short sequences should achieve the best results for 300 frames. We didn't optimize the training pipeline for 300 frames (we optimize all the points together during training), and duplicate points will cause artifacts across time.
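As an illustration of the "every N frames" subsampling in point 2), here is a minimal sketch. It is not the repository's interpolate_partuse implementation, which is not shown in this thread; the helper keep_every_n_frames and the (x, y, z, frame_index) point layout are assumptions made for the example.

```python
def keep_every_n_frames(points, n):
    """Keep only the points that come from every n-th frame.

    `points` is a list of (x, y, z, frame_index) tuples; this layout is a
    hypothetical stand-in for a point cloud that stores per-point timestamps.
    """
    return [p for p in points if p[3] % n == 0]

# One dummy point per frame for an 8-frame clip.
points = [(0.0, 0.0, 0.0, frame) for frame in range(8)]

kept = keep_every_n_frames(points, 2)
print([p[3] for p in kept])  # [0, 2, 4, 6]
```

Dropping the points of the skipped frames roughly halves (n=2) or quarters (n=4) the initial point count, which is the effect described for preprocesspoints=14 and 15.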

yzxQH commented 2 weeks ago

Thanks for your prompt response! But I am concerned about two issues when training a model every 50 frames:

  1. Currently, the viewer is unable to continuously play six results. (may revise later)

  2. Will the transition between one 50-frame scene and the next not be smooth enough? (Compared to optimizing all 300 frames together, will it be more prone to point flicker?) I have always considered initialization an important step for consistent and better performance, so if I split the scene into six 50-frame sequences, should each short sequence use the same initialization (for example, all use point_cloudtotal300.ply)?

lizhan17 commented 2 weeks ago
  1. "Currently, the viewer is unable to continuously play six results. (may revise later)" Yes, we will revise this later.

  2. There will be inconsistency between the 50-frame sequences, but such inconsistency exists between any multiple sequences, no matter whether the length is 50 or 300.

The initialization of the six 50-frame sequences should be like
0-50.ply, 50-100.ply, 100-150.ply, ... This aligns the points with the videos, for training efficiency and temporal stability within each 50-frame sequence.
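The per-sequence initialization can be sketched as a small helper that lists the frame range and point cloud file for each chunk. The function names are hypothetical, and the file names only mirror the pattern in the comment above; the repository may name its files differently.

```python
def chunk_ranges(total_frames, chunk_len):
    """Split [0, total_frames) into consecutive (start, end) frame ranges."""
    return [
        (start, min(start + chunk_len, total_frames))
        for start in range(0, total_frames, chunk_len)
    ]

def chunk_ply_names(total_frames, chunk_len):
    """Name one initial point cloud per chunk, following the 'start-end.ply' pattern."""
    return [f"{start}-{end}.ply" for start, end in chunk_ranges(total_frames, chunk_len)]

print(chunk_ply_names(300, 50))
# ['0-50.ply', '50-100.ply', '100-150.ply', '150-200.ply', '200-250.ply', '250-300.ply']
```

Each short sequence is then initialized only from the points of its own frame range, rather than from one shared cloud covering all 300 frames.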