oppo-us-research / SpacetimeGaussians

[CVPR 2024] Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
https://oppo-us-research.github.io/SpacetimeGaussians-website/

Problem with custom dataset #55

Open · lolwarmaze opened this issue 5 months ago

lolwarmaze commented 5 months ago

Hi,

I am trying to train a 4D Gaussian model on my own stereo video pairs. I have two video files (.mp4) from my stereo camera setup, and I generated a poses_bounds.npy file in LLFF format because I was unable to get camera poses for the stereo setup with COLMAP. When I preprocess the data with the pre_n3d.py script, feature extraction succeeds, but I get the following error:

100%|██████████| 2/2 [00:22<00:00, 11.13s/it]
start preparing colmap image input
start preparing colmap database input
I20240610 14:44:36.200973 140120778104832 misc.cc:198] 
==============================================================================
Feature extraction
==============================================================================
I20240610 14:44:36.372832 140120128700416 feature_extraction.cc:254] Processed file [1/2]
I20240610 14:44:36.372855 140120128700416 feature_extraction.cc:257]   Name:            cam02.png
I20240610 14:44:36.372858 140120128700416 feature_extraction.cc:283]   Dimensions:      1920 x 1080
I20240610 14:44:36.372861 140120128700416 feature_extraction.cc:286]   Camera:          #2 - PINHOLE
I20240610 14:44:36.372865 140120128700416 feature_extraction.cc:289]   Focal Length:    1061.01px
I20240610 14:44:36.372874 140120128700416 feature_extraction.cc:296]   GPS:             LAT=-0.002, LON=-0.120, ALT=-0.000
I20240610 14:44:36.372878 140120128700416 feature_extraction.cc:302]   Features:        8785
I20240610 14:44:36.447861 140120128700416 feature_extraction.cc:254] Processed file [2/2]
I20240610 14:44:36.447875 140120128700416 feature_extraction.cc:257]   Name:            cam01.png
I20240610 14:44:36.447878 140120128700416 feature_extraction.cc:283]   Dimensions:      1920 x 1080
I20240610 14:44:36.447880 140120128700416 feature_extraction.cc:286]   Camera:          #1 - PINHOLE
I20240610 14:44:36.447883 140120128700416 feature_extraction.cc:289]   Focal Length:    1061.01px
I20240610 14:44:36.447887 140120128700416 feature_extraction.cc:296]   GPS:             LAT=0.000, LON=0.000, ALT=0.000
I20240610 14:44:36.447890 140120128700416 feature_extraction.cc:302]   Features:        8352
I20240610 14:44:36.498595 140120778104832 timer.cc:91] Elapsed time: 0.005 [minutes]
I20240610 14:44:36.624848 139818958700544 misc.cc:198] 
==============================================================================
Exhaustive feature matching
==============================================================================
I20240610 14:44:36.707796 139818958700544 feature_matching.cc:231] Matching block [1/1, 1/1]
I20240610 14:44:36.717449 139818958700544 feature_matching.cc:46]  in 0.010s
I20240610 14:44:36.717592 139818958700544 timer.cc:91] Elapsed time: 0.002 [minutes]
I20240610 14:44:36.836042 139784402444288 misc.cc:198] 
==============================================================================
Loading model
==============================================================================
I20240610 14:44:36.836204 139784402444288 misc.cc:198] 
==============================================================================
Loading database
==============================================================================
I20240610 14:44:36.837077 139784402444288 database_cache.cc:54] Loading cameras...
I20240610 14:44:36.837100 139784402444288 database_cache.cc:64]  2 in 0.000s
I20240610 14:44:36.837107 139784402444288 database_cache.cc:72] Loading matches...
I20240610 14:44:36.837137 139784402444288 database_cache.cc:78]  1 in 0.000s
I20240610 14:44:36.837142 139784402444288 database_cache.cc:94] Loading images...
I20240610 14:44:36.837882 139784402444288 database_cache.cc:143]  2 in 0.001s (connected 2)
I20240610 14:44:36.837891 139784402444288 database_cache.cc:154] Building correspondence graph...
I20240610 14:44:36.838084 139784402444288 database_cache.cc:190]  in 0.000s (ignored 0)
I20240610 14:44:36.838155 139784402444288 timer.cc:91] Elapsed time: 0.000 [minutes]
I20240610 14:44:36.839397 139784402444288 misc.cc:198] 
==============================================================================
Triangulating image #1 (0)
==============================================================================
I20240610 14:44:36.839512 139784402444288 sfm.cc:473] => Image sees 0 / 886 points
I20240610 14:44:36.839772 139784402444288 sfm.cc:478] => Triangulated 0 points
I20240610 14:44:36.839805 139784402444288 misc.cc:198] 
==============================================================================
Triangulating image #2 (1)
==============================================================================
I20240610 14:44:36.839898 139784402444288 sfm.cc:473] => Image sees 0 / 886 points
I20240610 14:44:36.840165 139784402444288 sfm.cc:478] => Triangulated 0 points
I20240610 14:44:36.840195 139784402444288 misc.cc:198] 
==============================================================================
Retriangulation
==============================================================================
I20240610 14:44:36.840288 139784402444288 incremental_mapper.cc:175] => Completed observations: 0
I20240610 14:44:36.840315 139784402444288 incremental_mapper.cc:178] => Merged observations: 0
I20240610 14:44:36.840369 139784402444288 misc.cc:198] 
==============================================================================
Bundle adjustment
==============================================================================
F20240610 14:44:36.840494 139784402444288 sfm.cc:514] Check failed: bundle_adjuster.Solve(reconstruction.get()) 
    @     0x7f221f38a78a  google::LogMessage::Fail()
    @     0x7f221f38bf45  google::LogMessageFatal::~LogMessageFatal()
    @     0x5646cb866291  colmap::RunPointTriangulatorImpl(std::shared_ptr<colmap::Reconstruction> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&,…
    @     0x5646cb866786  colmap::RunPointTriangulator(int, char**)
    @     0x5646cb846685  main
    @     0x7f221d9fc083  __libc_start_main
    @     0x5646cb84cfa4  (unknown)
Aborted (core dumped)

I also checked the orientation of my poses and it looks okay. What do you think might be the issue here? Also, if anyone has guidelines for working with custom stereo datasets, please let me know. Thanks!
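Looking at orientation by eye does not catch every problem in a hand-made poses_bounds.npy. The sketch below is a minimal sanity check, assuming the standard LLFF layout: one row of 17 values per camera, i.e. a flattened 3x5 matrix [R | t | hwf] (camera-to-world rotation, camera centre, and image height/width/focal) followed by near and far depth bounds. The function name `check_poses_bounds` is hypothetical. It reports whether the rotation blocks are orthonormal with determinant +1, whether the bounds are positive and ordered, and the pairwise distance between camera centres (the stereo baseline when there are two cameras). Note that LLFF's own tools store the rotation columns in a [down, right, backwards] order, so a pose that looks right in another convention can still be wrong here.

```python
import numpy as np


def check_poses_bounds(arr: np.ndarray) -> dict:
    """Sanity-check an LLFF-style poses_bounds array of shape (N, 17).

    Assumes each row is a flattened 3x5 matrix [R | t | hwf]
    followed by the near and far depth bounds.
    """
    if arr.ndim != 2 or arr.shape[1] != 17:
        raise ValueError(f"expected shape (N, 17), got {arr.shape}")
    poses = arr[:, :15].reshape(-1, 3, 5)
    bounds = arr[:, 15:]
    rotations = poses[:, :, :3]
    centres = poses[:, :, 3]
    hwf = poses[:, :, 4]

    # Rotation blocks should be orthonormal (R^T R = I) with determinant +1.
    eye = np.eye(3)
    orth_err = max(np.abs(R.T @ R - eye).max() for R in rotations)
    dets = np.linalg.det(rotations)

    # Pairwise distances between camera centres (the baseline for N == 2).
    diffs = centres[:, None, :] - centres[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    baselines = dists[np.triu_indices(len(centres), k=1)]

    # Near bound must be positive and smaller than the far bound.
    bounds_ok = bool(np.all(bounds[:, 0] > 0) and np.all(bounds[:, 1] > bounds[:, 0]))

    return {
        "num_cameras": len(arr),
        "max_orthonormality_error": float(orth_err),
        "determinants": dets,
        "hwf": hwf,
        "bounds_ok": bounds_ok,
        "pairwise_baselines": baselines,
    }
```

Typical use would be `report = check_poses_bounds(np.load("poses_bounds.npy"))`. With only two views, a baseline that is tiny compared with the scene depth gives very small triangulation angles, which is one plausible reason for triangulation to produce nothing.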

lizhan17 commented 5 months ago

Thank you for your interest. It seems COLMAP did not output any points: both images report "Triangulated 0 points" before bundle adjustment fails.

I am working on custom data processing, a loader, and a trainer. Stay tuned; it should be ready in the next two weeks.
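Because an empty point set only surfaces as a crash during bundle adjustment, it can help to check a sparse model's point count directly. This is a minimal sketch, assuming a standard COLMAP sparse model directory containing either points3D.bin (whose first 8 bytes are a little-endian uint64 point count) or points3D.txt (one point per non-comment line). The function name `count_colmap_points` is hypothetical, and the directory you pass in is up to you, since the thread does not say where pre_n3d.py writes its model.

```python
import struct
from pathlib import Path


def count_colmap_points(model_dir: str) -> int:
    """Return the number of 3D points in a COLMAP sparse model directory."""
    model = Path(model_dir)
    bin_path = model / "points3D.bin"
    txt_path = model / "points3D.txt"
    if bin_path.exists():
        with bin_path.open("rb") as f:
            header = f.read(8)
        if len(header) < 8:
            return 0  # truncated or empty file
        # Binary format starts with the point count as uint64, little-endian.
        (num_points,) = struct.unpack("<Q", header)
        return num_points
    if txt_path.exists():
        with txt_path.open() as f:
            # Skip blank lines and '#' comment headers; each other line is a point.
            return sum(1 for line in f if line.strip() and not line.startswith("#"))
    raise FileNotFoundError(f"no points3D.bin or points3D.txt in {model}")
```

A count of zero (or a very small number) means training has nothing meaningful to initialise from, so it is worth fixing the capture or poses before going further.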

lizhan17 commented 5 months ago

Hi, I just pushed code to process raw MP4 videos captured by statically placed multi-camera rigs, without any pose prior. (The current custom processing cannot handle fisheye cameras, which need prior intrinsics.) Please expect some artifacts in the reconstructed models.

  1. Code to process the videos (assuming your videos are at /data/videos/*.mp4):

    python script/pre_no_prior.py --videosdir /data/videos

    COLMAP points will be generated at /data/videos/point/colmap_*. Once you have good COLMAP points, you can remove all the intermediate result images in /data/videos/frames.

  2. Code to train the model:

    python train.py --eval --config configs/techni_lite/noprior.json --model_path <path to save model> --source_path /data/videos/point/colmap_0
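The two steps above can be scripted so the paths stay consistent. This is a minimal sketch that only assembles the commands quoted in this thread as argument lists; it assumes it is run from the repository root, and the function name `build_commands` and the example model path `/data/output/model` are hypothetical. The optional `low_memory` switch appends the two memory-saving flags the maintainer suggests (`--data_device cpu --gtisint8 1`).

```python
import shlex
from pathlib import Path


def build_commands(videos_dir, model_path, low_memory=False):
    """Return (preprocess, train) commands as argv lists.

    Flags mirror the ones quoted in the thread; paths are placeholders.
    """
    videos = Path(videos_dir)
    preprocess = ["python", "script/pre_no_prior.py", "--videosdir", str(videos)]
    train = [
        "python", "train.py", "--eval",
        "--config", "configs/techni_lite/noprior.json",
        "--model_path", str(model_path),
        # Train on the first COLMAP model produced by preprocessing.
        "--source_path", str(videos / "point" / "colmap_0"),
    ]
    if low_memory:
        # Lower GPU memory at the cost of slower training and more CPU RAM.
        train += ["--data_device", "cpu", "--gtisint8", "1"]
    return preprocess, train


if __name__ == "__main__":
    pre, train = build_commands("/data/videos", "/data/output/model", low_memory=True)
    for cmd in (pre, train):
        print(shlex.join(cmd))
```

Returning argument lists rather than shell strings means each one can be passed straight to `subprocess.run(cmd, check=True)`, which stops at the first failing step instead of training on a missing COLMAP model.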

Add these options for memory-efficient training (at the cost of longer training time and a larger CPU memory requirement):

    --data_device cpu --gtisint8 1

With these two options, training only requires 4 to 5 GB of GPU memory for 18 cameras with 50 frames each (original 2K-resolution images downsampled to 1K).
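A rough calculation shows why where and how the ground-truth frames are stored matters at this scale. This sketch assumes, from the flag names only, that `--gtisint8 1` keeps ground-truth images as 8-bit integers instead of 32-bit floats and that `--data_device cpu` keeps them in host memory rather than on the GPU. It also assumes a 1K frame size of 1024x544 for illustration; the thread does not state the exact resolution. The function name `image_stack_bytes` is hypothetical, and the result covers the image stack only, not the model or optimizer state.

```python
def image_stack_bytes(num_cameras, num_frames, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold every ground-truth frame in memory at once."""
    return num_cameras * num_frames * height * width * channels * bytes_per_value


if __name__ == "__main__":
    # 18 cameras x 50 frames, as in the comment; 1024x544 is an assumed 1K size.
    cams, frames, h, w = 18, 50, 544, 1024
    as_float32 = image_stack_bytes(cams, frames, h, w, bytes_per_value=4)
    as_uint8 = image_stack_bytes(cams, frames, h, w, bytes_per_value=1)
    print(f"float32: {as_float32 / 2**30:.2f} GiB")
    print(f"uint8:   {as_uint8 / 2**30:.2f} GiB")
```

Under these assumptions the frames alone take roughly four times as much memory as float32 as they do as 8-bit values, which is consistent with the maintainer's point that the two options trade GPU memory for CPU memory and time.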

TonyDua commented 3 months ago

I tested on the 170915_toddler5 sequence from the CMU Panoptic Studio dataset. The result after training with the 31 HD cams was very poor; it is almost impossible to see any valid content. With the 479 VGA cams, the result was also unsatisfactory. For this 360° video dataset, how can we improve the quality? Are there any hyperparameters that can be adjusted?

[image] 31 HD Cams result 👆

[image] [image] 479 VGA Cams result 👆

lizhan17 commented 3 months ago

Try hyperparameters like "--trbfslinit 4" (this makes the initial points cover fewer neighbouring time steps) and "--trbfs_lr 0.05" (this lets the shape of the 1D Gaussian change more during training).

Did you use pose priors for this dataset?
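To see why a larger initial temporal scale makes points "cover fewer neighbouring time steps", the sketch below assumes each point's opacity varies over time as a 1D Gaussian of the form sigma_s * exp(-s_t * (t - mu_t)^2), where mu_t is the point's centre in time and s_t its temporal scale. It also assumes, for illustration only, that the value given to `--trbfslinit` maps directly to s_t; the thread does not say whether the repository stores this scale directly or in log space. Both function names are hypothetical.

```python
import math


def temporal_opacity(t, mu_t, s_t, sigma_s=1.0):
    """Temporal opacity of one point: sigma_s * exp(-s_t * (t - mu_t)**2)."""
    return sigma_s * math.exp(-s_t * (t - mu_t) ** 2)


def half_opacity_radius(s_t):
    """Offset |t - mu_t| at which opacity drops to half its peak.

    Solves exp(-s_t * r**2) = 1/2, giving r = sqrt(ln 2 / s_t).
    """
    return math.sqrt(math.log(2.0) / s_t)


if __name__ == "__main__":
    for s in (1.0, 4.0):
        print(f"s_t={s}: half-opacity radius = {half_opacity_radius(s):.3f}")
```

Under this form, quadrupling s_t halves the time window in which a point is substantially visible, and a higher learning rate on s_t lets that window adapt faster during training.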

TonyDua commented 3 months ago

No, I did not use pose priors for this dataset. I preprocessed it with script/pre_no_prior.py from your reply above. I will try training with the hyperparameters you mentioned later.

aleatorydialogue commented 2 months ago

Thank you very much for adding this method for custom data. I am getting much better results than I did when using LLFF to get poses_bounds.npy with the n3d dataloader. I appreciate your work.

BIT-DYN commented 3 weeks ago


Hi @TonyDua, did you get good results on the CMU Panoptic Studio dataset? I tried but failed.