oppo-us-research / SpacetimeGaussians

[CVPR 2024] Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
https://oppo-us-research.github.io/SpacetimeGaussians-website/

Some questions about custom dataset? #26

Open pange1802703882 opened 7 months ago

pange1802703882 commented 7 months ago

Thanks for your excellent work! I have some questions about generating a custom dataset, and I hope you can answer them :) To follow the format of the n3d dataset, I have used two data recording methods:

  1. Using only one RealSense camera and shooting a scene by moving around it in a circle.
  2. Using multiple fixed-angle cameras to shoot a scene, and selecting the first frame from each camera's video.

Then I use https://github.com/Fyusion/LLFF/blob/master/imgs2poses.py to generate "poses_bounds.npy" like the n3d dataset. But I have been unable to generate the image poses; it seems that the number of generated poses is far less than the number of images. So I would like to ask whether this data generation method is correct, and what other factors need to be considered if using RealSense cameras?
lizhan17 commented 7 months ago

I don't know the RealSense camera. I prefer (2). Results really depend on the quality of colmap's reconstruction of the sparse models.

poses_bounds.npy is not necessary. We use poses_bounds just because the n3dv dataset provides it in llff format.

You can directly input the colmap poses of the first frame to triangulate points for other times. You can modify convertmodel2dbfiles in pre_technicolor to input the first frame's pose directly from its sparse model.

Please use the technicolor loader for a custom dataset, as it does not need poses_bounds.npy and will handle uncentered images.

For factors affecting results, also see https://github.com/oppo-us-research/SpacetimeGaussians/issues/22#issuecomment-1960727292

You can remove the code for the camera ray dictionary (I share the static camera rays across different times) if you only use the lite model.

I recommend that you first go through the whole colmap pipeline with the GUI; then you will know how to process the poses and camera models. A rough CLI sketch of the known-poses flow is shown below.
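For reference, here is a minimal sketch of triangulating points while keeping known first-frame poses fixed, expressed as Python subprocess calls. All paths are placeholders, and `ref_sparse` is assumed to be a sparse model that already holds the known poses and intrinsics with an empty points3D file; this is an illustration, not the exact commands in the repo's preprocessing scripts.

```python
import subprocess

# Placeholder paths for one frame's database, images, reference model, and output.
db, images, ref_sparse, out = "frame.db", "frame_images", "ref_sparse", "out_sparse"

# Extract features and match them across the images of this frame.
subprocess.run(["colmap", "feature_extractor",
                "--database_path", db, "--image_path", images], check=True)
subprocess.run(["colmap", "exhaustive_matcher", "--database_path", db], check=True)

# Triangulate 3D points while keeping the known camera poses from ref_sparse fixed.
subprocess.run(["colmap", "point_triangulator",
                "--database_path", db, "--image_path", images,
                "--input_path", ref_sparse, "--output_path", out], check=True)
```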

pange1802703882 commented 7 months ago

Also, could you provide the technicolor dataset? I downloaded the n3d and immersive datasets, but the technicolor authors have not replied to any messages :)

pange1802703882 commented 7 months ago

I notice the technicolor dataset's camera parameters are in "cameras_parameters.txt"; is this also the first-frame pose of each camera?

lizhan17 commented 7 months ago

No, I cannot provide technicolor without the permission of the authors. You can still use n3d as a reference (without using pose bounds; directly reconstruct the images from the GUI to get the sparse model data).

Please see this documentation for poses:

https://colmap.github.io/faq.html#faq-share-intrinsics
https://colmap.github.io/faq.html#reconstruct-sparse-dense-model-from-known-camera-poses
https://colmap.github.io/format.html#output-format

cameras_parameters.txt contains a row of parameters. I used them like:

```python
colmapQ = [row[5], row[6], row[7], row[8]]
colmapT = [row[9], row[10], row[11]]
```
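If you need a rotation matrix from colmapQ, a standard COLMAP-style conversion from a unit quaternion in (w, x, y, z) order looks like this (a generic helper, not necessarily the exact one in the repo):

```python
import numpy as np

def qvec2rotmat(qvec):
    # qvec is a unit quaternion in COLMAP's (w, x, y, z) order, e.g. colmapQ above.
    w, x, y, z = qvec
    return np.array([
        [1 - 2*y*y - 2*z*z, 2*x*y - 2*w*z,     2*x*z + 2*w*y],
        [2*x*y + 2*w*z,     1 - 2*x*x - 2*z*z, 2*y*z - 2*w*x],
        [2*x*z - 2*w*y,     2*y*z + 2*w*x,     1 - 2*x*x - 2*y*y],
    ])
```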

pange1802703882 commented 7 months ago

OK, thank you!!!

pange1802703882 commented 7 months ago

Sorry, I have a new question while reading the code. In the Camera class, what is the role of rayo and rayd?

lizhan17 commented 7 months ago

rayo is the origin of the ray, which is the camera center. rayd is the direction of the ray, pointing from the camera center to each pixel center.
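For illustration, here is a minimal sketch of how such rays can be built for a pinhole camera in the COLMAP/OpenCV convention (+z forward); the names and exact details are assumptions, not the Camera class's actual code:

```python
import numpy as np

def get_rays(H, W, fx, fy, cx, cy, c2w):
    # Pixel centers in image space.
    i, j = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    # Per-pixel directions in camera space, rotated into world space
    # with the camera-to-world rotation.
    dirs = np.stack([(i - cx) / fx, (j - cy) / fy, np.ones_like(i)], axis=-1)
    rayd = dirs @ c2w[:3, :3].T
    # rayo: the camera center, repeated for every pixel.
    rayo = np.broadcast_to(c2w[:3, 3], rayd.shape)
    return rayo, rayd
```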

pange1802703882 commented 6 months ago

Sorry, I have a new question while reading the code :). What is the role of flags in the training process? And what is the role of the "guided sampling step"? Thank you!!!

lizhan17 commented 6 months ago

flags is for the guided sampling step (triggering, controlling, stopping it). This is experimentally implemented. The role of the guided sampling step is described in the paper. Generally speaking, we want to improve results by adding a guided sampling step for areas that are hard to converge by densifying alone. (Densification in the original 3DGS grows points at the surface level; see the implementation of splitting and cloning, sketched below.)
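For context, a rough sketch of the gradient-driven cloning/splitting decision in standard 3DGS densification that guided sampling supplements; the names and thresholds are illustrative, not this repo's exact code:

```python
import torch

def densify_masks(scales, grads, grad_thresh=0.0002, scale_thresh=0.01):
    # Gaussians with a large accumulated view-space gradient are grown:
    # small ones are cloned (duplicated), large ones are split into smaller ones.
    big_grad = grads.norm(dim=-1) > grad_thresh
    small = scales.max(dim=-1).values <= scale_thresh
    clone_mask = big_grad & small
    split_mask = big_grad & ~small
    return clone_mask, split_mask
```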

davidmaruscsak00 commented 6 months ago

Hey! Thank you for this amazing work.

I am interested in whether the reconstruction with a custom dataset succeeded.

I have a question. How should I modify convertmodel2dbfiles in the technicolor script to process the first frame's pose directly from its sparse model? In that case, do I still need a cameras_parameters.txt file?

lizhan17 commented 6 months ago

1) You can use read_extrinsics_binary() and read_intrinsics_binary() (from the loaders' code) to get the extrinsics and intrinsics of a reference sparse model.

2) Then refer to the loaders' code (see how the n3d and technicolor loaders read the colmap model using read_extrinsics_binary() and read_intrinsics_binary(); they are the same, just focus on the colmap model part).

3) You can replace the parameters in cameras_parameters.txt with the reference model's parameters read in steps 1 and 2, and input them into the db file.

4) Use the reference model's parameters to triangulate the points; see what we do in the last part of each dataset-processing script. A small sketch of steps 1-2 is shown below.
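A minimal sketch of steps 1-2, assuming the helpers live in a colmap_loader-style module as in other Gaussian Splatting codebases (the import path and folder layout here are assumptions):

```python
from scene.colmap_loader import read_extrinsics_binary, read_intrinsics_binary

# Placeholder path to the reference sparse model.
ref_model = "colmap_0/sparse/0"
cam_extrinsics = read_extrinsics_binary(f"{ref_model}/images.bin")
cam_intrinsics = read_intrinsics_binary(f"{ref_model}/cameras.bin")

for img in cam_extrinsics.values():
    intr = cam_intrinsics[img.camera_id]
    # img.qvec (w, x, y, z) and img.tvec are world-to-camera, as in COLMAP's
    # output format; these are the values to feed into the db file (step 3).
    print(img.name, img.qvec, img.tvec, intr.model, intr.params)
```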

aleatorydialogue commented 5 months ago

I've been trying to adapt this for monocular custom datasets, but I'm having trouble. Am I understanding correctly that triangulation is being done for each frame individually? So if using only one camera, does triangulation fail because there is nothing to compare against for each frame? Really enjoying this work, I appreciate it.

lizhan17 commented 5 months ago

The monocular setting is a work in progress; it can be seen as future work. You can leverage existing methods like 4DGS and deformable 3dgs to get points.

There will be parallax error with a single still camera setting; you need additional computation to handle that.

aleatorydialogue commented 4 months ago

I am back to try running my own data and thought this would be a reasonable thread to continue from. It seems the easiest way to handle my own custom data is the n3d preprocessing. So I am taking a few different camera mp4s, extracting frames, and running LLFF imgs2poses.py on a directory of all the frames combined. I will then divide them back into per-camera directories and run a modified pre_n3d.py to skip frame extraction and triangulate on what I already extracted. My question is whether the poses_bounds.npy produced by LLFF imgs2poses.py will be in the correct format. Does the numpy data need to be organized specifically to show which frames are from which camera? Or do you just have them all listed sequentially and dictate the break based on how many frames are specified when running pre_n3d.py?

I appreciate your work here; I'm very excited to get my own videos working and displayed with the splatV web viewer.

aleatorydialogue commented 4 months ago

And actually, I guess the cameras have to be static? So my poses_bounds.npy will have the same number of poses as I have cameras?

lizhan17 commented 4 months ago

> Does the numpy data need to be organized specifically to show which frames are from which camera? Or do you just have them all listed sequentially and dictate the break based on how many frames are specified when running pre_n3d.py?

If you use the n3d loader with llff poses and the pre_n3d script, yes. I remember the n3d llff poses_bounds is ordered by camera. You can check the n3d dataset's poses_bounds numpy shape against what you extracted from imgs2poses.

If you use poses_bounds.npy with the n3d loader, it should have the same number of poses as cameras. Yes, the poses should be static; a quick shape check is sketched below.
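For example, a quick sanity check on the LLFF-style poses_bounds.npy (each row is 17 floats: a flattened 3x5 pose matrix plus near/far bounds, one row per camera for the n3d loader):

```python
import numpy as np

pb = np.load("poses_bounds.npy")
print(pb.shape)                        # expect (num_cameras, 17)
poses = pb[:, :15].reshape(-1, 3, 5)   # 3x4 pose with an extra [H, W, focal] column
bounds = pb[:, 15:]                    # near/far depth bounds per camera
```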

Although static cameras are not a must for our method.

aleatorydialogue commented 4 months ago

Awesome, definitely making some progress here at least. I was able to train a static scene with 4 different mp4s from different static angles, basically just tricking it into thinking it's a multi-cam dynamic scene. The next step will be to get a real camera rig to try an actual dynamic capture of my own. Would one of the other loaders be better for non-static cameras, or would I need to adapt a loader of my own? I am more used to basic GS static splats, where I capture with a single moving camera. I understand why multiple moving cameras would be more complex to process; ultimately I envision using multiple drones for capture.

lizhan17 commented 4 months ago

I don't think the existing loaders will work for non-static cameras without modification. (I share the rays among the same cameras to reduce memory.) Also, the initialization of points over time assumes that we have synchronized videos for each camera.

The sampling in the training batch assumes that at each timestamp, we can randomly pick several cameras.

aleatorydialogue commented 4 months ago

Well, thank you for your help here. I managed to train my own basic custom data: 4 cameras, front-facing scene. There's lots of progress to be made on quality, but using LLFF for poses and the n3d data loader definitely works.

ShunyuanZheng commented 2 months ago

> Also, could you provide the technicolor dataset? I downloaded the n3d and immersive datasets, but the technicolor authors have not replied to any messages :)

Hi, I have the same question. Have you got access to the Technicolor dataset? Which author should I contact? Could you please share the email address for applying for the Technicolor dataset? @lizhan17

lizhan17 commented 2 months ago

The email is

lf4cvmanagement at interdigital.com

You can find it on their website.

ShunyuanZheng commented 2 months ago

> The email is
>
> lf4cvmanagement at interdigital.com
>
> You can find it on their website.

Thanks!