Splatfacto a pointcloud for initialization not from COLMAP

piorkoo commented 8 months ago

Describe the bug

I am trying to train with the splatfacto method, using a point cloud exported from Metashape for initialization. I added "ply_file_path" to transforms.json and exported the point cloud as a .ply file. I always get the same error: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn. I have tried several ways of exporting and converting the .ply file, using CloudCompare and the PDAL library for the conversion. When I use ns-process-data images, where the point cloud is made by COLMAP, there is no problem. The header of the COLMAP .ply and the header of the .ply exported from Metashape (after conversion) look the same. When I copy only the point cloud made by COLMAP and train on the Metashape dataset, it works (of course the point cloud then has different coordinates than the cameras).

When using a point cloud in .ply format that was not made by COLMAP (tested Metashape, CloudCompare, PDAL, Notepad), the error occurs after 0-100 steps:

Traceback (most recent call last):
  File "C:\Users\admin\miniconda3\envs\nerfstudio\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\admin\miniconda3\envs\nerfstudio\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\admin\miniconda3\envs\nerfstudio\Scripts\ns-train.exe\__main__.py", line 7, in <module>
  File "C:\Users\admin\miniconda3\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 262, in entrypoint
    main(
  File "C:\Users\admin\miniconda3\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 247, in main
    launch(
  File "C:\Users\admin\miniconda3\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "C:\Users\admin\miniconda3\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 100, in train_loop
    trainer.train()
  File "C:\Users\admin\miniconda3\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 250, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "C:\Users\admin\miniconda3\envs\nerfstudio\lib\site-packages\nerfstudio\utils\profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "C:\Users\admin\miniconda3\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 473, in train_iteration
    self.grad_scaler.scale(loss).backward()  # type: ignore
  File "C:\Users\admin\miniconda3\envs\nerfstudio\lib\site-packages\torch\_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "C:\Users\admin\miniconda3\envs\nerfstudio\lib\site-packages\torch\autograd\__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Two point clouds are attached: data.zip

kerrj commented 8 months ago

Interesting, does initializing gaussians randomly also work normally? Something to try would be launching training while paused, then inspect the initialized gaussians in the viewer to see if they line up with the input cameras or if anything looks off visually.

piorkoo commented 8 months ago

Yes, initializing gaussians randomly works normally. I paused training and it looks like the coordinates of the point cloud are swapped, rotated and scaled. I wonder which matrix parameters to use, and where, to rotate the point cloud, and why transforms.json has rotated cameras?

kerrj commented 8 months ago

transforms.json can sometimes automatically apply a dataparser transform to shift and scale the poses, which is saved as 'applied_transform' in the json. If this transform is not the identity, you might need to transform the points by the same matrix. I'm not too familiar with the dataparser transform code though, so you might want to dig into the code around this, starting here
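A minimal sketch of that idea, assuming 'applied_transform' is stored as a 3x4 (or 4x4) nested list and that the points are plain (x, y, z) tuples. The helper names `transform_points` and `load_applied_transform` are made up for illustration and are not part of nerfstudio:

```python
import json
from pathlib import Path


def transform_points(points, matrix):
    """Apply a 3x4 or 4x4 affine matrix (nested lists) to (x, y, z) points."""
    out = []
    for x, y, z in points:
        out.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + row[3]
            for row in matrix[:3]  # ignore the homogeneous row if present
        ))
    return out


def load_applied_transform(transforms_path):
    """Return the 'applied_transform' entry, or None when the key is absent."""
    meta = json.loads(Path(transforms_path).read_text())
    return meta.get("applied_transform")


# Example matrix (an assumption, not read from any real dataset):
# reorders axes as (x, y, z) -> (z, x, y) with no translation.
example = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
print(transform_points([(1.0, 2.0, 3.0)], example))  # [(3.0, 1.0, 2.0)]
```

If `load_applied_transform` returns None, there is nothing to apply and the mismatch has to come from somewhere else.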

piorkoo commented 8 months ago

Thank you. Unfortunately, in the tested dataset there is no applied_transform in transforms.json. It looks like the problem with the point cloud initialization is due to the coordinates, not the format/exporter.

H-tr commented 7 months ago

Hi! Same issue here. I tried using COLMAP without an initial point cloud and it worked well. However, when using RGB and depth generated by either Polycam or SpectacularAI, this issue appears. Any solutions?

gchhablani commented 7 months ago

Hi! I face the same issue but with images from a known pointcloud. With random initialization, the train script fails with this error.

oseiskar commented 7 months ago

I do not remember seeing this particular error, but the latest released Spectacular AI SDK does not yet produce the expected sparse_pc.ply file. This will be fixed in the next SDK release later this week. As a work-around, it is possible to use the latest development version as described here https://github.com/SpectacularAI/sdk-examples/issues/126#issuecomment-1934172987

It may also be relevant for this issue that when Nerfstudio (after prompting the user) tries to create sparse_pc.ply from a colmap/ folder, the operation does not look correct. The resulting point cloud has a coordinate system flip (at least the Z-axis was flipped). The incorrect point cloud often kind of works too, but the splatting converges a lot slower. If we simply write the same point cloud that we would have written as .csv/.txt to mimic COLMAP output, but now as ASCII PLY, the Splatfacto method seems to work fine (see here).

We also have a separate repo for tools that can convert between a few different popular point cloud formats (including COLMAP txt -> PLY), which may also be helpful debugging this https://github.com/SpectacularAI/point-cloud-tools.
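For reference, writing a point cloud as ASCII PLY needs nothing beyond the standard library. The sketch below is not the Spectacular AI or Nerfstudio implementation; the function name `write_ascii_ply` is hypothetical, and the float/uint8 property types are chosen to match what is reported in this thread for the COLMAP-derived file:

```python
def write_ascii_ply(path, points, colors):
    """Write points [(x, y, z)] and colors [(r, g, b)] (0-255) as an ASCII PLY."""
    if len(points) != len(colors):
        raise ValueError("points and colors must have the same length")
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property uint8 red",
        "property uint8 green",
        "property uint8 blue",
        "end_header",
    ]
    with open(path, "w", newline="\n") as f:
        f.write("\n".join(header) + "\n")
        # One vertex per line: three coordinates followed by three color bytes.
        for (x, y, z), (r, g, b) in zip(points, colors):
            f.write(f"{x:.6f} {y:.6f} {z:.6f} {int(r)} {int(g)} {int(b)}\n")
```

The points written this way still have to be in the same coordinate system as the camera poses; the file format alone does not fix a flipped axis.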

yoyoooooooo commented 6 months ago

Hi. I had the same error but in my case ply headers were different: double and uchar from Metashape, and float and uint8 from Colmap. I simply changed the header of the metashape ply and it worked.
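A sketch of that header change, under the assumption that the file is an ASCII PLY. For a binary PLY the header types describe the byte layout, so editing them without re-encoding the data would corrupt the file. The function name `retype_ascii_ply_header` is made up for illustration:

```python
def retype_ascii_ply_header(ply_text):
    """Rewrite 'double' -> 'float' and 'uchar' -> 'uint8' in an ASCII PLY header."""
    head, sep, body = ply_text.partition("end_header")
    if not sep:
        raise ValueError("no end_header found; not a PLY file?")
    if "format ascii" not in head:
        raise ValueError("only ASCII PLY is safe to retype by editing the header")
    replacements = {"double": "float", "uchar": "uint8"}
    lines = []
    for line in head.split("\n"):
        parts = line.split()
        # Scalar property lines look like: property <type> <name>
        if len(parts) == 3 and parts[0] == "property" and parts[1] in replacements:
            line = f"property {replacements[parts[1]]} {parts[2]}"
        lines.append(line)
    # The vertex data after end_header is left untouched.
    return "\n".join(lines) + sep + body
```

Usage would be to read the file as text, pass it through the function, and write the result to a new .ply next to the original.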

Kiord commented 6 months ago

Hi, same error here, and it persists when fixing the types in the ply header. Using the random initialization works.

The following is surprising: I saved the optimized splats from random initialization as a ply file, cleaned it so it only contains points on the object, and reinjected it as initialization points. The issue persists.

I used Metashape for the camera alignment and ns-process-data to create transforms.json from the Agisoft xml.

yoyoooooooo commented 6 months ago

It also seems that in the ply exported from Metashape the y and z fields are sometimes swapped, which causes this same error. Swapping them back seems to fix it again. The "sometimes" is confusing; I am probably missing something :-)
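When guessing at axis conventions like this, it can help to list the candidates systematically. The sketch below (plain Python, all names are illustrative) enumerates every signed axis permutation and separates proper rotations from mirrorings. Note that a bare y/z swap has determinant -1, so on its own it mirrors the scene rather than rotating it:

```python
from itertools import permutations, product


def determinant(m):
    """Determinant of a 3x3 nested list."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def signed_axis_permutations():
    """Yield (matrix, determinant) for all signed axis permutations."""
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            matrix = [[0, 0, 0] for _ in range(3)]
            # Output axis `row` takes input axis `col`, possibly negated.
            for row, (col, sign) in enumerate(zip(perm, signs)):
                matrix[row][col] = sign
            yield matrix, determinant(matrix)


candidates = list(signed_axis_permutations())
rotations = [m for m, det in candidates if det == 1]
print(len(candidates), len(rotations))  # 48 24

swap_yz = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(determinant(swap_yz))  # -1
```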

jb-ye commented 6 months ago

Improving compatibility with other SfM software is not easy. Different packages use different coordinate conventions; if the poses fed to nerfstudio are in a different coordinate system than the point cloud, unexpected errors will occur.

simonbethke commented 5 months ago

I am checking this issue at the moment because I want to use the point cloud from Metashape. My impression is that, due to unknown transformations, there are sometimes cameras with no points in view, which could then cause the crash. I have switched the axes of the ply file in all possible combinations and was not able to get the correct orientation. However, in two combinations the system didn't crash and continued training with wrong starting points (like in random initialization).
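One rough way to test the "cameras with no points in view" hypothesis is to count, per camera, how many points lie in front of it. The sketch below assumes NumPy, rigid camera-to-world matrices (such as the "transform_matrix" entries of the frames in transforms.json), a camera looking along its local -Z axis, and points already in the same frame as the poses. It ignores the field of view, so it only catches gross misalignment; `points_in_front` is a hypothetical helper:

```python
import numpy as np


def points_in_front(c2w, points):
    """Count points with positive depth for one camera.

    c2w:    (4, 4) or (3, 4) camera-to-world matrix, camera looking along -Z.
    points: (N, 3) world-space points.
    """
    c2w = np.asarray(c2w, dtype=np.float64)
    points = np.asarray(points, dtype=np.float64)
    rotation, origin = c2w[:3, :3], c2w[:3, 3]
    # World -> camera for a rigid transform: R^T (p - t), written row-wise.
    cam = (points - origin) @ rotation
    # With the camera looking along -Z, points in front have negative z.
    return int(np.sum(cam[:, 2] < 0))


# Identity pose at the origin: only the point at z = -1 is in front.
print(points_in_front(np.eye(4), [[0, 0, -1], [0, 0, 1]]))  # 1
```

A camera for which this returns 0 sees none of the initial points, which would be consistent with the misalignment described above.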

pyjacques commented 5 months ago

Hello, in Metashape, first of all you need to align your scene upright and place all of it above z=0 (very important to avoid training crashes)! Then export cameras.xml and the tie point cloud or dense cloud. Run the nerfstudio dataset conversion for Metashape. Before training, you need to flip the ply file. I'm using this Python/Open3D script:

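import numpy as np
import open3d as o3d

# Load the point cloud exported from Metashape.
cloud = o3d.io.read_point_cloud("spc.ply")
# Rotate +90 degrees about X, around the origin.
R1 = cloud.get_rotation_matrix_from_xyz((np.pi / 2, 0, 0))
cloud.rotate(R1, center=(0, 0, 0))
# Then rotate +90 degrees about Z, around the origin.
R2 = cloud.get_rotation_matrix_from_xyz((0, 0, np.pi / 2))
cloud.rotate(R2, center=(0, 0, 0))
# Save the rotated cloud under the name referenced in transforms.json.
o3d.io.write_point_cloud("spc_fliped.ply", cloud)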
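The net effect of the two rotations can be checked with NumPy alone. This is a sketch that assumes Open3D builds the usual right-handed rotation matrices; `rot_x`, `rot_z` and `flip_points` are illustrative names, not Open3D or nerfstudio functions:

```python
import numpy as np


def rot_x(angle):
    """Right-handed rotation about the X axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])


def rot_z(angle):
    """Right-handed rotation about the Z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])


# The X rotation is applied first, then the Z rotation: combined = Rz @ Rx.
combined = rot_z(np.pi / 2) @ rot_x(np.pi / 2)
print(np.round(combined).astype(int))
# [[0 0 1]
#  [1 0 0]
#  [0 1 0]]


def flip_points(points):
    """Apply the combined rotation to an (N, 3) array of points."""
    return np.asarray(points, dtype=np.float64) @ combined.T
```

Under these assumptions the script is an axis reordering, (x, y, z) -> (z, x, y), with no mirroring.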

Then add the "ply_file_path" line to transforms.json, before "camera_model":

"ply_file_path": "spc_fliped.ply",
"camera_model": "OPENCV",
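The same edit can be scripted instead of done by hand. A minimal sketch using only the standard library; `set_ply_file_path` is a hypothetical helper, and the .ply name is assumed to be relative to the dataset folder, as in the fragment above. The position of the key inside the JSON object does not matter to a JSON parser:

```python
import json
from pathlib import Path


def set_ply_file_path(transforms_path, ply_name):
    """Add or update "ply_file_path" in a transforms.json file in place."""
    path = Path(transforms_path)
    meta = json.loads(path.read_text())
    meta["ply_file_path"] = ply_name
    # Rewrite the whole file; all other keys are preserved as loaded.
    path.write_text(json.dumps(meta, indent=4))
    return meta
```

For example, `set_ply_file_path("transforms.json", "spc_fliped.ply")` would be run from inside the dataset folder.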

You can toggle the visibility of the initial cloud in the nerfstudio web UI 😉 Wish you the best renders!

simonbethke commented 5 months ago

What do you mean by 'align up the scene'? Regarding the python code to update the ply file, I think that this should get integrated into the nerfstudio-metashape importer. Maybe also the alignment can be fixed this way? Ideally, the nerfstudio importer would provide an option to define a metashape-ply-file and would also convert and move it to the bundle folder.

pyjacques commented 5 months ago

In fact, I just want to clarify that after importing and aligning the images, it's important to position the object or scene above the horizon line and logically in its natural orientation. I've noticed that nerfstudio often crashes when camera orientations are pointed towards an object below the horizon line (Z=0).

I believe this 'positioning' step should remain manual in order to have a coherent scene in Metashape. However, if we start adjusting the position of the point cloud in the converter/dataparser, we will inevitably have to adjust the cameras as well.

On the other hand, it would be wise to offer the option to export from Metashape, in addition to the cameras.xml file, the point cloud (sparse/tie point) .ply file, to be processed in 'ns-process-data metashape --data...' by applying this corrective rotation in space, with the Python script above for example.

I obtain much better results with the Metashape tie point cloud than with nerfstudio's random initialization. However, the dense point cloud often provides too much information, resulting in a visually less appealing outcome and much heavier splat scenes. For me, the advantage of gaussian splatting is to have light and optimized scenes and still offer photorealistic rendering on low-end devices.

simonbethke commented 5 months ago

Thank you very much for this help! I will contribute this to the metashape processing tool so everybody can simply take a ply-export from metashape

simonbethke commented 5 months ago

@pyjacques this code is now part of nerfstudio. You can simply add the optional parameter --ply to ns-process-data metashape...

pyjacques commented 5 months ago

@simonbethke Whaoo, thank you for implementing this into nerfstudio ! I'm Happy to contribute too 😉

jb-ye commented 5 months ago

closing per https://github.com/nerfstudio-project/nerfstudio/pull/3122

nerfstudio-project / nerfstudio

Splatfacto a pointcloud for initialization not from COLMAP #2876