nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

Georeferenced datasets processed on Metashape are loaded incorrectly/can't be loaded using splatfacto. #3255

Open gaigc opened 4 days ago

gaigc commented 4 days ago

Describe the bug

When I try to use a dataset that I've aligned using GPS reference in Metashape, it either fails to load or loads but produces no results. This has been an issue since nerfstudio implemented loading point clouds (.ply) for splat seeding.

I thought that @simonbethke might have reported this problem when it was first being tested in pull request #3122. When @jb-ye asked for sample data, I assumed that they had shared it somewhere, but I couldn't find any issue related to this, so I'm opening one here.

On older (about a month old) nerfstudio and gsplat versions, I was getting this error:

Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

On the ODM_mygla dataset with current versions, I get this error after ~1000 iterations:

CONSOLE.log(f"Splitting {split_mask.sum().item()/self.num_points} gaussians: {n_splits}/{self.num_points}")
ZeroDivisionError: division by zero
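The division in that log line can only fail when `self.num_points` is zero, which suggests that no gaussians were left by the time the split step ran. A minimal sketch of a guarded version of the log message follows. The helper names `split_fraction` and `format_split_log` are hypothetical and not part of nerfstudio, and a guard like this would only hide the symptom, not explain why the gaussians disappear.

```python
def split_fraction(num_split: int, num_points: int) -> float:
    """Return the fraction of gaussians being split, or 0.0 when none remain."""
    if num_points <= 0:
        # Nothing left to split; avoid dividing by zero.
        return 0.0
    return num_split / num_points


def format_split_log(num_split: int, num_points: int, n_splits: int) -> str:
    """Build the same style of message as the failing log line (illustrative only)."""
    fraction = split_fraction(num_split, num_points)
    return f"Splitting {fraction} gaussians: {n_splits}/{num_points}"
```

With zero points the message is still produced instead of raising, which makes the empty-scene condition visible in the console.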

If I process the data without the GPS reference, it loads correctly.

I'm also able to use this data with Inria's and Postshot's implementations of gaussian splatting when I export it with the gaussian splatting Metashape script. In that case the data is exported in COLMAP format.

Why use GPS reference

I've found that using GPS reference in Metashape helps with alignment (speed and accuracy), and usually results in a consistent scale, orientation, and ground plane.

I'm not asking for GPS support in splats or nerfstudio-data, only for data created with it to be accepted.

To Reproduce

My workflow usually consists of:

  1. Importing dataset(s) into Metashape in different chunks. GPS data is automatically selected in the reference panel from EXIF.
  2. Aligning images, usually with high/highest detail, a high number of key points, and unlimited tie points.
  3. Exporting camera positions and the sparse (tie) point cloud using a batch job (also tried manually).
  4. Processing the data into nerfstudio using the following command: ns-process-data metashape --data "e:/3D Datasets/[Dataset Name]" --xml "e:/3D Datasets/[Dataset Name]/db.xml" --ply "e:/3D Datasets/[Dataset Name]/PointCloud.ply" --output-dir e:/NerfStudio/data/[Dataset Name]/out
  5. Running Splatfacto with: ns-train splatfacto --output-dir ./outputs/ODMlogs nerfstudio-data --data ./data/ODMlogs/out
  6. After undistorting, it either loads incorrectly or gives an error.
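Because the dataset paths contain spaces, steps 4 and 5 can be scripted with argument lists so that no manual quoting is needed. This is a minimal sketch: the function names are hypothetical, the flags are the ones used in the commands above, and the default file names `db.xml` and `PointCloud.ply` are simply the ones from this report.

```python
from typing import List


def build_process_cmd(dataset_dir: str, output_dir: str,
                      xml_name: str = "db.xml",
                      ply_name: str = "PointCloud.ply") -> List[str]:
    """Argument list for step 4 (ns-process-data metashape)."""
    base = dataset_dir.rstrip("/")
    return [
        "ns-process-data", "metashape",
        "--data", base,
        "--xml", f"{base}/{xml_name}",
        "--ply", f"{base}/{ply_name}",
        "--output-dir", output_dir,
    ]


def build_train_cmd(data_dir: str, output_dir: str) -> List[str]:
    """Argument list for step 5 (ns-train splatfacto ... nerfstudio-data)."""
    return [
        "ns-train", "splatfacto",
        "--output-dir", output_dir,
        "nerfstudio-data",
        "--data", data_dir,
    ]
```

Each list can be passed directly to `subprocess.run(cmd, check=True)`, which hands every element to the program as a single argument regardless of spaces.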

I've tried exporting the camera positions and point cloud with both local coordinates and WGS84 as the coordinate system. Both give the same error.
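One thing that may be worth checking (an assumption, not a confirmed cause) is whether the camera positions written by ns-process-data end up at unusually large coordinates or far away from the point cloud. The sketch below summarizes the camera centers in a nerfstudio `transforms.json`, taking the translation column of each frame's 4x4 `transform_matrix`. The function names are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def load_transforms(path: str) -> dict:
    """Read a transforms.json file produced by ns-process-data."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def camera_centers(transforms: dict) -> List[Vec3]:
    """Extract the translation column (camera center) of every frame."""
    centers = []
    for frame in transforms["frames"]:
        m = frame["transform_matrix"]
        centers.append((float(m[0][3]), float(m[1][3]), float(m[2][3])))
    return centers


def summarize_extent(points: List[Vec3]) -> Dict[str, object]:
    """Axis-aligned bounds, extent, and largest absolute coordinate."""
    if not points:
        raise ValueError("no points to summarize")
    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    extent = tuple(hi - lo for lo, hi in zip(mins, maxs))
    max_abs = max(abs(c) for p in points for c in p)
    return {"min": mins, "max": maxs, "extent": extent, "max_abs": max_abs}
```

Running `summarize_extent(camera_centers(load_transforms(".../transforms.json")))` on the georeferenced and non-georeferenced exports of the same dataset would show whether the two differ mainly by a large offset or scale.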

Expected behavior

To be able to load data that has been georeferenced.

Alternative solution

One option is to create the nerfstudio data from the output of the COLMAP export script, but this is only available to people who have Metashape Pro, and in my opinion it is a workaround (though it may still be useful for reusing data prepared for other programs).

I tried to simply copy and paste this data into the COLMAP structure in nerfstudio, but I failed. I'm sure there is a command or script to convert it into something nerfstudio can run, but I wasn't able to get anything working.

Here is how the folder is formatted in case it helps:

[Dataset]/
├─ node_modules/
├─ images/
│  ├─ Image1.jpg
│  ├─ Image2.jpg
│  ├─ Image3.jpg
├─ sparse/
│  ├─ 0/
│  │  ├─ cameras.bin
│  │  ├─ images.bin
│  │  ├─ points3D.bin
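A quick way to confirm that an exported folder matches the layout above is to check for the expected files before attempting any conversion. This is a minimal sketch; `missing_colmap_files` is a hypothetical helper, and the `sparse/0` default is taken from the folder listing in this report.

```python
from pathlib import Path
from typing import List, Union

REQUIRED_MODEL_FILES = ("cameras.bin", "images.bin", "points3D.bin")


def missing_colmap_files(dataset_dir: Union[str, Path],
                         model_subdir: str = "sparse/0") -> List[str]:
    """Return the relative paths that are missing from a COLMAP-style export."""
    root = Path(dataset_dir)
    missing = []
    if not (root / "images").is_dir():
        missing.append("images/")
    for name in REQUIRED_MODEL_FILES:
        if not (root / model_subdir / name).is_file():
            missing.append(f"{model_subdir}/{name}")
    return missing
```

An empty list means the folder has the image directory and the three binary model files. It does not validate their contents.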

Screenshots

ODM_mygla (georeferenced) processed using splatfacto, before it crashes: image

The only notable detail is a small white dot at the bottom of the scene, possibly at infinite distance. ODM_helenenschacht displays similar results but doesn't crash. Cameras are sometimes obscured by the scene, so I need to disable composite depth to use the camera positions as a reference.

ODM_mygla processed using nerfacto on same dataset: image

Additional context

Machine specs and info:

PC 1: R9 5900X, 128 GB RAM, 3060 12 GB, data on SSD. Windows 10, Anaconda, Nerfstudio 1.1.2, gsplat 1.0.0
PC 2: i7 7700HQ, 16 GB RAM, 1060 6 GB, data on SSD. Windows 10, Anaconda, Nerfstudio 1.1.0, gsplat 0.1.12

Both have the latest NVIDIA drivers; I also tested with drivers from 4 months ago. (I'm aware that PC 2 won't be able to run any future gsplat versions. I still included this info since the old version gave different errors that might help narrow down the problem.)

Data processed using Metashape 2.x

I capture my own data using a Mavic Air 2s. It often aligns a few meters below ground, but even when I adjust for that, there are errors. I'm not sure how to share my own dataset, so here are some datasets that I've tested and that display the same errors:

Datasets for reference:

ODM_mygla: 41 images, ~5 MB each. Captured on a DJI Phantom 3.

Here are the export files as .txt (you'll need to change the extension): db.xml.txt PointCloud.ply.txt

ODM_helenenschacht: 176 images, ~12 MB each. Captured on an Autel Evo II Pro RTK. I've also tested this one, but the point cloud is too big to attach. Here are the camera positions: db.xml.txt
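After renaming the attached files, it can help to confirm that the point cloud actually contains vertices before using it for splat seeding. The sketch below reads only the PLY header, which is text in both ASCII and binary PLY files, and returns the declared vertex count. `ply_vertex_count` is a hypothetical helper, not a nerfstudio function.

```python
import io
from typing import BinaryIO


def ply_vertex_count(stream: BinaryIO) -> int:
    """Return the vertex count declared in a PLY header."""
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file: missing 'ply' magic line")
    for raw in stream:
        line = raw.strip()
        if line == b"end_header":
            break
        parts = line.split()
        # The header declares elements as: element <name> <count>
        if len(parts) == 3 and parts[0] == b"element" and parts[1] == b"vertex":
            return int(parts[2])
    raise ValueError("no 'element vertex' line found in PLY header")
```

For a file on disk, open it in binary mode, for example `with open("PointCloud.ply", "rb") as f: print(ply_vertex_count(f))`.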

This is my first submitted issue, so apologies for any missing info or bad etiquette.

Dump of Logs

Console error from trying to run splatfacto with ODM_mygla dataset on nerfstudio 1.1.2 with PC 1

890 (2.97%)         8.359 ms             4 m, 3 s             78.09 M
----------------------------------------------------------------------------------------------------   splatfacto.py:
Viewer running locally at: http://localhost:7007 (listening on 0.0.0.0)
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 0.0155
VanillaPipeline.get_train_loss_dict: 0.0118
Traceback (most recent call last):
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\[User]\.conda\envs\nerfstudio\Scripts\ns-train.exe\__main__.py", line 7, in <module>
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 262, in entrypoint
    main(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 247, in main
    launch(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 100, in train_loop
    trainer.train()
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 265, in train
    callback.run_callback_at_location(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\callbacks.py", line 115, in run_callback_at_location
    self.run_callback(step=step)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\callbacks.py", line 100, in run_callback
    self.func(*self.args, **self.kwargs, step=step)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\models\splatfacto.py", line 456, in refinement_after
    split_params = self.split_gaussians(splits, nsamps)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\models\splatfacto.py", line 543, in split_gaussians
    CONSOLE.log(f"Splitting {split_mask.sum().item()/self.num_points} gaussians: {n_splits}/{self.num_points}")
ZeroDivisionError: division by zero 

Console error from trying to run splatfacto with nerfstudio 1.1.0 with PC 2

 [17:24:14] Caching / undistorting train images                                            full_images_datamanager.py:183
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 8.2725
VanillaPipeline.get_train_loss_dict: 8.2715
Traceback (most recent call last):
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\[User]\.conda\envs\nerfstudio\Scripts\ns-train.exe\__main__.py", line 7, in <module>
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 262, in entrypoint
    main(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 247, in main
    launch(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 100, in train_loop
    trainer.train()
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 261, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\utils\profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 498, in train_iteration
    self.grad_scaler.scale(loss).backward()  # type: ignore
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\torch\_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "C:\Users\[User]\.conda\envs\nerfstudio\lib\site-packages\torch\autograd\__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn