Unfortunately, we currently only support COLMAP and JSON files similar to the nerf-synthetic dataset, but if you have the time and energy we would love to help you create a pull request that supports new formats.
Hi George, can you send a link to an example of the JSON file similar to the nerf-synthetic dataset please?
Would this be a dataset typically with:
transforms.json
/images
Or something else?
You can download them here:
https://www.matthewtancik.com/nerf
Also, the code for loading such files is here: https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/dataset_readers.py#L179 and for COLMAP files here: https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/dataset_readers.py#L68
Please note that one would have to adapt the JSON loading code to support general scenes. Right now it assumes the synthetic NeRF Blender dataset with 800x800 pixel resolution. I'll see if I can relax that for general transforms.json later today.
But the remaining problem with transforms.json would still be that there is no agreed-upon way to reference input geometry (i.e., a point cloud), which we require for initialization. Adding support for Reality Capture would be better, but it takes some effort. We have to check if we can find the time to do this.
Great, thank you both.
I can generate this .json by converting a Metashape camera.xml, and I can also export a matching scene-camera .ply. Could this file be picked up if it were placed in the dataset folder under an agreed-upon naming convention?
The drums dataset is now training and it looks fantastic! That set doesn't have a point cloud file; is one not required for the synthetic Blender datasets?
From looking at the transforms.json there are no camera intrinsics, so I presume these datasets are rendered with a 'pinhole' camera model. For my own dataset I can test exporting undistorted images and removing the intrinsics from the transforms.json.
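For reference, a nerf-synthetic-style transforms.json really only carries a single horizontal FOV (camera_angle_x) plus one camera-to-world transform_matrix per frame, with file_path given without an extension (the loader appends ".png"). Roughly like this; the numeric values below are placeholders, not taken from any real dataset:

{
  "camera_angle_x": 0.6911112070083618,
  "frames": [
    {
      "file_path": "./train/r_0",
      "transform_matrix": [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 4.0],
        [0.0, 0.0, 0.0, 1.0]
      ]
    }
  ]
}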
The Blender dataset is simple enough that we can train it from random points; this will not work as well for arbitrary scenes (see the paper). Combining transforms and a local .ply file could work. One would have to remove the hard assumption of 800x800 pixels, read the minimal intrinsics (FOV) from the transforms.json, and look for a .ply in the local directory, basically a combination of the methods that are already in the code base. I can check if I find the time to do this, but I'm not sure, unfortunately.
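A rough sketch of what that combination could look like: read the FOV from transforms.json, take the real image size instead of hard-coding 800x800, and prefer a local .ply over random initialization. The function name, the points3d.ply filename and the random-init bounds here are illustrative assumptions, not the repo's actual API; the real loaders live in scene/dataset_readers.py.

# Hypothetical loader: per-frame FOV from transforms.json, geometry from a local .ply.
import json, os
import numpy as np
from PIL import Image
from plyfile import PlyData

def read_transforms_scene(path, transforms="transforms.json", ply_name="points3d.ply"):
    with open(os.path.join(path, transforms)) as f:
        meta = json.load(f)
    fov_x = meta["camera_angle_x"]              # single horizontal FOV shared by all frames

    cameras = []
    for frame in meta["frames"]:
        image = Image.open(os.path.join(path, frame["file_path"] + ".png"))
        w, h = image.size                       # no hard-coded 800x800: use the real size
        fov_y = 2 * np.arctan(np.tan(fov_x / 2) * h / w)
        c2w = np.array(frame["transform_matrix"], dtype=np.float64)
        c2w[:3, 1:3] *= -1                      # OpenGL/Blender camera axes -> COLMAP-style axes
        w2c = np.linalg.inv(c2w)
        cameras.append({"R": np.transpose(w2c[:3, :3]), "T": w2c[:3, 3],
                        "fov_x": fov_x, "fov_y": fov_y, "image": image})

    ply_path = os.path.join(path, ply_name)
    if os.path.exists(ply_path):                # use provided geometry for initialization
        v = PlyData.read(ply_path)["vertex"]
        xyz = np.stack([v["x"], v["y"], v["z"]], axis=1)
    else:                                       # fall back to a random cloud, Blender-style
        xyz = np.random.random((100_000, 3)) * 2.6 - 1.3
    return cameras, xyz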
Hi @Snosixtyboo
Were you able to look at adapting the code to support loading general scenes today?
Also, can you confirm whether camera models other than pinhole will be supported?
Hi @Snosixtyboo, just bumping this question.
Hi,
we are looking at all issues in order of severity, and unfortunately this isn't the highest right now, so it could take a while before we get round to it. Cameras other than pinhole will likely not be supported, because our code is rasterization-based, which means advanced lens distortions are not easily realized. This would be much easier in any ray-based (NeRF) approach, but I don't know what Mip-NeRF 360's or Instant-NGP's support for arbitrary cameras is.
OK, thank you. If general transforms will not be looked at soon I will continue investigating other methods.
@grgkopanas Since you said "but if you have the time and energy we would love to help you create a pull request that supports new formats" I took a shot at it, but it didn't work at all, haha!
My first question before any follow-up: is it important to calculate normals for each point? The Polycam .ply that I'm trying to import doesn't have them.
Thanks for everything! Any help fixing this hot mess of Gaussian clouds would be great :)
Other camera models can be supported during capture by letting COLMAP undistort the images during calibration. Supporting other camera models in rendering is challenging when you do rasterization.
One of the properties of our method is exactly that it doesn't need normals for any initial point cloud. I am not sure where not having normals appeared as a problem for you; can you please elaborate?
While I could initialize my point cloud randomly to start, I have a point cloud from my Polycam capture that I imagine will aid the training. Based on my reading of the COLMAP data reader, which I'm basing my data reader on, I've run into the code in fetch_ply here: https://github.com/graphdeco-inria/gaussian-splatting/blob/f7a116fb1397d9842239127d39dc212f93171f70/scene/dataset_readers.py#L112
In my implementation I've had to set the normals to a zero vector just to get the code to run, but I can't imagine that's helpful.
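For what that workaround can look like in practice, here is a minimal sketch assuming a plyfile-readable export without nx/ny/nz fields; the function name and the gray fallback color are illustrative, not the repo's actual fetch_ply:

# Sketch: load a point cloud that has no normal fields and substitute zero normals.
import numpy as np
from plyfile import PlyData

def fetch_ply_no_normals(path):
    vertices = PlyData.read(path)["vertex"]
    positions = np.vstack([vertices["x"], vertices["y"], vertices["z"]]).T
    names = set(vertices.data.dtype.names)
    if {"red", "green", "blue"} <= names:
        colors = np.vstack([vertices["red"], vertices["green"], vertices["blue"]]).T / 255.0
    else:
        colors = np.full_like(positions, 0.5)   # no per-vertex color in the export: start gray
    # The optimization never reads the normals, so zeros are a safe stand-in.
    normals = np.zeros_like(positions)
    return positions, colors, normals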
For what it's worth, I've gotten renderings that do 'match' my scene but with lots of artifacts. My guess is that some of the artifacts are related to the 'densification' process, because my point cloud goes from 34.5 MB to ~240 MB with many more points added, but that's a complete guess. Additionally, rendering the initial PLY file and the PLY generated by the trained model shows an abundance of points placed outside the scene (outside the room I captured in).
I'd be happy to share my code (preferably privately until it's suitable for more public comment). I think I've made a notable step towards importing alternative data!
Normals in this case are completely irrelevant; you did well to just set them to zero vectors. The quality of the results is often correlated with how a scene is captured. Unfortunately, the optimization can get stuck in weird solutions, similar to NeRF and other methods; if your results don't look good, it's probably because the algorithm could not find a good solution given the cameras that were provided as input.
Most often the capture styles that work best are object-oriented, where you go around an object and capture images from all angles. Take a look, for example, at how the Mip-NeRF 360 dataset was captured.
Best, George
I'll start with some smaller examples to test then, thanks!
Here's an example I grabbed at my desk of my AirPods case. In the SIBR viewer it seems like I've parsed some data wrong, but I don't understand how. Any insight?
Depending on how comfortable you are with coding, you might want to add this function to the "Scene" object to visualize the point cloud and the cameras.
This will not work out of the box because it's from a very old implementation, but you can get a rough idea or implement your own visualizer.
def debugView(self, scale=1.0):
    # Quick polyscope visualization of the point cloud, camera centers and viewing directions.
    # Note: self.diffuse_point_cloud, findCenterOfInterest and geom_transform_vectors come
    # from an older version of the code base; adapt the names to your own Scene object.
    import numpy as np
    import torch
    import polyscope as ps
    ps.init()
    # Point cloud with per-point colors
    ps_pc_diffuse = ps.register_point_cloud("point_cloud_diffuse", self.diffuse_point_cloud.global_xyz.detach().cpu().numpy(), enabled=True, radius=0.00007)
    ps_pc_diffuse.add_color_quantity("point_cloud_colors", self.diffuse_point_cloud.global_features[:, :3].detach().cpu().numpy(), enabled=True)
    # Camera centers, plus the forward direction of every training camera
    cam_poses = np.array([x.camera_center.cpu().detach().numpy() for x in self.getTrainCameras(scale)]).squeeze()
    ps_cameras = ps.register_point_cloud("cameras", cam_poses, enabled=True)
    dirs = [geom_transform_vectors(torch.tensor([[0.0, 0.0, 1.0]]), x.world_view_transform.inverse().cpu()).cpu().detach().numpy() for x in self.getTrainCameras(scale)]
    ps_cameras.add_vector_quantity("cam_dirs", np.array(dirs).squeeze(), enabled=True)
    # Optional: center of interest (only available in the old implementation)
    # poi = self.findCenterOfInterest(scale)
    # print(poi)
    # ps_poi = ps.register_point_cloud("point_of_interest", poi, enabled=True)
    # World axes at the origin for orientation
    ps_x = ps.register_point_cloud("origin_x", np.array([[0., 0., 0.]]), enabled=True)
    ps_y = ps.register_point_cloud("origin_y", np.array([[0., 0., 0.]]), enabled=True)
    ps_z = ps.register_point_cloud("origin_z", np.array([[0., 0., 0.]]), enabled=True)
    ps_x.add_vector_quantity("dir_x", np.array([[1., 0., 0.]]), enabled=True, color=(1., 0., 0.))
    ps_y.add_vector_quantity("dir_y", np.array([[0., 1., 0.]]), enabled=True, color=(0., 1., 0.))
    ps_z.add_vector_quantity("dir_z", np.array([[0., 0., 1.]]), enabled=True, color=(0., 0., 1.))
    # ps.look_at(cam_poses[0], np.array(dirs).squeeze()[0])
    ps.show()
Thanks I'll give something like that a shot.
I've tried to play around with the transform matrix / R + t and I'm obviously doing something wrong, because it stops rendering anything more than a blank screen. The COLMAP camera parser just reads in R|t from the ex/intrinsics, which gives me a blank screen. That makes sense because Polycam's coordinate convention is different: its Y axis points up, and the -Z axis points in the direction the camera was pointing. So I tried flipping those axes, but it still doesn't work. Oddly, I only just noticed I was doing something 'wrong': what I accidentally copied originally was the Blender importer, which does
matrix = np.linalg.inv(np.array(frame["transform_matrix"]))
R = -np.transpose(matrix[:3, :3])
R[:, 0] = -R[:, 0]
T = matrix[:3, 3]
This, however, does converge and train. I don't understand why we might want to invert and transpose the matrix.
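For context, a minimal sketch of the conversion that should apply here, assuming Polycam really does hand back an OpenGL-style camera-to-world matrix (Y up, -Z forward); flipping the Y and Z camera axes and then inverting yields the COLMAP-style world-to-view R and T that the dataset readers produce (the helper name is hypothetical):

# Sketch: OpenGL/Polycam-convention camera-to-world matrix -> COLMAP-style R, T.
import numpy as np

def opengl_c2w_to_colmap_rt(c2w):
    c2w = np.array(c2w, dtype=np.float64)
    c2w[:3, 1:3] *= -1             # flip the Y and Z camera axes (OpenGL -> COLMAP convention)
    w2c = np.linalg.inv(c2w)       # camera-to-world -> world-to-view
    R = np.transpose(w2c[:3, :3])  # stored transposed, matching the repo's convention
    T = w2c[:3, 3]
    return R, T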
You can ignore the transpose, it's unnecessary; it actually gets nulled out by this transpose: https://github.com/graphdeco-inria/gaussian-splatting/blob/main/utils/graphics_utils.py#L40
It should not be there, and it's an embarrassing mishap that we have such messy code regarding our camera models.
No worries, happy to add an extra pair of eyes.
Hi @BennetLeff, I am trying to use the camera transformation from Polycam to render the Gaussian splats, but I have found that it returns a blank image. The camera transformation matrix may have some problems. Would you know how to transform the camera matrix from Polycam to what the GS rendering requires? Thank you.
Hi, the datasets I am testing do not align well in COLMAP; is there an alternative format?