zinsmatt / SpaceCarving


uvs #10

Open Hellsice opened 8 months ago

Hellsice commented 8 months ago

Thanks for sharing the code. I'm currently trying to apply it to an aquaponics system to determine the volume of plants, but I'm stuck on the actual space carving part.

  1. How do you decide if your object fits inside the voxel matrix?
  2. My uvs are way larger than my image sizes; values go all the way up to 40000. Is this what you use the division for, to adjust these values properly?
  3. My projection matrix is also nearly the opposite of yours when it comes to positive and negative signs, causing the condition uvs[0,:]>=0 to be false in nearly all cases. Should I then filter for <=0 instead? uvs[1,:]>=0, however, keeps all points, and neither row has any values between 0 and the respective image size. As for how I calibrated and obtained these projection matrices, I used the chessboard technique.
zinsmatt commented 8 months ago
  1. In this example, I define the voxel grid to enclose the object because I know its location and size approximately. If that's not your case, you can use a coarse-to-fine approach: start with a large grid (with large voxels) to get a first idea of the object's location and size, and then use a smaller grid with smaller voxels around the object to get a finer reconstruction.

  2. uvs are the coordinates of the 3D points once projected into the image. The division is done because I'm using perspective projection (with the classical pinhole camera model). If a coordinate falls outside the image, it can mean that:

    • this 3D point is not visible in the image,
    • or the projection matrix is wrong.

    (See the projection sketch after this list.)
  3. The projection matrix combines both the intrinsic and extrinsic camera parameters. The intrinsic parameters can indeed be obtained with the chessboard calibration technique. Then, you also need the extrinsic parameters, i.e. the position and orientation of the camera in the world. For that, you can either:

    • obtain them externally, but this requires a specific setup in which you know exactly from where the images are taken,
    • or place a marker (like a chessboard) in the scene and compute the pose of each camera w.r.t. it (see the solvePnP sketch below),
    • or use more advanced techniques such as SfM, which can compute the camera poses and a sparse reconstruction of the scene (see COLMAP).
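To make point 2 concrete, here is a minimal sketch of the projection-and-test step, assuming a NumPy array of voxel centers, a 3x4 projection matrix P, and a binary silhouette mask (names are illustrative, not the repo's exact code):

```python
import numpy as np

def carve(pts, P, silhouette):
    """pts: (N, 3) voxel centers, P: (3, 4) projection, silhouette: (H, W) mask."""
    H, W = silhouette.shape
    pts_h = np.vstack([pts.T, np.ones(len(pts))])  # homogeneous coordinates, (4, N)
    uvs = P @ pts_h                                # (3, N)
    z = uvs[2]
    u, v = uvs[0] / z, uvs[1] / z                  # perspective division -> pixels
    # Keep a voxel only if it is in front of the camera, projects inside the
    # image, and lands on the silhouette.
    inside = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    keep = np.zeros(len(pts), dtype=bool)
    keep[inside] = silhouette[v[inside].astype(int), u[inside].astype(int)] > 0
    return keep
```

If the coordinates still reach values like 40000 after the division, the likely culprit is the projection matrix itself (usually its extrinsic part), not the division.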
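And for point 3, a hedged sketch of the marker approach with OpenCV: place a chessboard in the scene and recover each camera's pose with solvePnP (the board dimensions, filename, and the intrinsics K/dist are placeholders; take K and dist from your own calibration):

```python
import cv2
import numpy as np

# Placeholder intrinsics -- replace with your cv2.calibrateCamera results.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# 3D chessboard corners; the board defines the world frame.
cols, rows, square = 9, 6, 0.025  # inner corners and square size (meters)
obj_pts = np.zeros((cols * rows, 3), np.float32)
obj_pts[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square

gray = cv2.imread("view_00.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename
found, corners = cv2.findChessboardCorners(gray, (cols, rows))
assert found, "chessboard not detected"

# Extrinsics of this view: pose of the board (world) in the camera frame.
_, rvec, tvec = cv2.solvePnP(obj_pts, corners, K, dist)
R, _ = cv2.Rodrigues(rvec)

P = K @ np.hstack([R, tvec])  # 3x4 projection matrix for space carving
```

Because every image sees the same physical board, all poses end up in one consistent, metric world frame, which is exactly what the carving step needs.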
Hellsice commented 8 months ago
  1. I'm not sure how to make the grid coarser, as uvs is found through matrix multiplication, so any uniform change to the voxel grid carries over directly to uvs. Changing the size from 120 to 300, or to 100, does not seem to have any effect. So how do I make the grid coarser?

2/3. I'm using a SIFT detector with a FLANN-based matcher to find matching points between images and calculate the essential matrix from those, which seems to be what you meant with the second point. I also compared your projection matrices with mine, and noticed that your first projection matrix isn't the equivalent of the camera matrix with a column of zeros appended to it, which makes me think that I may have made a mistake with the camera poses. Though the first image should not have a rotation and translation, right? As it is the image that defines the coordinate system?

zinsmatt commented 8 months ago
  1. The grid defines the 3D points that are projected into your images. It should cover the space where your target object is. 's = 120' defines the number of points along each direction. To change the extent of the grid, you need to scale the points by some factor (replace l. 59); see the grid sketch after this list. Also, in this example the grid is centered on (0, 0, 0), which might not be your case.

2/3. You can extract a relative pose from the essential matrix only up to scale. That means you can obtain a relative pose for each image pair, but you won't be able to merge them directly, as they each have their own arbitrary scale (see the recoverPose sketch after the note below). I would suggest you use an SfM tool like COLMAP to compute your image poses.
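Regarding point 1, a minimal sketch of a grid whose resolution and physical extent are separate knobs (s, scale, and center are illustrative names; this is roughly the construction to adapt around l. 59):

```python
import numpy as np

s = 120                              # voxels per axis: sampling resolution only
scale = 0.5                          # edge length of the cube in world units
center = np.array([0.0, 0.0, 0.0])  # move this if your object is off-origin

lin = np.linspace(-scale / 2, scale / 2, s)
x, y, z = np.meshgrid(lin, lin, lin, indexing="ij")
pts = np.stack([x, y, z], axis=-1).reshape(-1, 3) + center
```

Changing s alone only changes how densely the same cube is sampled, which is why going from 120 to 300 had no visible effect; it is scale (and center) that controls which part of space the grid covers.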

Note: you cannot directly compare the projection matrices, as the camera poses are expressed in a particular reference coordinate system. In this example, the reference coordinate system is NOT the first camera's frame, which is why its pose is not [I 0].
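To see the scale ambiguity concretely: OpenCV's pose recovery from the essential matrix can only return a unit-length translation, so every image pair lives at its own arbitrary scale (pts1/pts2 are assumed to be matched pixel coordinates from your SIFT + FLANN step, K your intrinsics):

```python
import cv2
import numpy as np

# pts1, pts2: (N, 2) float arrays of matched pixel coordinates; K: 3x3 intrinsics.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

print(np.linalg.norm(t))  # ~1.0 for every pair: the true baseline is unknowable
```

An SfM pipeline like COLMAP resolves this by triangulating shared points across many views, which is what puts all the poses into a single consistent frame.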

Hellsice commented 8 months ago

Thanks for the tip. I tried using COLMAP, though I've noticed that exporting its results does not work perfectly: my volume calibration images work perfectly, but the other image sets give incorrect poses. I have also tried using your images with COLMAP for the camera calibration and pose estimation, but I got no results from that either.

And as I'm trying to find the volume through a point cloud, I also tried using the point clouds from COLMAP, though they contain a few noise points which make the volume estimate hugely incorrect, unless I can filter out said noise. Thus I'm sticking with what you made, though I can't get it to work at all. I uploaded what I have to my own repository; would you be willing to take a look?
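For reference, the kind of noise filtering I mean would look something like this Open3D sketch (assuming the COLMAP cloud is exported as fused.ply; the parameters are guesses to tune):

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("fused.ply")  # dense cloud exported from COLMAP
# Remove points whose average distance to their neighbors is abnormally large.
_, ind = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
clean = pcd.select_by_index(ind)
o3d.io.write_point_cloud("fused_clean.ply", clean)
```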