montefiore-institute / midair-dataset

Example and utility scripts for the Mid-Air dataset
MIT License

How can I project points in world coordinates from depth maps? #8

Closed ZachL1 closed 1 year ago

ZachL1 commented 1 year ago

Thank you for creating and releasing this dataset to the public, great work!

I'm confused about how to project pixels into the world coordinate system using the depth maps. For simplicity, I'll first project the points into the camera (drone) coordinate system. In general, if depth is encoded in the standard way (rather than as the "Euclidean distance of the objects to the focal point" used in MidAir), going from (u, v) to camera coordinates is straightforward:

```python
import numpy as np

# pixel coordinate grid
u = np.arange(rgb.shape[1], dtype=float)
v = np.arange(rgb.shape[0], dtype=float)
u, v = np.meshgrid(u, v)

# standard pinhole back-projection with intrinsic matrix K
Z = depth.astype(float)
X = (u - K[0, 2]) * Z / K[0, 0]
Y = (v - K[1, 2]) * Z / K[1, 1]
```

But what should I do in MidAir? Following the instructions in the FAQ on the website, I implemented it like this, but it doesn't work.

```python
# u, v are obtained as above
np.clip(depth, 1, 1250, out=depth)  # clamp the range for better visualization

f = 512  # f = h/2 = w/2 = 512 for the 1024x1024 images, so I simply hardcode it
z = depth.astype(float)

# depth stores the Euclidean distance to the focal point, so rescale along each ray
r = z / np.sqrt((u - f)**2 + (v - f)**2 + f**2)
X = r * (u - f)
Y = r * (v - f)
Z = r * f

# camera frame -> drone body frame, as described in the FAQ
X, Y, Z = Z, Y, X
```

The projected points look like this, which doesn't look right: [image]

If I instead skip the conversion to the drone body frame $b$, i.e.

```python
# X, Y, Z = Z, Y, X
```

the result seems a little better, but still looks weird and incomplete: [image]

Am I missing something, or is there a mistake in my implementation? The FAQ details the projection to the world coordinate system; could you provide a verified implementation of it? I think it would be very helpful, thanks a lot!

ZachL1 commented 1 year ago

I think I may have figured it out.

If I don't convert to the drone body frame $b$ (the second image above), I seem to get the correct result. The reason the point cloud looks incomplete in the picture above is that the stone on the right is very close to the camera (its corresponding RGB image is below), so a large part of the scene is clustered near the origin; the cloud is actually correct and complete. [image]

So why does the FAQ on the site say that we need this transform to the drone body frame? It seems like a redundant operation (perhaps it's my ignorance leading me to the wrong conclusion; let me know if so).

Converting these coordinates to the drone body frame $b$ can simply be obtained by the following transformation: $$(x_b, y_b, z_b) = (z_c, y_c, x_c)$$

In addition, I went a step further and converted the point cloud from the camera coordinate system to the world coordinate system. The results are consistent with the two screenshots above: the version without the body-frame conversion looks right, while the version that goes through the drone body frame still looks very strange.
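For reference, this is roughly how I do the body-to-world step, using the ground-truth pose from `sensor_records.hdf5`. It is only a sketch: the group names, the quaternion ordering (assumed scalar-first here), and the 25 Hz image vs. 100 Hz ground-truth sampling ratio are my assumptions, so please double-check them against the dataset documentation.

```python
import h5py
import numpy as np
from scipy.spatial.transform import Rotation

def body_points_to_world(points_b, records_path, trajectory, frame_idx):
    """Rotate/translate body-frame points into the world (NED) frame
    using the ground-truth pose of the corresponding frame."""
    with h5py.File(records_path, "r") as f:
        gt = f[trajectory]["groundtruth"]
        i = frame_idx * 4                          # images at 25 Hz, ground truth at 100 Hz (assumed)
        position = np.asarray(gt["position"][i])   # body position in the world frame
        q_wxyz = np.asarray(gt["attitude"][i])     # attitude quaternion, assumed [w, x, y, z]

    # SciPy expects scalar-last quaternions, hence the reordering.
    rot = Rotation.from_quat(np.roll(q_wxyz, -1))
    return rot.apply(points_b) + position          # p_w = R_wb @ p_b + t_w
```

If the cloud comes out mirrored or rotated, the attitude quaternion may need to be inverted (world-to-body rather than body-to-world).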

I tried another example, Kite_training/sunset/color_left/trajectory_1001/000000.JPEG. Here is the original image: [image] And here are the results with (top) and without (bottom) the conversion to the drone body frame: [image] [image]

ZachL1 commented 1 year ago

Sorry, I found my stupid mistake: I used `valid_mask = Z > 0` when generating the point cloud, but applied it after the axis swap, at which point Z no longer holds the forward (optical-axis) coordinate. This caused many pixels to be wrongly discarded.
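For anyone running into the same thing, here is a sketch of the back-projection with the mask built before the axis swap. The intrinsics are my hard-coded guess of f = cx = cy = 512 for the 1024x1024 images; adjust them if your resolution differs.

```python
import numpy as np

def midair_depth_to_body_points(depth, f=512.0, cx=512.0, cy=512.0):
    """Back-project a MidAir depth map (Euclidean distance to the focal point)
    to 3D points in the drone body frame, following the FAQ."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))

    d = depth.astype(float)
    # Convert the distance along each ray into a per-axis scale factor.
    scale = d / np.sqrt((u - cx) ** 2 + (v - cy) ** 2 + f ** 2)

    x_c = scale * (u - cx)   # camera frame
    y_c = scale * (v - cy)
    z_c = scale * f

    # Build the validity mask in the camera frame, BEFORE any axis swap;
    # building it afterwards was my mistake.
    valid = z_c > 0

    # Camera frame -> drone body frame: (x_b, y_b, z_b) = (z_c, y_c, x_c).
    points_b = np.stack([z_c, y_c, x_c], axis=-1)
    return points_b[valid]
```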

Thanks again for your great work.