pablovela5620 / monoprior


Some questions about relative depth to pcd? #1

Open shuiyued opened 4 months ago

shuiyued commented 4 months ago

Hello, this library is very nice. However, I would like to know how you reconstruct the 3D point cloud from relative depth, like in DepthAnything, without distortion.

pablovela5620 commented 4 months ago

I basically have to convert the disparity to depth by making a few assumptions, so I have to guess at a focal length https://github.com/pablovela5620/monoprior/blob/2c6c8e026a1fa6659cbe1b9ea8ad07ebf08abdfe/monopriors/depth_utils.py#L6

and a baseline, and then use https://github.com/pablovela5620/monoprior/blob/2c6c8e026a1fa6659cbe1b9ea8ad07ebf08abdfe/monopriors/depth_utils.py#L22
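A minimal sketch of that conversion (the focal length and baseline values here are illustrative guesses, not the repo's actual defaults):

```python
import numpy as np

def disparity_to_depth(disparity: np.ndarray, focal_px: float, baseline_m: float) -> np.ndarray:
    """Stereo relation: depth = focal_length * baseline / disparity."""
    # Guard against division by zero for invalid (zero) disparities.
    disparity = np.where(disparity > 0, disparity, np.finfo(np.float32).eps)
    return focal_px * baseline_m / disparity

# Example with a guessed 500 px focal length and a 0.1 m baseline.
disparity = np.array([[10.0, 20.0], [40.0, 50.0]], dtype=np.float32)
depth = disparity_to_depth(disparity, focal_px=500.0, baseline_m=0.1)
```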

Once I have converted the disparity to depth, I use this function to convert it into a 3D point cloud https://github.com/pablovela5620/monoprior/blob/2c6c8e026a1fa6659cbe1b9ea8ad07ebf08abdfe/monopriors/depth_utils.py#L62
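The backprojection step can be sketched like this, assuming a simple pinhole camera with the principal point at the image center (function name and parameters are illustrative, not the repo's API):

```python
import numpy as np

def depth_to_points(depth: np.ndarray, focal_px: float) -> np.ndarray:
    """Backproject an (H, W) depth map into an (H*W, 3) point cloud."""
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0  # assumption: principal point at image center
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

depth = np.ones((4, 6), dtype=np.float32)  # a flat plane 1 m away
points = depth_to_points(depth, focal_px=300.0)
```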

shuiyued commented 4 months ago

Thanks for your reply! I tested some images and found that assuming a field of view (FOV) of 55 degrees works well in most cases.
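For reference, an assumed FOV can be turned into the focal length the conversion needs, using the pinhole relation f = (W/2) / tan(FOV/2) (the 55° here is just the value mentioned above):

```python
import math

def focal_from_fov(width_px: int, fov_deg: float) -> float:
    """Focal length in pixels from a horizontal field of view."""
    return (width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

f = focal_from_fov(640, 55.0)  # roughly 615 px for a 640-pixel-wide image
```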

pablovela5620 commented 4 months ago

Yeah, it's definitely not foolproof; there will be images where the assumed 55-degree FOV will not work. You can try something like DUSt3R to attempt to estimate the camera intrinsics, since having accurate camera intrinsics will give you a much better point cloud. There's also the fact that relative depth models will still produce bad depth maps for certain images.

shuiyued commented 3 months ago

Yes, but is DUSt3R suited to multiple images? DepthAnything and Metric3D are depth models designed for single images. To be honest, I think there will be little improvement in the relative depth from DepthAnythingV2 on a single image.

shuiyued commented 3 months ago

I misunderstood your meaning: DUSt3R attempts to estimate the camera intrinsics, which results in a better point cloud. I suspect that estimating metric camera intrinsics from a single image will be unreliable.

pablovela5620 commented 3 months ago

Right, those models don't take the camera intrinsics into account, whereas something like UniDepth or DUSt3R estimates the camera intrinsics, leading to better results.

shuiyued commented 2 months ago

In fact, there is another problem: the depth predicted by the DepthAnything model is only scale-and-shift invariant. This means that even with known intrinsics, the geometry cannot be fully recovered: an unknown scale would be harmless, but the unknown shift distorts the reconstruction.

X = (x - u) * d / f
Y = (y - v) * d / f
Z = d

metric d = scale * ssi_depth + shift
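To make the formulas above concrete, here is a sketch of recovering metric depth from a scale-shift-invariant prediction and backprojecting it; the scale and shift values are illustrative, and in practice they must come from some external source (calibration, a metric sensor, or a metric depth model):

```python
import numpy as np

def recover_metric_depth(ssi_depth: np.ndarray, scale: float, shift: float) -> np.ndarray:
    """metric d = scale * ssi_depth + shift (scale/shift are unknown to the model)."""
    return scale * ssi_depth + shift

def backproject(d: np.ndarray, u: np.ndarray, v: np.ndarray,
                cx: float, cy: float, f: float) -> np.ndarray:
    """X = (u - cx) * d / f, Y = (v - cy) * d / f, Z = d."""
    return np.stack([(u - cx) * d / f, (v - cy) * d / f, d], axis=-1)

u, v = np.meshgrid(np.arange(4), np.arange(4))
ssi = np.linspace(0.1, 0.9, 16).reshape(4, 4)   # toy SSI depth prediction
d = recover_metric_depth(ssi, scale=4.0, shift=0.5)
pts = backproject(d, u, v, cx=2.0, cy=2.0, f=300.0)
```

Note that if the shift is guessed wrong, it does not cancel in X and Y, which is exactly the distortion described above.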