princeton-vl / DROID-SLAM

BSD 3-Clause "New" or "Revised" License

Question about the depth reading part #15

Closed (xwjabc closed this issue 2 years ago)

xwjabc commented 2 years ago

Hi @zachteed, I have a question about the depth (disparity) reading part:

def build_frame_graph(self, poses, depths, intrinsics, f=16, max_flow=256):
    """ compute optical flow distance between all pairs of frames """
    def read_disp(fn):
        depth = self.__class__.depth_read(fn)[f//2::f, f//2::f]   # FIXME: Why down-sample? What does f mean?
        depth[depth < 0.01] = np.mean(depth)
        return 1.0 / depth

    poses = np.array(poses)
    intrinsics = np.array(intrinsics) / f
    ...

It seems that a hyperparameter f is used to down-sample the disparity map (inverse depth map). The same f is also used to scale the intrinsics. What does f mean here? Thanks!

zachteed commented 2 years ago

This function "approximately" computes the optical flow magnitude between all pairs of images in each sequence in order to construct a co-visibility graph.
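For readers following along, here is a minimal NumPy sketch of what "flow magnitude induced by camera motion" can mean for a single pair of frames. It is not the repository's implementation. The function name `induced_flow_magnitude`, the `[fx, fy, cx, cy]` intrinsics layout, and the 4x4 relative pose `T_ij` (mapping points from camera i to camera j) are all assumptions made for illustration.

```python
import numpy as np

def induced_flow_magnitude(disp, K, T_ij):
    """Mean pixel displacement from frame i to frame j caused by camera motion.

    disp : (H, W) inverse depth of frame i (assumed > 0)
    K    : [fx, fy, cx, cy] pinhole intrinsics (assumed layout)
    T_ij : (4, 4) rigid transform taking camera-i points to camera-j (assumed)
    """
    fx, fy, cx, cy = K
    H, W = disp.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

    # Back-project every pixel of frame i to a homogeneous 3D point.
    z = 1.0 / disp
    X = np.stack([(u - cx) / fx * z, (v - cy) / fy * z, z, np.ones_like(z)], axis=-1)

    # Move the points into camera j and project them again.
    Xj = X @ T_ij.T
    zj = Xj[..., 2]
    valid = zj > 1e-3                      # keep points in front of camera j
    zj_safe = np.where(valid, zj, 1.0)     # avoid division by zero
    uj = fx * Xj[..., 0] / zj_safe + cx
    vj = fy * Xj[..., 1] / zj_safe + cy

    flow = np.sqrt((uj - u) ** 2 + (vj - v) ** 2)
    if not valid.any():
        return np.inf                      # no overlap at all
    return float(flow[valid].mean())

# Example: constant depth of 2 and a sideways shift of 0.1 along x.
disp = np.full((30, 40), 0.5)
K = [20.0, 20.0, 20.0, 15.0]
T = np.eye(4)
T[0, 3] = 0.1
print(induced_flow_magnitude(disp, K, T))
```

Pairs of frames with a small mean displacement see largely the same scene content, which is why such a distance is a plausible basis for a co-visibility graph.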

f is a downsampling factor: flow is computed at 1/16 resolution, so the intrinsics need to be scaled accordingly. Originally, this function took too long to compute flow at full resolution. However, I now have a CUDA implementation https://github.com/princeton-vl/DROID-SLAM/blob/6626965d86513d10f7b2e7278a4eb3f729bb9d10/src/droid.cpp#L120 which can operate quickly at full resolution, so I will likely replace the current implementation with the CUDA version soon.
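To make the effect of f concrete, here is a small standalone sketch of the same subsampling and intrinsics scaling, written with plain NumPy. The helper name `downsample_disp_and_intrinsics` and the `[fx, fy, cx, cy]` intrinsics layout are assumptions for illustration. The slicing and the invalid-depth handling mirror the snippet in the question.

```python
import numpy as np

def downsample_disp_and_intrinsics(depth, intrinsics, f=16):
    """Keep one depth sample per f x f block and scale the intrinsics by 1/f."""
    # Take the pixel near the centre of each f x f block (1/f resolution).
    depth = depth[f // 2::f, f // 2::f].copy()
    # Replace near-zero (invalid) depths with the mean, as in the question's code.
    depth[depth < 0.01] = np.mean(depth)
    disp = 1.0 / depth
    # Assumed layout: [fx, fy, cx, cy]. Pixel coordinates shrink by f,
    # so focal lengths and principal point shrink by the same factor.
    intrinsics = np.asarray(intrinsics, dtype=np.float64) / f
    return disp, intrinsics

depth = np.full((480, 640), 2.0)
disp, K = downsample_disp_and_intrinsics(depth, [320.0, 320.0, 320.0, 240.0])
print(disp.shape)  # (30, 40)
print(K)           # [20. 20. 20. 15.]
```

Dividing the intrinsics by f keeps projection consistent with the smaller grid: a 3D point that lands at pixel (u, v) at full resolution lands at roughly (u / f, v / f) in the downsampled image.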

xwjabc commented 2 years ago

Thank you for your quick response!