tum-vision / tandem

[CoRL 21'] TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo

About the rendered depth of the last keyframe. #3

Closed: chenerg closed this issue 2 years ago.

chenerg commented 2 years ago

Thank you for your great work! I am a bit confused about dense tracking with the rendered depth of the last keyframe.

During dense tracking, a new frame is tracked against the last keyframe n using the rendered depth of n.
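For context on why the keyframe depth matters here: in direct (photometric) tracking, each keyframe pixel is lifted to 3D with its depth, moved by the relative pose and projected into the new frame, where its intensity is compared. A minimal sketch of that warp, assuming a pinhole camera with hypothetical intrinsics and a pose given as (R, t) mapping keyframe coordinates into the new frame. The function names are illustrative, not TANDEM's code, and a real tracker optimizes the pose over many pixels rather than warping one.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical pinhole intrinsics (fx, fy, cx, cy); not values from TANDEM.
Intrinsics = Tuple[float, float, float, float]
Point = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float, K: Intrinsics) -> Point:
    """Lift pixel (u, v) with its depth into a 3D point in the keyframe camera frame."""
    fx, fy, cx, cy = K
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def transform(R: Sequence[Sequence[float]], t: Sequence[float], p: Point) -> Point:
    """Apply the rigid transform X' = R X + t (R is a 3x3 nested list)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p: Point, K: Intrinsics) -> Optional[Tuple[float, float]]:
    """Project a camera-frame point to pixel coordinates; None if it is behind the camera."""
    fx, fy, cx, cy = K
    x, y, z = p
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def warp_pixel(u, v, depth, K, R, t):
    """Warp a keyframe pixel into the new frame using the keyframe depth and relative pose."""
    return project(transform(R, t, backproject(u, v, depth, K)), K)
```

With K = (500, 500, 320, 240), identity rotation and t = (0.1, 0, 0), the principal-point pixel at depth 2.0 lands at u = 345, but at depth 2.2 it lands at about u = 342.7. A depth error therefore shifts where intensities are compared, which is why the choice of keyframe depth affects tracking accuracy.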

However, when the last keyframe was tracked, you had already predicted a dense depth map for it with CVA-MVSNet, and that predicted depth is used to construct the TSDF volume. So is there really much difference between the predicted depth and the rendered depth? I'm curious how large the difference is.
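One way to quantify that difference for a given keyframe is a pixel-wise comparison of the two depth maps. A minimal sketch, assuming both maps are already aligned to the same keyframe image, are stored row-major in meters, and use 0.0 to mark pixels without depth; the function name and the two metrics are illustrative choices, not part of TANDEM.

```python
from typing import Sequence, Tuple

DepthMap = Sequence[Sequence[float]]  # row-major depth in meters; 0.0 marks "no depth"


def depth_difference(pred: DepthMap, rendered: DepthMap) -> Tuple[float, float, int]:
    """Return (mean absolute difference [m], mean relative difference, #valid pixels).

    Only pixels where both maps have a positive depth are compared; the relative
    difference is taken with respect to the rendered depth.
    """
    abs_sum = 0.0
    rel_sum = 0.0
    count = 0
    for row_p, row_r in zip(pred, rendered):
        for dp, dr in zip(row_p, row_r):
            if dp > 0.0 and dr > 0.0:
                diff = abs(dp - dr)
                abs_sum += diff
                rel_sum += diff / dr
                count += 1
    if count == 0:
        return 0.0, 0.0, 0
    return abs_sum / count, rel_sum / count, count
```

For example, comparing `[[2.0, 2.2], [0.0, 4.0]]` with `[[2.0, 2.0], [3.0, 4.4]]` uses three valid pixels and gives a mean absolute difference of 0.2 m. A per-pixel statistic like this only says how far apart the maps are, not which one is closer to ground truth.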

nynyg commented 2 years ago

Hi @chenerg, thanks for your interest in our work.

Since the rendered depth comes from the global TSDF grids, it can deliver more globally consistent depth, which reduces tracking drift. Table 1 of our paper shows our evaluation on the EuRoC dataset. The "DSO+Dense Depth" column shows the results of directly using the dense depth map predicted by CVA-MVSNet for tracking. As you can see there, "Ours", which uses the rendered depth, achieves better results.
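Why fusing into a TSDF and rendering can give more consistent depth than any single prediction can be illustrated with a toy example. Several noisy depth predictions of the same surface are fused with the standard weighted running-average TSDF update (as in KinectFusion-style fusion), and the rendered depth is the zero crossing of the fused values. A minimal sketch under strong simplifying assumptions: a single camera ray, equal weights, and every observation measured along that same ray. The class and parameter names are hypothetical, and a real system fuses 3D voxels observed from many camera poses.

```python
from typing import List, Optional


class RayTSDF:
    """Truncated signed distance values stored in voxels along one camera ray.

    Simplification: every depth observation is measured along this same ray,
    so fusion reduces to 1D.
    """

    def __init__(self, max_depth: float, voxel_size: float, trunc: float):
        self.voxel_size = voxel_size
        self.trunc = trunc
        n = int(max_depth / voxel_size)
        self.centers: List[float] = [(i + 0.5) * voxel_size for i in range(n)]
        self.tsdf: List[float] = [0.0] * n    # fused truncated SDF, in units of trunc
        self.weight: List[float] = [0.0] * n  # accumulated observation weight

    def integrate(self, depth: float, w: float = 1.0) -> None:
        """Fuse one depth observation with a weighted running average."""
        for i, z in enumerate(self.centers):
            sdf = depth - z                   # positive in front of the surface
            if sdf < -self.trunc:
                continue                      # far behind the surface: skip
            d = min(1.0, sdf / self.trunc)    # truncate in front of the surface
            W = self.weight[i]
            self.tsdf[i] = (W * self.tsdf[i] + w * d) / (W + w)
            self.weight[i] = W + w

    def render_depth(self) -> Optional[float]:
        """March front to back and return the first positive-to-negative zero crossing."""
        for i in range(len(self.centers) - 1):
            if self.weight[i] == 0.0 or self.weight[i + 1] == 0.0:
                continue                      # skip unobserved voxels
            a, b = self.tsdf[i], self.tsdf[i + 1]
            if a > 0.0 and b <= 0.0:
                # Linear interpolation between the two voxel centers.
                return self.centers[i] + self.voxel_size * a / (a - b)
        return None
```

Fusing predictions of 1.95, 2.05 and 2.00 (voxel size 0.01, truncation 0.1) renders a depth of about 2.00, whereas fusing only the 2.05 prediction renders about 2.05. The rendered value reflects all fused observations rather than the error of one prediction, which matches the intuition in the reply above.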

chenerg commented 2 years ago

Thanks! That's what I missed.