henry123-boy / SpaTracker

[CVPR 2024 Highlight] Official PyTorch implementation of SpatialTracker: Tracking Any 2D Pixels in 3D Space

Why not use Depth Anything's infer function directly? #29

Open WuJuli opened 3 months ago

WuJuli commented 3 months ago

Why use the 3D part from zoedepth_nk? (screenshot: ksnip_20240730-170925)

m43 commented 3 months ago

In that snippet, self.model outputs relative depth, because the depth_anything_vits14.pth checkpoint of DepthAnything is not a metric monocular depth estimator. The authors therefore use the metric depth from ZoeDepth to rescale the relative depths predicted by DepthAnything, in an attempt to make them metric.
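As a rough illustration of that kind of rescaling (this is not the repository's actual code), here is a minimal sketch that fits a per-frame scale and shift by least squares so that a relative depth map matches a metric reference. The function name `align_scale_shift`, the linear scale-and-shift model, and the synthetic arrays are all assumptions made for the example; the real pipeline may align in a different space (for instance inverse depth) or use a different fitting scheme.

```python
import numpy as np


def align_scale_shift(relative, metric, mask=None):
    """Fit metric ~= scale * relative + shift by least squares.

    Hypothetical helper for illustration only. Returns the aligned map
    together with the fitted scale and shift.
    """
    rel = np.asarray(relative, dtype=np.float64)
    met = np.asarray(metric, dtype=np.float64)
    if mask is None:
        # Use every pixel where both maps hold finite values.
        mask = np.isfinite(rel) & np.isfinite(met)
    # Design matrix [relative, 1] for the two unknowns (scale, shift).
    A = np.stack([rel[mask], np.ones(mask.sum())], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, met[mask], rcond=None)
    return scale * rel + shift, scale, shift


# Synthetic stand-ins for the two network outputs (assumed data).
rng = np.random.default_rng(0)
relative = rng.uniform(0.0, 1.0, size=(4, 6))  # plays the role of the relative prediction
metric = 3.0 * relative + 0.5                  # plays the role of the metric reference

aligned, scale, shift = align_scale_shift(relative, metric)
```

With real predictions the fit will not be exact, so a validity mask or a robust estimator is likely worth adding before trusting the aligned depths.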

DepthAnything also has fine-tuned metric checkpoints here, in case you want to try some of them and see whether they work better. I don't know what the best current way to get metric mono depth for videos is; as far as I know, it is still an open research problem.