I have one question: why didn't you use DepthAnything's infer function directly?

In that snippet, `self.model` outputs relative depth, because the `depth_anything_vits14.pth` checkpoint of DepthAnything is not a metric monocular depth estimator. So the authors use the metric depth from ZoeDepth to rescale the relative depths predicted by DepthAnything, in an attempt to make them metric.
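To illustrate what "rescaling relative depth with a metric reference" can look like, here is a minimal sketch of one common technique: a closed-form least-squares scale-and-shift fit. It assumes the relative output is an affine-invariant disparity (inverse depth) and that ZoeDepth's metric depth is a trustworthy reference on valid pixels. This is not necessarily the alignment SpaTracker actually performs; the function names are hypothetical, and flat lists stand in for depth maps.

```python
from typing import List, Sequence, Tuple


def fit_scale_shift(pred: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: closed-form least squares for s, t minimizing sum((s*p + t - y)^2)."""
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_p = sum(pred) / n
    mean_y = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant; scale is undetermined")
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(pred, target))
    s = cov / var_p
    t = mean_y - s * mean_p
    return s, t


def align_relative_disparity(
    rel_disp: Sequence[float], metric_depth: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Hypothetical helper: map relative disparity to metric depth using a metric reference.

    Assumption: rel_disp is affine-invariant inverse depth, so the fit is done
    in disparity space against 1 / metric_depth.
    """
    # Fit only on pixels where the metric reference is valid (positive depth).
    pairs = [(d, 1.0 / z) for d, z in zip(rel_disp, metric_depth) if z > eps]
    s, t = fit_scale_shift([d for d, _ in pairs], [y for _, y in pairs])
    # Apply the fitted affine map to every pixel, clamp to keep disparity positive,
    # then invert disparity back to depth.
    return [1.0 / max(s * d + t, eps) for d in rel_disp]


if __name__ == "__main__":
    rel = [3.0, 2.0, 1.5, 1.4, 2.5]    # relative disparity from the relative model
    zoe = [1.0, 2.0, 4.0, 5.0, 0.0]    # metric reference; 0.0 marks an invalid pixel
    print(align_relative_disparity(rel, zoe))
```

Fitting in disparity space rather than depth space matches the assumed output type of the relative model; the pixel with an invalid reference is excluded from the fit but still receives a metric estimate from the fitted map.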

DepthAnything also provides fine-tuned metric depth checkpoints here, in case you want to try some of them and see whether they work better. I'm not sure what the current best way to get metric monocular depth for videos is; as far as I know, it is still an open research problem.

henry123-boy / SpaTracker, issue #29