isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

How to transform raw labeled data to training data? #232

Open VillardX opened 11 months ago

VillardX commented 11 months ago

Hi, thanks for your great work! I came here from your follow-up work VPT, and I have read the issues about how to estimate the scale and shift needed to transform the model's predicted relative depth into metric depth in a zero-shot setting.

I just wonder how to transform raw labeled data into training data. More precisely, when training the DPT model from scratch, how is the raw labeled metric depth transformed into the relative depth that DPT is trained to predict?

For example, I have an image $img$ with $M$ pixels and a corresponding metric depth map. Pixel $pix_i$ has depth $depth_i$ (in meters), which is an absolute depth. How can I transform $depth_i$ into the relative depth that the DPT model actually predicts? And how can I obtain the corresponding scale and shift? For the training data, do the scale and shift change between images?

Would you kindly give me some hints? Thanks.
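For context, the MiDaS paper trains in disparity (inverse-depth) space and aligns each image individually with a per-image shift (median) and scale (mean absolute deviation), so yes, the scale and shift differ per image. A minimal sketch of that per-image normalization; the function name and `eps` guard are illustrative, and invalid pixels are assumed to be masked out beforehand:

```python
import numpy as np

def metric_depth_to_ssi_disparity(depth, eps=1e-6):
    """Convert a per-image metric depth map (meters) into the
    scale-and-shift-invariant disparity target described in the
    MiDaS paper. `depth` is an array of valid (positive) depths.
    """
    # MiDaS trains in disparity space, i.e. inverse depth.
    disp = 1.0 / np.maximum(depth, eps)
    # Per-image shift t(d) and scale s(d): median and mean absolute
    # deviation over the valid pixels (robust to outliers).
    t = np.median(disp)
    s = np.mean(np.abs(disp - t))
    # After alignment, the target has zero median and unit scale,
    # so absolute metric scale and shift are factored out.
    return (disp - t) / max(s, eps)
```

Because `t` and `s` are recomputed from each image's own disparity statistics, two images of the same scene at different metric scales map to the same normalized target.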