lpiccinelli-eth / UniDepth

Universal Monocular Metric Depth Estimation

Details about prediction on KITTI dataset #65

Open gxytcrc opened 1 month ago

gxytcrc commented 1 month ago

Hello! Thank you for your excellent work. I have a question about prediction on KITTI. Should the input image be resized to [420, 560] when evaluating v2? I have already tried resizing, but the predicted intrinsics deviate significantly from the intrinsics provided by KITTI. Is this normal? Also, what other preprocessing steps should be applied before feeding the image into the model?

lpiccinelli-eth commented 1 month ago

The predicted intrinsics correspond to the input image size. The infer method takes care of rescaling the predicted intrinsics to match the original input shape. If you are using your own forward method, you have to handle this rescaling yourself.
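For anyone writing their own forward pass, the rescaling step can be sketched as below. This is a minimal NumPy illustration, not UniDepth's own code: it assumes a standard 3x3 pinhole intrinsic matrix and a plain resize (no cropping or padding), and the function name rescale_intrinsics is hypothetical. It also ignores the half-pixel offset that some pixel-center conventions apply.

```python
import numpy as np

def rescale_intrinsics(K, resized_hw, original_hw):
    """Map a 3x3 pinhole intrinsic matrix predicted at `resized_hw`
    back to `original_hw` (assumes a plain resize, no crop or pad).
    Hypothetical helper, not part of the UniDepth API."""
    sy = original_hw[0] / resized_hw[0]  # vertical scale factor
    sx = original_hw[1] / resized_hw[1]  # horizontal scale factor
    K_out = np.array(K, dtype=np.float64, copy=True)  # do not mutate input
    K_out[0, 0] *= sx  # fx scales with width
    K_out[0, 2] *= sx  # cx scales with width
    K_out[1, 1] *= sy  # fy scales with height
    K_out[1, 2] *= sy  # cy scales with height
    return K_out
```

Comparing intrinsics predicted at the network resolution directly against KITTI's calibration, without this step, will show a large mismatch even if the prediction is good.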

V2 does not require a fixed image shape; the infer method takes care of that too, by keeping the input image's aspect ratio and simply resizing the image to fit the max dimension seen during training.
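Aspect-preserving resizing of this kind can be sketched as follows. The sketch assumes that "fit the max dimension" means scaling the longer side to a chosen max_side and rounding both sides to a multiple of 14 (420 and 560 are both multiples of 14, but that default is an assumption here). The helper name fit_max_dimension is hypothetical, and the actual rule inside infer may differ, for example by using a pixel budget instead of a longest-side limit.

```python
def fit_max_dimension(h, w, max_side, multiple=14):
    """Return (new_h, new_w) with roughly the same aspect ratio as (h, w),
    the longer side scaled to about `max_side`, and both sides rounded to
    a multiple of `multiple`. Hypothetical helper, not UniDepth's code."""
    scale = max_side / max(h, w)
    # Round each side to the nearest multiple, never going below one multiple.
    new_h = max(multiple, round(h * scale / multiple) * multiple)
    new_w = max(multiple, round(w * scale / multiple) * multiple)
    return new_h, new_w
```

For a 375x1242 image (a typical KITTI resolution), fit_max_dimension(375, 1242, 560) gives (168, 560). Because of the rounding, the vertical and horizontal scale factors differ slightly, so intrinsics should be rescaled per axis using new size / old size.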

KITTI images are outside the training domain (their aspect ratio is very elongated), so the results may be somewhat off. It would be better to pad the image so that its aspect ratio falls inside the range of image ratios seen during training.
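The padding suggested above can be sketched like this. It is a minimal illustration under stated assumptions: the allowed width/height range (min_ratio, max_ratio) is a hypothetical parameter, since the thread does not say which ratios were seen during training; zero padding is an assumption (another fill value may suit the model's normalization better); and the helper name pad_to_aspect_range is hypothetical. Padding shifts the principal point by the pad offset but leaves the focal lengths unchanged, because no pixels are rescaled.

```python
import math
import numpy as np

def pad_to_aspect_range(image, K, min_ratio, max_ratio):
    """Zero-pad an HxW(xC) image so that W/H lies in [min_ratio, max_ratio].

    Returns the padded image, intrinsics expressed in the padded frame, and
    (top, left, h, w) so the prediction can be cropped back afterwards.
    Hypothetical helper; the ratio bounds are assumptions."""
    h, w = image.shape[:2]
    top = left = 0
    new_h, new_w = h, w
    if w / h > max_ratio:        # too wide (e.g. KITTI): add rows
        new_h = math.ceil(w / max_ratio)
        top = (new_h - h) // 2
    elif w / h < min_ratio:      # too tall: add columns
        new_w = math.ceil(h * min_ratio)
        left = (new_w - w) // 2
    pad = [(top, new_h - h - top), (left, new_w - w - left)]
    pad += [(0, 0)] * (image.ndim - 2)  # leave any channel axis untouched
    padded = np.pad(image, pad, mode="constant")
    K_pad = np.array(K, dtype=np.float64, copy=True)
    K_pad[0, 2] += left  # principal point moves with the pad offset
    K_pad[1, 2] += top   # focal lengths stay the same
    return padded, K_pad, (top, left, h, w)
```

After inference on the padded image, the depth map can be cropped back with pred[top:top + h, left:left + w], and the predicted cx and cy shifted back by subtracting left and top.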