opencv / opencv

Open Source Computer Vision Library
https://opencv.org
Apache License 2.0

SIFT detector has bias of 0.25 pixels due to chosen interpolation scheme in scale pyramid #23123

Open vicsyl opened 1 year ago

vicsyl commented 1 year ago

System Information

System agnostic

Detailed description

By default (though this is configurable), the scale pyramid first upscales the image to double its original size. This is currently done with INTER_LINEAR_EXACT/INTER_LINEAR, which sample the image roughly equidistantly, treating pixels as squares that cover an area. As a result, the interpolated image is the same along an axis and along the flipped axis (imagine rotating or reflecting the image). But when it is downsampled again by 'nearest', the expected keypoint location differs by 0.5 pixels between downsampling on the original axis (taking even pixels only) and on the flipped axis (even pixels there = odd pixels on the original axis). The image downscaled back to the original size is therefore not the same on the flipped axis (i.e. this up/downscaling is not rotation/reflection equivariant), which gives a bias of 1/4 pixels for rotations and (1-s)/4 pixels for downscaling (where s is the scale).

The solution is to use a different interpolation scheme for the upscaling in the scale pyramid, one that simply maps even pixel indices 2x in the upscaled image to pixels x in the original image and interpolates between them for the odd indices. For the index 2d-1 (d is the original size of the image along a given axis), let's replicate: it takes the value at 2d-2 in the upscaled image, i.e. at d-1 in the original image. This way, upscaling followed by downscaling by 'nearest' would return exactly the original input image and would thus be rotation/reflection equivariant. See the minimal example:

original axis: (0, 1, 2, 3) would become (0, 0.5, 1, 1.5, 2, 2.5, 3, 3) downscaled again by 'nearest': (0, 1, 2, 3)

flipped axis: (3, 2, 1, 0) would become (3, 2.5, 2, 1.5, 1, 0.5, 0, 0) downscaled again by 'nearest': (3, 2, 1, 0)
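The same minimal example can be checked with a short 1-D sketch in plain Python. It is not the OpenCV code: `upscale_2x_aligned`, `upscale_2x_half_pixel` and `downscale_nearest` are hypothetical helper names, and the half-pixel variant only assumes the usual pixel-center mapping (source coordinate = (j + 0.5) / 2 - 0.5 with clamping at the borders) as a stand-in for what INTER_LINEAR does.

```python
import math


def upscale_2x_aligned(row):
    """Proposed scheme: even output indices copy the input samples, odd
    indices are the midpoint of their neighbours, and the last output
    sample (index 2d-1) replicates the final input sample."""
    n = len(row)
    out = []
    for i in range(n):
        nxt = row[i + 1] if i + 1 < n else row[i]  # replicate at the border
        out.append(float(row[i]))
        out.append((row[i] + nxt) / 2.0)
    return out


def upscale_2x_half_pixel(row):
    """Assumed stand-in for pixel-center linear upscaling (not cv2 itself)."""
    n = len(row)
    out = []
    for j in range(2 * n):
        x = (j + 0.5) / 2.0 - 0.5        # source coordinate of output sample j
        x = min(max(x, 0.0), n - 1.0)    # clamp, i.e. replicate the border
        i0 = int(math.floor(x))
        i1 = min(i0 + 1, n - 1)
        t = x - i0
        out.append((1.0 - t) * row[i0] + t * row[i1])
    return out


def downscale_nearest(row):
    """2x 'nearest' downscaling modelled as keeping the even indices."""
    return row[::2]


axis = [0, 1, 2, 3]       # sample value == coordinate on the original axis
flipped = axis[::-1]

# Aligned scheme: the round trip is the identity on both axes.
aligned = downscale_nearest(upscale_2x_aligned(axis))
aligned_flipped = downscale_nearest(upscale_2x_aligned(flipped))

# Half-pixel scheme: interior samples move by 0.25 in opposite directions,
# so the two axes disagree by 0.5 once the flipped result is flipped back.
biased = downscale_nearest(upscale_2x_half_pixel(axis))
biased_flipped = downscale_nearest(upscale_2x_half_pixel(flipped))[::-1]

print(aligned, aligned_flipped)
print(biased, biased_flipped)
```

Under these assumptions the aligned round trip returns the input on both axes, while the half-pixel round trip gives (0, 0.75, 1.75, 2.75) on the original axis and (0.25, 1.25, 2.25, 3) on the flipped axis after flipping back, which is the 0.25 pixel bias in each direction away from the borders.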

See this repo for notebooks showing the issue for OpenCV and Kornia: https://github.com/vicsyl/dog_precision

Most importantly https://github.com/vicsyl/dog_precision/blob/master/Accuracy%20of%20homography%20estimation.ipynb

Steps to reproduce

See https://github.com/vicsyl/dog_precision


ducha-aiki commented 1 year ago

Some visual explanation of the problem: image

RohanHBTU commented 1 year ago

@vpisarev I would like to contribute to this issue. I have already built OpenCV on my local system. Could you please share some resources on this issue so that I can work on it?

RohanHBTU commented 1 year ago

Also, please tell me which base branch I should use for this issue.

vicsyl commented 1 year ago

@RohanHBTU this has already been addressed in https://github.com/opencv/opencv/pull/23124

vicsyl commented 1 year ago

Addressed by https://github.com/opencv/opencv/pull/23124

The code in double_image in test_descriptors_regression.impl.hpp:

https://github.com/opencv/opencv/pull/23124/files#diff-c0c31cbda83b85a0c4a49a75c5cac6da2b09a0aba3fc464e1b043023b1169a9eR10

is copied from sift.dispatch.cpp (function createInitialImage) to mirror its logic, showing that upscaling the image this way and then downscaling by 'nearest' reproduces the input image exactly. Since it wasn't straightforward to export the function from the production code, the logic is duplicated in the test for now.