pippy360 / transformationInvariantImageSearch

A reverse image search algorithm which performs 2D affine transformation-invariant partial image-matching in sublinear time
MIT License
288 stars · 37 forks

Weighted Triangles for Reducing Search Space #4

Open glop102 opened 6 years ago

glop102 commented 6 years ago

I read through the write-up you have on github.io and saw that you rotate the triangle onto every edge to make a hash of it. It seems fairly wasteful to store what is essentially the same triangle multiple times in the database to compare against.

Idea: choose a single rotation for the triangle based on some heuristic

I was originally thinking of choosing the lightest corner of the triangle and always rotating it to the top. As long as there is no extreme color/saturation shifting, I believe this would reduce the number of hashes that need to be compared.
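The heuristic above could be sketched like this (the helper name and the brightness-sampling interface are my assumptions, not code from the repo): rotate the vertex list so the brightest corner comes first, giving a single canonical hash per triangle instead of three.

```python
def canonical_rotation(vertices, brightness):
    """Rotate a triangle's vertex list so the brightest corner is first.

    vertices: list of 3 (x, y) points; brightness: parallel list of pixel
    intensities sampled at those points (hypothetical input format).
    """
    start = max(range(3), key=lambda i: brightness[i])
    return vertices[start:] + vertices[:start]

tri = [(0, 0), (4, 0), (2, 3)]
# Cyclic rotation only (no reflection), so vertex order is preserved.
print(canonical_rotation(tri, [10, 200, 50]))  # [(4, 0), (2, 3), (0, 0)]
```

Note this only picks a rotation, not a reflection, and it fails exactly when brightness ties or shifts make the "lightest corner" ambiguous, which is the accuracy concern raised in the reply below.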

pippy360 commented 6 years ago

Yeah, hashing each triangle 3 times is obviously very wasteful.
I considered using a heuristic, but I couldn't think of one that would be correct 100% of the time and I didn't want to sacrifice any accuracy. In practice it still works well, and the 3x space overhead isn't much of an issue. The much bigger issue is that the number of triangles grows as O(n^3), where n is the number of keypoints.

One thing I could do is rotate the triangle onto only 2 edges instead of 3 (with the 2 edges randomly chosen); then, by pigeonhole, two matching triangles are guaranteed to have at least one edge choice in common.
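To put numbers on the growth mentioned above: the triangle count is C(n, 3) ≈ n³/6, so doubling the keypoints multiplies the triangles by roughly 8, and the per-triangle hashing factor (3x or 2x) is a constant on top of that. A quick check:

```python
from math import comb

# Triangles from n keypoints, and total hashes at 3 rotations per triangle.
for n in (50, 100, 200):
    print(n, comb(n, 3), 3 * comb(n, 3))
# 50  ->    19,600 triangles
# 100 ->   161,700 triangles
# 200 -> 1,313,400 triangles
```

This is why pruning which triangles get generated matters far more than shaving the rotation factor from 3 to 2.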

glop102 commented 6 years ago

That is not a bad idea either.

Reducing the number of triangles generated per number of keypoints is something that I will have to think about more. The question becomes which specific cases of cropping you are willing to not support. When I think of a good idea, I will try my hand at implementing it as a runtime option. A couple of really rough ideas that popped into my head:

With those thoughts I see two main categories that need to be addressed: big and small triangles. Big triangles help find an image after it has been shrunk; small triangles help find sub-parts of an image, like when things get sampled into other images. Then there is the thought of a REALLY big picture, where perhaps a medium level of triangle sizes is prudent. But then you enter multi-scale feature detection, which is basically back to just doing all the triangles. What a circle to think about...

pippy360 commented 6 years ago

> draw a line to each point's nearest neighbor, then make triangles without crossing any of those lines (idk how well this would work)

The problem here is that a nearest neighbor might not be the nearest neighbor after affine transformations are applied. In fact, if you take any 3 keypoints, you can always make any two of them closer to each other than to the third by applying an affine transformation. This makes it really hard to come up with an affine-transformation-invariant way of describing two keypoints as close to each other.
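A tiny demonstration of this point: a pure axis scaling (an affine map) can change which pair of keypoints is "nearest". Below, A and C start out closest, but after squashing the x-axis, A and B become the closest pair.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def scale_x(p, s):
    # An affine map: scale the x-axis by s, leave y unchanged.
    return (p[0] * s, p[1])

A, B, C = (0.0, 0.0), (10.0, 0.0), (0.0, 1.0)
print(dist(A, C) < dist(A, B))   # True: A and C start out closest
A2, B2, C2 = (scale_x(p, 0.01) for p in (A, B, C))
print(dist(A2, B2) < dist(A2, C2))  # True: now A and B are closest
```

So any "nearest neighbor" relation computed in the original image can be rearranged arbitrarily by the transformation being searched for.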

DonaldTsang commented 5 years ago

There is a solution to the rotation problem: determine the lightest and darkest corners of the triangle (after applying a blur), and make sure the hash is for triangle ABC (corner A lightest, C darkest) rather than BCA or CBA etc. If two of the corners are similar in brightness, we double the number of triangles to scan. The same goes for homogeneous triangles.
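That ordering scheme might look like the following sketch (the tolerance value and helper names are my assumptions): sort the corners by sampled brightness so A is lightest and C is darkest, and when two corners are within a tolerance of each other, emit both orderings so a match is not missed.

```python
TOL = 8  # brightness tolerance on an assumed 0-255 scale

def orderings(vertices, brightness):
    """Return candidate vertex orderings, lightest corner first.

    Emits one ordering normally; if two adjacent corners in the sorted
    order have ambiguous brightness, also emits the swapped ordering,
    doubling the candidates to scan for that triangle.
    """
    idx = sorted(range(3), key=lambda i: -brightness[i])  # lightest first
    result = [tuple(vertices[i] for i in idx)]
    for a in range(2):
        i, j = idx[a], idx[a + 1]
        if abs(brightness[i] - brightness[j]) <= TOL:
            swapped = idx[:]
            swapped[a], swapped[a + 1] = swapped[a + 1], swapped[a]
            result.append(tuple(vertices[k] for k in swapped))
    return result

tri = [(0, 0), (4, 0), (2, 3)]
print(len(orderings(tri, [200, 60, 58])))   # 2: two corners are ambiguous
print(len(orderings(tri, [200, 100, 10])))  # 1: ordering is unambiguous
```

A fully homogeneous triangle would trip both tolerance checks and fall back toward scanning all orderings, matching the trade-off described above.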

Another solution to the "exploding triangle problem" is to require that two overlapping triangles not share more than a certain percentage of overlap. Because affine transformations preserve area ratios, this filter would still work no matter how the triangles are scaled.
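A rough sketch of that filter (the function names and the Monte Carlo approach are mine, not from the thread): estimate the fraction of one triangle's area covered by another and drop one of the pair when it exceeds a threshold. Since affine maps preserve area ratios, the overlap fraction itself is affine-invariant.

```python
import random

def in_triangle(p, tri):
    """Point-in-triangle test via consistent cross-product signs."""
    (ax, ay), (bx, by), (cx, cy) = tri
    px, py = p
    d1 = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    d2 = (cx - bx) * (py - by) - (cy - by) * (px - bx)
    d3 = (ax - cx) * (py - cy) - (ay - cy) * (px - cx)
    return (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0)

def overlap_fraction(t1, t2, samples=20000, seed=0):
    """Monte Carlo estimate of area(t1 ∩ t2) / area(t1)."""
    rng = random.Random(seed)
    (ax, ay), (bx, by), (cx, cy) = t1
    hits = 0
    for _ in range(samples):
        u, v = rng.random(), rng.random()
        if u + v > 1.0:
            u, v = 1.0 - u, 1.0 - v  # fold the unit square into the triangle
        p = (ax + u * (bx - ax) + v * (cx - ax),
             ay + u * (by - ay) + v * (cy - ay))
        hits += in_triangle(p, t2)
    return hits / samples

t1 = [(0, 0), (1, 0), (0, 1)]
print(overlap_fraction(t1, t1))                        # 1.0 (identical)
print(overlap_fraction(t1, [(5, 5), (6, 5), (5, 6)]))  # 0.0 (disjoint)
```

An exact version would clip one triangle against the other (Sutherland-Hodgman) and compare shoelace areas; the sampling version is just the shortest way to show the idea.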

One more solution is to scale the points so that the "nearest neighbors" maintain the same geometry even after an affine transformation is applied, so that subdivisions of those points can be used to form triangles with less redundancy.

(image attachment)

The only way this system breaks down is if we introduce projective transformations; otherwise all of the previously mentioned methods will work.

https://www.cc.gatech.edu/~vempala/papers/isopca.pdf

DonaldTsang commented 5 years ago

Reading through https://news.ycombinator.com/item?id=14973741 — some commenters recommended https://hal.inria.fr/inria-00548539/document

Also, here are two of my hypotheses on matching images @pippy360 :

  1. Given two convex quadrilaterals ABCD and WXYZ, as long as the triangle pairs ABC and WXY, BCD and XYZ, ACD and WYZ, and ABD and WXZ match in perceptual hash, the quadrilaterals should be at least visually similar (affine-transformation-wise). From this we can extrapolate that polygons with more sides have the same property (e.g. 10 triangles for pentagons, 20 triangles for hexagons).
  2. Given a point X inside triangle ABC's boundaries, no matter how many affine transformations have been applied, the transformed point Xnew will always be inside the transformed triangle, and ABX, BCX, ACX will be transformed consistently with ABC. From this we can extrapolate that smaller triangles can be skipped in favor of bigger triangles.

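Hypothesis 2 above can be checked directly: affine maps preserve barycentric coordinates, so a point's position relative to a triangle is unchanged by the transformation. A small sketch (the specific affine matrix is an arbitrary example of mine):

```python
def affine(p, M, t):
    """Apply the affine map p -> M @ p + t to a 2D point."""
    return (M[0][0] * p[0] + M[0][1] * p[1] + t[0],
            M[1][0] * p[0] + M[1][1] * p[1] + t[1])

def barycentric(p, a, b, c):
    """Solve p - a = u*(b - a) + v*(c - a) by Cramer's rule."""
    d = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    u = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / d
    v = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / d
    return u, v

A, B, C, X = (0, 0), (4, 0), (0, 4), (1, 1)
M, t = [[2, 1], [0, 3]], (5, -2)  # an arbitrary invertible affine map
before = barycentric(X, A, B, C)
after = barycentric(*(affine(p, M, t) for p in (X, A, B, C)))
print(before, after)  # identical: (0.25, 0.25) both times
```

Since u, v (and 1 - u - v) all stay in [0, 1], X remains inside the transformed triangle, which is what lets bigger triangles stand in for the smaller ones they contain.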
In order to use these hypotheses properly, every image's hashes must also store the coordinates of the triangles, in order to abstract away their positions in relation to other triangles. Also, on a related note, take a look at https://en.wikipedia.org/wiki/Convex_hull_algorithms

DonaldTsang commented 5 years ago

Regarding triangular Discrete Cosine Transforms: https://arxiv.org/pdf/1811.04033.pdf

DonaldTsang commented 5 years ago

What about triangular Wavelet Transforms?