komrad36 / LATCH

Fastest CPU implementation of the LATCH 512-bit binary feature descriptor; fully scale- and rotation-invariant
MIT License

Noob questions :) #1

Closed: dgtlmoon closed this issue 8 years ago

dgtlmoon commented 8 years ago

Thanks for the awesome code!

Sorry for my lack of maths here, but I'll try to explain anyway..

My goal here is to efficiently find the best-matching image out of, say, 1000 other images, i.e. the one whose group of descriptors has the closest distance. Reading through the example code, I'm wondering if you could clarify some things for me:

    // ------------- LATCH ------------
    uint64_t* desc = new uint64_t[8 * keypoints.size()];
    std::vector<KeyPoint> kps;
    for (auto&& kp : keypoints) kps.emplace_back(kp.pt.x, kp.pt.y, kp.size, kp.angle * 3.14159265f / 180.0f);

What is happening here? You're adding those four fields to kps? And what's the significance of the kp.angle * pi / 180?

    std::cout << "Warming up..." << std::endl;
    for (int i = 0; i < warmups; ++i) LATCH<multithread>(image.data, image.cols, image.rows, static_cast<int>(image.step), kps, desc);
    std::cout << "Testing..." << std::endl;
    high_resolution_clock::time_point start = high_resolution_clock::now();
    for (int i = 0; i < runs; ++i) LATCH<multithread>(image.data, image.cols, image.rows, static_cast<int>(image.step), kps, desc);

So LATCH accepts the image data and the vector of keypoints. What is it doing here? Is it simply searching image.data for those keypoints (kps)? (As in, it's searching for its own data set in its own data, but you could use some other value for kps and iterate over a set of kps to find the nearest match?)

    high_resolution_clock::time_point end = high_resolution_clock::now();
    // -----------------------

thanks!

komrad36 commented 8 years ago

Hi,

So for the demo I use OpenCV to do ORB detection to get some keypoints to describe, just to keep things simple. You can use any keypoints you like. ORB keypoints come back in a format different from the one expected by my LATCH, which takes KeyPoint structs (see lines 50-58 in LATCH.h). So the lines of code you posted are just creating a new vector of LATCH-suitable keypoints from the OpenCV keypoints.

The angle * pi / 180 is to convert from degrees to radians. For some silly reason, OpenCV uses degrees. My LATCH takes radians so I convert the incoming keypoint angles to radians on the way in.

As for the next section: LATCH takes the image and the vector of keypoints, yes. It then returns a vector, desc, containing 512-bit binary descriptors, one for each keypoint, describing the image at the keypoint, aware of scale and rotation. Note that if you don't want scale and rotation support you can use the ULATCH project. Note also that if you want even more performance you can use the CLATCH or UCLATCH projects, which are CUDA versions that do the same thing.

As for what you can actually do with those descriptors - the next step is generally to call LATCH on a second frame of a video, a second image for a reconstruction, etc. A second image with some of the same contents. Then you'd run matching. This process depends on your unique requirements but the common strategy for binary descriptors is to perform 2NN - "2 nearest neighbors" - with thresholding. This means that you don't actually search for the best match; you find the two closest matches, then consider the relevant keypoint from frame 1 to actually be a match with the relevant keypoint from frame 2 if and only if the distance (meaning Hamming distance for binary descriptors) to the best match is less than the distance to the second-best match by at least some threshold.

If you want to look into matching, I have projects for that too! Check out K2NN for the CPU version, and/or CUDAK2NN for the (much much faster) CUDA version. For matching especially, GPUs are well suited, and because it's O(n^2) in the number of descriptors, the CUDA version really shines, especially for large reconstructions or real-time work.

Let me know if I can do anything else to help!