unum-cloud / usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
https://unum-cloud.github.io/usearch/
Apache License 2.0

fix stateful metric dimensions #532

Closed terencezl closed 4 days ago

terencezl commented 5 days ago

Stateful metrics do not get their dimensions set correctly, so vector storage lacks the information it needs to size its allocations. For example, I have a stateful class called ScalarQuantizerUtil, a thin wrapper around ScalarQuantizer from Faiss, and I get segfaults when I use it as follows.

#include <cstdint>
#include <vector>

#include <fmt/format.h>
#include <faiss/impl/ScalarQuantizer.h>
#include <usearch/index.hpp>
#include <usearch/index_dense.hpp>

using namespace unum::usearch;

inline float cosine_metric_wrapper(uint8_t *a, uint8_t *b, const faiss::ScalarQuantizerUtil *squ_ptr)
{
    auto &squ = *squ_ptr;
    return 1 - squ.dc->compute_code_distance(a, b);
}

int main(int argc, char **argv) {
    faiss::ScalarQuantizerUtil squ(512, faiss::ScalarQuantizer::QT_4bit, "dynamic_ranges.bin");

    metric_punned_t metric = metric_punned_t::stateful(
        reinterpret_cast<std::uintptr_t>(&cosine_metric_wrapper),  // Wrapper function pointer
        reinterpret_cast<std::uintptr_t>(&squ),        // Pointer to metric instance
        metric_kind_t::unknown_k,  // Using unknown since this is custom
        scalar_kind_t::i8_k       // We're using int8_t vectors
    );

    index_config_t index_config;
    index_config.connectivity = 32;
    index_config.connectivity_base = 32 * 2;

    index_dense_t index = index_dense_t::make(metric, index_config);
    fmt::print("Index created\n");

    std::vector<int8_t> vec(256, 0);

    index.reserve(10); // Pre-allocate memory for 10 vectors
    index.add(42, vec.data()); // Pass a key and a vector
    fmt::print("Added vector\n");

    auto results = index.search(vec.data(), 5); // Pass a query and limit number of results
    fmt::print("Found {} matches\n", results.size());

    for (std::size_t i = 0; i != results.size(); ++i)
        fmt::print("Found matching key: {}\n", results[i].member.key);
    return 0;
}
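
For readers unfamiliar with the pattern in the reproduction above, here is a minimal sketch of a "stateful" callback: the function pointer and a pointer to its state are both stored as integers and cast back at call time. The names `scale_state_t`, `scaled_abs_diff`, and `punned_callback_t` are hypothetical and exist only for this illustration; this is not how usearch implements `metric_punned_t` internally.

```cpp
#include <cstdint>

// Hypothetical state object, standing in for something like a quantizer.
struct scale_state_t {
    float scale;
};

// Callback with the same (a, b, state) shape as the wrapper in the reproduction.
// For brevity it only looks at the first byte of each vector.
inline float scaled_abs_diff(std::uint8_t const* a, std::uint8_t const* b, scale_state_t const* state) {
    float diff = static_cast<float>(a[0]) - static_cast<float>(b[0]);
    return (diff < 0 ? -diff : diff) * state->scale;
}

// Hypothetical holder: keeps both pointers as integers and restores their types on each call.
struct punned_callback_t {
    std::uintptr_t function; // address of the callback
    std::uintptr_t state;    // address of the state object

    float operator()(std::uint8_t const* a, std::uint8_t const* b) const {
        using callback_t = float (*)(std::uint8_t const*, std::uint8_t const*, scale_state_t const*);
        callback_t callback = reinterpret_cast<callback_t>(function);
        return callback(a, b, reinterpret_cast<scale_state_t const*>(state));
    }
};
```

The holder knows nothing about vector length; in this sketch that information would have to be carried separately, which is the role the dimensions field plays for a metric.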

The fix mirrors the stateless initialization pattern, so that the stateful metric also gets its dimensions set correctly.
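
To illustrate why the missing dimensions matter, here is a minimal, self-contained sketch. `toy_metric_t` and its three factories are hypothetical stand-ins written for this example, not usearch's actual code or the actual diff; they only assume that storage derives its per-vector allocation size from the metric's dimensions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a type-punned metric descriptor; not usearch's class.
struct toy_metric_t {
    std::size_t dimensions;       // scalars per vector
    std::size_t bytes_per_scalar; // 1 for int8-style codes
    std::uintptr_t callback;      // distance function, stored as an integer
    std::uintptr_t state;         // optional state pointer, stored as an integer

    // Storage asks the metric how many bytes each vector occupies.
    std::size_t bytes_per_vector() const { return dimensions * bytes_per_scalar; }

    // Stateless factory: records the dimensions.
    static toy_metric_t stateless(std::size_t dimensions, std::uintptr_t callback) {
        toy_metric_t metric = {dimensions, 1, callback, 0};
        return metric;
    }

    // Buggy stateful factory: stores the callback and state but leaves dimensions at zero.
    static toy_metric_t stateful_buggy(std::size_t /*dimensions*/, std::uintptr_t callback, std::uintptr_t state) {
        toy_metric_t metric = {0, 1, callback, state};
        return metric;
    }

    // Fixed stateful factory: follows the stateless one and records the dimensions too.
    static toy_metric_t stateful_fixed(std::size_t dimensions, std::uintptr_t callback, std::uintptr_t state) {
        toy_metric_t metric = {dimensions, 1, callback, state};
        return metric;
    }
};
```

With the buggy factory, `bytes_per_vector()` is zero, so any storage sized from it has no room for the vector that is subsequently copied in. That is one plausible route to the kind of segfault described above.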

ashvardanian commented 4 days ago

Thanks! Merged into main-dev 🤗