unum-cloud / usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
https://unum-cloud.github.io/usearch/
Apache License 2.0

Fix: Exact search accuracy and speed #276

Closed by ashvardanian 1 year ago

ashvardanian commented 1 year ago

I've fixed the problem spotted by @ebursztein and added a benchmark for small-scale exact search, a workload that is standard in LLM apps.

$ python python/scripts/bench_exact.py --ndim 256 --n 100_000 --q 10 --k 100
Hardware acceleration in USearch:  auto
USearch:  0.013957023620605469
FAISS:    0.04720497131347656

$ python python/scripts/bench_exact.py --ndim 256 --n 100_000 --q 10 --k 100 --half
Hardware acceleration in USearch:  auto
USearch:  0.014386892318725586
FAISS:    0.08691692352294922

Even without hardware acceleration from SimSIMD, on the M2 MacBook Pro:


The script is located at python/scripts/bench_exact.py and has a few CLI parameters; the runs above pass --ndim, --n, --q, --k, and --half.

For large q, FAISS works quite well: in such cases it redirects the query to the linked BLAS implementation. But that also means it supports only two metrics, L2 and inner product. With USearch exact search you can still use all the same metrics as with the usearch.index.Index class, and you can also provide a custom JIT-ed CompiledMetric.
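To make the metric argument concrete, here is a minimal pure-Python sketch of brute-force exact search where the metric is just a callable. It is not USearch's implementation or API: the names `exact_search`, `l2sq`, and `manhattan` are hypothetical and chosen only for illustration. The point is that a plain scan can rank by any distance function, whereas a BLAS-backed path is tied to what a matrix multiplication can express.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def l2sq(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (expressible via inner products)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a: Vector, b: Vector) -> float:
    """L1 distance: an example of a metric that is not an inner product."""
    return sum(abs(x - y) for x, y in zip(a, b))


def exact_search(
    dataset: Sequence[Vector], query: Vector, k: int, metric: Metric
) -> List[Tuple[int, float]]:
    """Scan every vector and return the k closest as (index, distance) pairs."""
    scored = ((metric(vector, query), index) for index, vector in enumerate(dataset))
    # nsmallest keeps only k candidates at a time; ties are broken by index.
    return [(index, distance) for distance, index in heapq.nsmallest(k, scored)]


# Toy data: the same query ranks differently under the two metrics.
dataset = [(0.0, 0.0), (2.0, 2.0), (3.0, 0.0), (0.0, 2.5)]
query = (0.0, 0.0)

by_l2 = exact_search(dataset, query, k=3, metric=l2sq)
by_l1 = exact_search(dataset, query, k=3, metric=manhattan)

print("L2sq neighbors:", [index for index, _ in by_l2])
print("L1 neighbors:  ", [index for index, _ in by_l1])
```

Swapping the `metric` argument is the only change needed to rank by a different distance. That flexibility is what a BLAS-only path gives up in exchange for speed on large query batches.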

ashvardanian commented 1 year ago

:tada: This PR is included in version 2.6.0 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket: