Fix: Exact search accuracy and speed

I've fixed the problem spotted by @ebursztein and added a benchmark for small-scale exact search, a common workload in LLM apps.

$ python python/scripts/bench_exact.py --ndim 256 --n 100_000 --q 10 --k 100
Hardware acceleration in USearch:  auto
USearch:  0.013957023620605469
FAISS:    0.04720497131347656

$ python python/scripts/bench_exact.py --ndim 256 --n 100_000 --q 10 --k 100 --half
Hardware acceleration in USearch:  auto
USearch:  0.014386892318725586
FAISS:    0.08691692352294922

On an M2 MacBook Pro, even without hardware acceleration from SimSIMD:

USearch is roughly 3.4x faster than FAISS on single-precision vectors.
USearch is roughly 6x faster than FAISS on half-precision vectors.
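The speedup factors follow directly from the timings printed by the two runs above. A quick way to recompute them (the numbers are copied from the output; units are whatever the script prints):

```python
# Timings copied verbatim from the two benchmark runs above.
timings = {
    "single": {"usearch": 0.013957023620605469, "faiss": 0.04720497131347656},
    "half": {"usearch": 0.014386892318725586, "faiss": 0.08691692352294922},
}

for precision, t in timings.items():
    # Speedup = FAISS time divided by USearch time for the same workload.
    speedup = t["faiss"] / t["usearch"]
    print(f"{precision}-precision: USearch is {speedup:.1f}x faster than FAISS")
# single-precision: USearch is 3.4x faster than FAISS
# half-precision: USearch is 6.0x faster than FAISS
```

These are single runs, so small differences between repeated measurements are expected.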

The script is located at python/scripts/bench_exact.py and has a few CLI parameters:

--ndim - number of vector dimensions.
--n - number of vectors in the dataset to search.
--q - number of vectors to query among n.
--k - number of closest neighbors to retrieve per query.
--half - use half-precision.
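The script's source isn't reproduced here. As a rough, hypothetical sketch of what a harness with these flags might look like, the following uses `argparse` for the documented flags and a pure-Python brute-force search as a stand-in for USearch and FAISS. The default values, the random data, the helper names (`make_dataset`, `exact_search`, `main`), and taking the queries from the dataset are all assumptions, not the script's actual behavior.

```python
import argparse
import heapq
import random
import struct
import time


def parse_args(argv=None):
    # Same flag names as the real script; the default values here are assumptions.
    parser = argparse.ArgumentParser(description="Exact k-NN search benchmark (sketch)")
    parser.add_argument("--ndim", type=int, default=256, help="number of vector dimensions")
    parser.add_argument("--n", type=int, default=100_000, help="number of dataset vectors")
    parser.add_argument("--q", type=int, default=10, help="number of query vectors")
    parser.add_argument("--k", type=int, default=100, help="neighbors to retrieve per query")
    parser.add_argument("--half", action="store_true", help="use half-precision")
    return parser.parse_args(argv)


def to_half(x):
    # Round a float to the nearest IEEE 754 half-precision value ("e" struct format).
    return struct.unpack("e", struct.pack("e", x))[0]


def make_dataset(n, ndim, half, seed=42):
    # Random vectors in [0, 1); optionally rounded to half precision.
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(ndim)] for _ in range(n)]
    if half:
        data = [[to_half(x) for x in row] for row in data]
    return data


def l2sq(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(dataset, queries, k):
    # Brute force: score every dataset vector for every query and keep the k closest
    # as (distance, index) pairs sorted by ascending distance.
    return [
        heapq.nsmallest(k, ((l2sq(query, vec), i) for i, vec in enumerate(dataset)))
        for query in queries
    ]


def main(argv=None):
    args = parse_args(argv)
    dataset = make_dataset(args.n, args.ndim, args.half)
    queries = dataset[: args.q]  # Assumption: queries are taken from the dataset.
    start = time.perf_counter()
    results = exact_search(dataset, queries, args.k)
    print(f"Brute force: {time.perf_counter() - start:.6f} s")
    return results
```

Pure Python is far too slow for the default sizes; something like `main(["--n", "1000", "--ndim", "32", "--k", "10"])` is enough to exercise it. A real benchmark would time USearch and FAISS on NumPy arrays instead.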

For large q, FAISS works quite well: in such cases it dispatches the queries to the linked BLAS implementation. The downside is that this BLAS path supports only two metrics, L2 and inner product. With USearch exact search you can still use all the same metrics as with the usearch.index.Index class, and even provide a custom JIT-compiled CompiledMetric.
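To illustrate that trade-off, here is a minimal pure-Python sketch (not the USearch or FAISS API; all function names are illustrative). Squared L2 can be rewritten as ||a||² - 2·a·b + ||b||², so both L2 and inner product reduce to dot products, which is what lets a single BLAS matrix multiply score a whole batch of queries. A brute-force loop, by contrast, accepts any distance callable, such as L1 (Manhattan), which has no such dot-product rewrite.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2sq(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance, computed directly.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2sq_via_dot(a: Vector, b: Vector) -> float:
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2: only dot products are needed, so a
    # batch of these can be computed with one matrix multiply plus precomputed norms.
    return dot(a, a) - 2 * dot(a, b) + dot(b, b)


def negative_dot(a: Vector, b: Vector) -> float:
    # Inner-product "distance": a larger dot product means more similar, so negate it.
    return -dot(a, b)


def manhattan(a: Vector, b: Vector) -> float:
    # L1 distance: cannot be expressed through norms and a.b, so no GEMM shortcut.
    return sum(abs(x - y) for x, y in zip(a, b))


def exact_search(
    dataset: List[Vector], queries: List[Vector], k: int, metric: Metric
) -> List[List[Tuple[float, int]]]:
    # Exact by construction: every (query, vector) pair is scored with `metric`,
    # and the k smallest (distance, index) pairs are kept per query.
    return [
        heapq.nsmallest(k, ((metric(q, v), i) for i, v in enumerate(dataset)))
        for q in queries
    ]
```

The cost of this flexibility is one metric call per (query, vector) pair instead of one batched matrix multiply, which is why the BLAS route pays off mainly when q is large.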

unum-cloud/usearch #276