unum-cloud / usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
https://unum-cloud.github.io/usearch/
Apache License 2.0

Bug: Distance is negative #258

Closed alexbarev closed 1 year ago

alexbarev commented 1 year ago

Describe the bug

Shouldn't distances be non-negative for all metrics? With the Inner Product metric (metric='ip'), Index.search() can return negative distances.

The same behavior is observed in the C# binding, so I suspect the same will occur when using the C99 interface.

Steps to reproduce

import numpy as np
from usearch.index import Index, Matches

# Small 3-dimensional index using the inner-product metric
index = Index(
    ndim=3,
    metric='ip',
    dtype='f32',
)

keys = np.arange(5)
vectors = np.array(
    [
        [1, 1, 1],
        [-1, -1, -1],
        [1, 2, 3],
        [-1, -2, -3],
        [1, -2, -3],
    ],
    dtype=np.float32,
)
search_vector = np.array([1, 1, 1], dtype=np.float32)

index.add(keys, vectors)
matches: Matches = index.search(search_vector, 4)

# This assertion fails: some of the reported distances are negative
for match in matches:
    assert match.distance > -1e-6, f"Negative distance {match.distance}"

Expected behavior

Distance is always >= -epsilon, where epsilon is very small.

USearch version

v2.3.1

Operating System

Ubuntu 22.04

Hardware architecture

x86

Which interface are you using?

Python bindings

ashvardanian commented 1 year ago

It might be a bit counter-intuitive, but your vectors aren’t normalized in any way, so you shouldn’t expect the inner products to be in the 0-1 range.
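
To make the point about normalization concrete, here is a minimal sketch in plain Python. It assumes the 'ip' distance is computed as 1 - dot(a, b); that formula is an assumption for illustration, not something confirmed in this thread. The helpers `ip_distance` and `normalize` are hypothetical names, not part of the USearch API. The vectors are the ones from the report.

```python
import math

def ip_distance(a, b):
    # Assumed form of the inner-product "distance": 1 - dot(a, b).
    return 1.0 - sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale a vector to unit L2 length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

vectors = [
    [1, 1, 1],
    [-1, -1, -1],
    [1, 2, 3],
    [-1, -2, -3],
    [1, -2, -3],
]
query = [1, 1, 1]

# Unnormalized vectors: the dot product is unbounded, so 1 - dot can be negative.
raw = [ip_distance(query, v) for v in vectors]

# Unit-length vectors: the dot product is a cosine in [-1, 1],
# so 1 - dot stays within [0, 2] up to floating-point rounding.
unit = [ip_distance(normalize(query), normalize(v)) for v in vectors]

print("raw :", raw)
print("unit:", unit)
```

Under that assumption, the raw distances for keys 0 and 2 come out negative because their dot products with the query exceed 1. Normalizing both the stored vectors and the query before calling index.add() and index.search() would keep the values non-negative, up to rounding.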