unum-cloud / usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
https://unum-cloud.github.io/usearch/
Apache License 2.0

Apache Arrow Support #313

Open thatcort opened 7 months ago

thatcort commented 7 months ago

Describe what you are looking for

It would be great to be able to use search results in other Apache Arrow-based tools with zero-copy overhead.

Is your feature request specific to a certain interface?

It applies to everything

Contact Details

brian@briancort.com

ashvardanian commented 7 months ago

@thatcort to clarify, do you want to export Matches and BatchMatches into Arrow tables, or to allow indexing or brute-force search over the input tables? The latter seems to make more sense, as in other cases the overhead of copying is negligible and the implementation is surprisingly complex. Can you provide an example of a use case, please? 🤗

thatcort commented 7 months ago

If the implementation is complex, then probably don't worry about it. I'm imagining use cases like taking the top n distance matches and then doing bulk operations on that group of vectors. For example, subtracting a constant vector from each, taking their norms, etc.
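
To make that use case concrete, here is a minimal NumPy-only sketch. It does not use the USearch API: `top_n_matches` is a hypothetical brute-force stand-in for an index search, and the data, dimensions, and constant vector are made up for illustration. It only shows the kind of bulk post-processing meant above (subtract a constant vector from each matched vector, then take norms).

```python
import numpy as np


def top_n_matches(vectors, query, n):
    """Hypothetical brute-force stand-in for an index search.

    Returns the row indices of the n nearest vectors (L2 distance)
    and their distances, nearest first.
    """
    distances = np.linalg.norm(vectors - query, axis=1)
    order = np.argsort(distances)[:n]
    return order, distances[order]


# Made-up data: 1000 vectors of dimension 8.
rng = np.random.default_rng(0)
vectors = rng.random((1000, 8), dtype=np.float32)
query = rng.random(8, dtype=np.float32)

keys, distances = top_n_matches(vectors, query, n=10)

# Bulk operations on the matched group of vectors.
matched = vectors[keys]                       # gathers the matched rows (this step copies)
constant = np.full(8, 0.5, dtype=np.float32)  # arbitrary constant vector
shifted = matched - constant                  # subtract the constant from every row
norms = np.linalg.norm(shifted, axis=1)       # one norm per matched vector
```

Note that the gather step `vectors[keys]` copies the selected rows; for a handful of top matches that copy is small, which is in line with the point about copying overhead made earlier in the thread.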