unum-cloud / usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
https://unum-cloud.github.io/usearch/
Apache License 2.0
2.15k stars 130 forks source link

Bug: Performance of insertion into the index #312

Closed spicymatt closed 10 months ago

spicymatt commented 10 months ago

Describe the bug

When comparing the performance of uSearch and Faiss, I am a bit puzzled by the insertion time, that is clearly very defavorable. Am I doing something wrong ? Context: Swift code. Num of vectors: 35,000 Dim: 200

Are these differences in insertion something that is expected ?

The swift code does this:

let testSize:UInt32 = 35_000 var fisherVectors:[[Float]] = read() // reads lots of vectors of size 200

let index = USearchIndex.make(metric: USearchMetric.cos, dimensions: vectorLength, connectivity: 8, quantization: .F32)

index.reserve(testSize)

// Adding a slice var indexC:UInt64 = 0 fisherVectors.forEach { vector in index.add(key:indexC, vector:vector) indexC += 1 }

Steps to reproduce

Build an index in Swift using the code above

Expected behavior

The timing difference with FAISS is unexpected, especially when using FAISS without the batch addition. I was really expecting the most defavorable FAISS situation to be of the same order as USearch.

USearch version

v2.8.12

Operating System

macOS Sonoma

Hardware architecture

Arm

Which interface are you using?

Other bindings

Contact Details

mk@cyme.io

Is there an existing issue for this?

Code of Conduct

ashvardanian commented 10 months ago

Hey, @spicymatt! It seems like you are adding from a single thread. Is that so?

spicymatt commented 10 months ago

Hello

Yes I am….I wasn’t aware that I could call the add method from multiple threads. Are these call thread safe from the Swift binding ?

Matthieu

On 14 Nov 2023, at 16:29, Ash Vardanian @.***> wrote:

Hey, @spicymatt https://github.com/spicymatt! It seems like you are adding from a single thread. Is that so?

— Reply to this email directly, view it on GitHub https://github.com/unum-cloud/usearch/issues/312#issuecomment-1810458330, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIB6RTYN27RGKUSGLF7WMWLYEOE6HAVCNFSM6AAAAAA7KVQUO2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJQGQ2TQMZTGA. You are receiving this because you were mentioned.

ashvardanian commented 10 months ago

Yes, the should be 🙂

By now, the core functionality is supported across all bindings. Broader functionality is ported per request. In some cases, like Batch operations, feature parity is meaningless, as the host language has full multi-threading capabilities and the USearch index structure is concurrent by design, so the users can implement batching/scheduling/load-balancing in the most optimal way for their applications.

That's from the root README.md. @spicymatt it would be great if you could add a multi-threaded example to swift/README.md if it works out 🤗

spicymatt commented 10 months ago

hello. I will. I made some tests using concurrent inserts and I get the 10 cores of my M1 Mac to work at full speed. Reducing the time to insert 35,000 images to 5s or so. I read a previous post where you explained that the L2Index of Faiss is indeed not really an index and where brute force search is used. So it makes more sense to me that Faiss is so much faster in indexing in my use case.

When I run the search benchmark on those 35k images using both Faiss and USearch, my preliminary findings show that USearch is much faster than Faiss. Orders of magnitude faster.

I will post the results and will submit a PR to the Readme for Swift.

Congrats on your great work.

Matt PS/ it would be great to have a better explanation on the connectivity parameter...what is a good value when speed matters. Is there anything in the doc about it ?

ashvardanian commented 10 months ago

Congrats on your great work.

Thanks @spicymatt, it means a lot!

I read a previous post where you explained that the L2Index of Faiss is indeed not really an index and where brute force search is used.

Indeed. And if you need an exhaustive brute-force search, it's already implemented in most bindings, but not in Swift. I can easily add it if you find it relevant. Would that be of help?

it would be great to have a better explanation on the connectivity parameter...what is a good value when speed matters. Is there anything in the doc about it ?

Currently, our documentation on this topic is very minimal, and I believe nobody has published any interesting comparisons on that topic. Good case for the next benchmarks!

ashvardanian commented 10 months ago

@spicymatt, would it make sense to provide helper functions for Index class in the Swift layer using GCD or some other abstraction? Would you be able to integrate something like that?