unum-cloud / usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
https://unum-cloud.github.io/usearch/
Apache License 2.0

Reducing Memory Consumption #238

Closed by ashvardanian 1 year ago

ashvardanian commented 1 year ago

In the past, USearch used Tessil's Robin Map. Sadly, we needed multi-set functionality (several entries stored under the same key), which that class doesn't support. So we temporarily switched to std::unordered_multiset, which is horrifyingly slow and uses a lot of memory.
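For readers unfamiliar with the requirement, here is a minimal sketch of what "multi-set functionality" means for a key lookup: one key may map to several stored vectors. The names `key_and_slot_t`, `lookup_t`, and `slots_of` are hypothetical and chosen for illustration only; this is not USearch's actual code, just the standard-library pattern described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical entry: a user-facing key plus the internal slot of one stored vector.
struct key_and_slot_t {
    std::uint64_t key;
    std::uint32_t slot;
};

// Hash and compare by key only, so all entries sharing a key are grouped together.
struct hash_by_key_t {
    std::size_t operator()(key_and_slot_t const& e) const noexcept {
        return std::hash<std::uint64_t>{}(e.key);
    }
};
struct equal_by_key_t {
    bool operator()(key_and_slot_t const& a, key_and_slot_t const& b) const noexcept {
        return a.key == b.key;
    }
};

using lookup_t = std::unordered_multiset<key_and_slot_t, hash_by_key_t, equal_by_key_t>;

// Collect every slot registered under `key`; the slot in the probe entry is ignored.
inline std::vector<std::uint32_t> slots_of(lookup_t const& lookup, std::uint64_t key) {
    std::vector<std::uint32_t> slots;
    auto range = lookup.equal_range(key_and_slot_t{key, 0});
    for (auto it = range.first; it != range.second; ++it)
        slots.push_back(it->slot);
    return slots;
}
```

Inserting `key_and_slot_t{42, 0}` and `key_and_slot_t{42, 7}` and then calling `slots_of(lookup, 42)` yields both slots, in an unspecified order. A plain hash map or set with unique keys would keep only one of them.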

For reference, during one of my experiments, I couldn't fit more than 920 Million entries into 512 GB of RAM. With this patch, I could fit over 1 Billion without changing the vectors or any other logic. So, the inefficiency of a single container's design cost us the equivalent of ~100 Million data entries.
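To get a feel for where that memory goes, one rough approach is to count the bytes a node-based std::unordered_multiset requests and compare them with a flat array holding the same entries. The sketch below uses a hypothetical `counting_allocator_t`; it is not the measurement behind the numbers above and not the container from the patch. It also ignores the per-allocation bookkeeping of the system allocator, so it understates the real cost of node-based containers.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

// Bytes currently held through `counting_allocator_t` (single-threaded sketch).
inline std::size_t& live_bytes() {
    static std::size_t total = 0;
    return total;
}

// Minimal allocator that forwards to `std::allocator` and tracks live bytes.
template <typename T> struct counting_allocator_t {
    using value_type = T;
    counting_allocator_t() = default;
    template <typename U> counting_allocator_t(counting_allocator_t<U> const&) noexcept {}
    T* allocate(std::size_t n) {
        live_bytes() += n * sizeof(T);
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        live_bytes() -= n * sizeof(T);
        std::allocator<T>{}.deallocate(p, n);
    }
};
template <typename T, typename U>
bool operator==(counting_allocator_t<T> const&, counting_allocator_t<U> const&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(counting_allocator_t<T> const&, counting_allocator_t<U> const&) noexcept {
    return false;
}

// Live bytes held by a node-based multiset of `n` 64-bit entries.
inline std::size_t bytes_in_multiset(std::size_t n) {
    live_bytes() = 0;
    std::unordered_multiset<std::uint64_t, std::hash<std::uint64_t>, std::equal_to<std::uint64_t>,
                            counting_allocator_t<std::uint64_t>>
        set;
    for (std::uint64_t i = 0; i != n; ++i)
        set.insert(i);
    std::size_t held = live_bytes(); // read before `set` is destroyed
    return held;
}

// Live bytes held by a flat, pre-sized array of the same `n` entries.
inline std::size_t bytes_in_flat_array(std::size_t n) {
    live_bytes() = 0;
    std::vector<std::uint64_t, counting_allocator_t<std::uint64_t>> flat;
    flat.reserve(n);
    for (std::uint64_t i = 0; i != n; ++i)
        flat.push_back(i);
    std::size_t held = live_bytes(); // read before `flat` is destroyed
    return held;
}
```

Dividing `bytes_in_multiset(n)` by `n` gives the per-entry footprint on a given standard library. Each node carries at least one link pointer in addition to the value, and the bucket array comes on top, which is why the node-based container needs noticeably more than the 8 bytes per entry of the flat array.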

ashvardanian commented 1 year ago

:tada: This PR is included in version 2.4.0 :tada:

The release is available as a GitHub release.

Your semantic-release bot :package::rocket: