Open kennon opened 3 weeks ago
An example index info for what we're building:
usearch.Index(ScalarKind.BF16 x 768, MetricKind.IP, multi: False, connectivity: 16, expansion: 128 & 64, 6,738,822 vectors in 5 levels, haswell hardware acceleration)
This one took ~10s to load, another one with ~20m vectors took 45 minutes.
@kennon, interesting, looking into it!
@ashvardanian awesome, thanks! I don’t want to post a public url but if you drop me an email I can send you a link to the index files we are trying to load. Let me know if there is any more information I can provide, thanks!
Describe the bug
With larger usearch index sizes, restore times become impractically long. Across the index sizes we have tried, restore durations range from ~10s for an 11GB / 6m embedding index up to 45m (!) for a 32GB / 20m embedding index. This happens with memory mapping on or off (i.e. `view=True` or `view=False`). During the entire load time, 1 CPU core is pegged at 100%. After being loaded, the index appears to behave normally.
We are running this on an EC2 instance with 64GB of RAM, so the entire index should fit very comfortably in memory even with memory mapping turned off. The index files are being loaded from ephemeral SSDs attached to the EC2 instance, so disk read time should not be a major factor.
We are running this inside of docker (ECS), however we have not experienced similar file load issues with other software (we use a variety of python and non-python libraries that involve loading large files from this same storage, regularly >= 100GB) so it seems unlikely to be something at the OS/docker level 🤷 (the ECS task has access to the full amount of memory)
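For anyone trying to reproduce the timings, here is a minimal sketch of how the two cases could be measured. It assumes the index is loaded through `Index.restore(path, view=...)` from `usearch.index`; `time_call`, `restore_timings`, and `index_path` are hypothetical names used only for illustration, not part of usearch.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def restore_timings(index_path):
    """Time Index.restore with memory mapping on and off.

    Requires the usearch package. index_path is a placeholder for the
    file previously written by index.save(index_path).
    """
    # Imported lazily so time_call stays usable without usearch installed.
    from usearch.index import Index

    timings = {}
    for view in (True, False):
        index, seconds = time_call(Index.restore, index_path, view=view)
        # restore may return None if the file metadata cannot be read.
        count = len(index) if index is not None else 0
        timings[view] = seconds
        print(f"view={view}: {count:,} vectors restored in {seconds:.1f}s")
        del index  # release the index before the next run
    return timings
```

Calling `restore_timings("/path/to/index.usearch")` (path is a placeholder) prints one line per mode, which makes it easy to compare against the numbers reported here.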
Steps to reproduce
The index was built with usearch using all defaults, then saved to disk via `index.save(index_path)`. Once loaded, the index functions normally.
Expected behavior
We would expect a somewhat linear-ish relationship between index size / embedding count and load times.
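To put a rough number on that expectation, the sketch below extrapolates linearly from the smaller index to the larger one, using only the approximate figures reported in this issue (6,738,822 vectors in ~10s, ~20m vectors in 45 minutes). The variable names are illustrative.

```python
# Approximate figures reported in this issue.
small_vectors = 6_738_822
small_seconds = 10
large_vectors = 20_000_000
large_seconds = 45 * 60  # 45 minutes

# Load time for the large index if load time scaled linearly with vector count.
expected_seconds = small_seconds * large_vectors / small_vectors

# How much slower the observed load is than the linear estimate.
slowdown = large_seconds / expected_seconds

print(f"linear estimate: {expected_seconds:.0f}s, observed: {large_seconds}s")
print(f"observed load is about {slowdown:.0f}x slower than linear scaling")
```

Under linear scaling the larger index would load in roughly 30 seconds, so the observed 45 minutes is on the order of 90 times slower than expected.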
Thank you for such an awesome project, we have fallen in love with usearch and hope we can figure this one out, which is currently blocking us from using it!
USearch version
v2.16.0
Operating System
Ubuntu 22.04 (dockerized ECS)
Hardware architecture
x86
Which interface are you using?
Python bindings
Contact Details
kballou@eezy.com
Are you open to being tagged as a contributor?
`.git` history as a contributor
Is there an existing issue for this?
Code of Conduct