kornelski / vpsearch

C library for finding nearest (most similar) element in a set
https://lib.rs/vpsearch
32 stars 6 forks source link

Parallelise Tree creation with Rayon #4

Open ntr-808 opened 3 years ago

ntr-808 commented 3 years ago

First of all, I have to say thank you for this library; I have found it extremely useful, ergonomic and it is an essential part of the project I am using it in.

As the size of the input data I am processing increases I have found the creation of the tree becomes a bottleneck as it is CPU bound and limited to only 1 core with the current implementation.

I have only skimmed the codebase and am generally unfamiliar with the creation of VP trees, but it appears a few choice applications of Rayon's parallel iterators (in the index sorting for example) would greatly reduce the creation time of the tree structure.

I'll be experimenting with it over the coming days but wanted to raise this issue to get your thoughts on this approach.

Thank you again.

kornelski commented 3 years ago

It could almost work. Creation of nodes is independent and parallelizable, except recently I've optimized them to use one contiguous block of memory.

You would have to revert:

https://github.com/kornelski/vpsearch/commit/b4f0a25001236237f663084d7ad8a4cc94b90e75

When each node is a Box, they can be allocated in parallel, and rayon::join could be enough to make it work.

Currently they're indices in a single vec, and that's a single-threaded dependency.