spotify / voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.
https://spotify.github.io/voyager/
Apache License 2.0

Computation on GPU? #37

Closed RicardoHS closed 5 months ago

RicardoHS commented 9 months ago

Hi 👋

Thank you for open-sourcing a tool like this (again).

Looking through the documentation, I see it states:

Tuned for lightning-fast production use at Spotify, Voyager provides near-instantaneous nearest-neighbor lookups on in-memory collections of embeddings, without requiring GPUs, so you can power millions of requests per day at millisecond latencies.

It seems you already had GPU computation in mind for the library.

What were the reasons for not including GPU support? Would you be open to discussing a possible addition to support it?

psobot commented 7 months ago

Hi @RicardoHS!

What were the reasons for not including GPU support?

The simple and blunt answer is that we wanted this project to be installable as easily as possible. The Windows and macOS binaries for Voyager could fit on a floppy disk. Voyager currently has nearly zero dependencies (only numpy in Python, and nothing in Java).

A common meme is that just installing CUDA drivers can take tens of gigabytes and hours of effort. Of course, adding GPU support has performance advantages, but in my (current) view those advantages are not worth the huge overhead it would bring (including compatibility issues across hardware vendors, CUDA vs. ROCm vs. Metal support, etc.).

Without GPU support, Voyager fills the gap between Annoy and much more feature-packed and performant vector search packages.

Would you be open to discussing a possible addition to support it?

For sure! I think a voyager[gpu] package would make sense, but creating it is well outside the scope of our current maintenance budget for this project.
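
For anyone curious what an optional GPU extra could look like, here is a minimal sketch of the usual pattern: the base install stays dependency-light, and the GPU path is used only if an optional module happens to be importable. Everything here is an assumption for illustration. `select_backend` is a hypothetical helper, `cupy` is just an example of a GPU array library that such an extra might pull in, and none of this exists in Voyager today.

```python
import importlib.util


def select_backend(preferred=("cupy",), fallback="numpy"):
    """Return the first importable top-level module name in `preferred`,
    or `fallback` if none of them is installed.

    Hypothetical helper for illustration; Voyager does not ship this.
    """
    for name in preferred:
        # find_spec returns None when a top-level module is not installed,
        # so nothing heavy gets imported just to run this check.
        if importlib.util.find_spec(name) is not None:
            return name
    return fallback


if __name__ == "__main__":
    # Prints "cupy" if CuPy is installed, otherwise "numpy".
    print(select_backend())
```

With this kind of check, users who never install the optional extra keep the small CPU-only footprint described above, and only users who opt in pay the cost of the GPU toolchain.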