spotify / voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.
https://spotify.github.io/voyager/
Apache License 2.0

Stream-based I/O examples/documentation #39

Closed (ccurro closed 7 months ago)

ccurro commented 9 months ago

Great looking project!

The blog post [1] mentions "Google Cloud Platform–compatible stream-based I/O (stream indices from Google Cloud Services!)" as a feature.

As far as I can tell, the only mention in the Python docs of streaming I/O is the use of file-like objects for the save and load methods on Index [2,3]. Could we have specific GCP guidance? Is the recommendation to work with streaming upload/download objects from GCS? [4] From reading the GCP docs, I don't get the impression that that enables what I think the blog post suggests.

[1] https://engineering.atspotify.com/2023/10/introducing-voyager-spotifys-new-nearest-neighbor-search-library/
[2] https://spotify.github.io/voyager/python/reference.html#voyager.Index.save
[3] https://spotify.github.io/voyager/python/reference.html#voyager.Index.load
[4] https://cloud.google.com/storage/docs/streaming-downloads#storage-stream-download-object-python

psobot commented 7 months ago

Hi @ccurro!

> As far as I can tell, the only mention in the Python docs of streaming I/O is the use of file-like objects for the save and load methods on Index [2,3]. Could we have specific GCP guidance? Is the recommendation to work with streaming upload/download objects from GCS? [4] From reading the GCP docs, I don't get the impression that that enables what I think the blog post suggests.

Yes, that's exactly right - Voyager supports streaming via a file-like object. Google's Python libraries for Google Cloud Storage support returning file-like objects (see google.cloud.storage.blob.Blob#open) which can then be passed directly into Voyager for streaming reads/writes.
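For the Python case, a minimal sketch of what this could look like is below. The helper names (save_index_to_blob, load_index_from_blob, demo) and the bucket and object names are hypothetical and not part of Voyager or the Google client library; the sketch only assumes that Blob.open returns a file-like object and that Index.save and Index.load accept one, as described above. It has not been checked against a real bucket.

```python
from typing import Any


def save_index_to_blob(index: Any, blob: Any) -> None:
    """Stream a Voyager index into a GCS object, with no local temp file."""
    # blob.open("wb") yields a writable file-like object, which is passed
    # straight to Index.save.
    with blob.open("wb") as stream:
        index.save(stream)


def load_index_from_blob(blob: Any) -> Any:
    """Stream a Voyager index back out of a GCS object."""
    # Imported lazily so this module can be imported without voyager installed.
    from voyager import Index

    # blob.open("rb") yields a readable file-like object for Index.load.
    with blob.open("rb") as stream:
        return Index.load(stream)


def demo(bucket_name: str, object_name: str) -> None:
    """Round-trip a tiny index through GCS (needs credentials and a real bucket)."""
    from google.cloud import storage
    from voyager import Index, Space

    # bucket_name and object_name are placeholders supplied by the caller.
    blob = storage.Client().bucket(bucket_name).blob(object_name)

    index = Index(Space.Euclidean, num_dimensions=4)
    index.add_item([1.0, 2.0, 3.0, 4.0])
    save_index_to_blob(index, blob)

    restored = load_index_from_blob(blob)
    neighbors, distances = restored.query([1.0, 2.0, 3.0, 4.0], k=1)
    print(neighbors, distances)
```

Calling demo("my-bucket", "indices/example.voy") with your own (here made-up) bucket and object names would exercise the full round trip. The two helpers take the blob as a parameter so that any object exposing a compatible open(mode) method can be substituted, for example in tests.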

The Java bindings support the same streaming behaviour through Java input and output streams; see com.google.cloud.storage.Blob#reader for the corresponding Google API.

ccurro commented 7 months ago

Perfect - thank you!