tantaraio / voy

🕸️🦀 A WASM vector similarity search written in Rust
https://www.npmjs.com/package/voy-search
Apache License 2.0

Is heavy usage possible? #46

Closed maxbaluev closed 1 year ago

maxbaluev commented 1 year ago

Hi all, I'm making a Chrome extension that saves all the text content a user looks at. Next I want to use Voy to create embeddings and write an algorithm that builds context for GPT. Can you tell me what difficulties I might encounter? Could there be too much data, or will it be slow? Also, does it work fine in web/service workers?

DawChihLiou commented 1 year ago

Hey @maxbaluev, thanks for reaching out! It's possible. Voy works with workers. How much data you can store will depend on the limits of the storage you choose, for example local storage, and on the transformer model you choose.

Currently Voy only supports an embedding index. It doesn't store the embeddings; instead it uses them as coordinates to locate data and enable faster retrieval, conceptually like a database index.
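
To make the "embeddings as coordinates" idea concrete, here is a minimal TypeScript sketch of a nearest-neighbor lookup. It is a brute-force linear scan, not Voy's actual index, and the names `Entry`, `squaredDistance`, and `nearest` are hypothetical, as are the toy 2D vectors.

```typescript
// Hypothetical types and helpers for illustration; not part of Voy's API.
interface Entry {
  id: string;
  embedding: number[];
}

// Squared Euclidean distance between two vectors of equal length.
function squaredDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions must match");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Return the ids of the k entries closest to the query, nearest first.
// This compares the query against every entry (linear scan).
function nearest(entries: Entry[], query: number[], k: number): string[] {
  return entries
    .map((e) => ({ id: e.id, dist: squaredDistance(e.embedding, query) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k)
    .map((e) => e.id);
}

// Toy data: real embeddings have hundreds or thousands of dimensions.
const entries: Entry[] = [
  { id: "a", embedding: [0, 0] },
  { id: "b", embedding: [1, 1] },
  { id: "c", embedding: [5, 5] },
];
const closest = nearest(entries, [0.9, 1.2], 2);
```

An index organizes the points so that a query does not have to be compared against every entry, which is where the faster retrieval comes from.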

I'm currently looking for sponsors to kick off a project to provide native Wasm transformers in Voy, to tackle the performance issues in most JavaScript-based transformers. In the meantime, libraries like transformers.js or web-ai are great options for handling text feature extraction on the web.

maxbaluev commented 1 year ago

Is it true that indexes are only stored in memory, and that the storage layer is managed by me? Will it work with embeddings from OpenAI?

DawChihLiou commented 1 year ago

That's correct. Voy doesn't store embeddings and you have full control over the storage solution. Voy also exposes serialize and deserialize functions so you can also store the serialized indexes in your storage layer.
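
As a rough sketch of what "store the serialized indexes in your storage layer" could look like, the TypeScript below saves and restores an index through a small key-value interface. Everything here is an assumption for illustration: `SerializableIndex`, `StringStore`, `MemoryStore`, `saveIndex`, `loadIndex`, and the stand-in `FakeIndex` are hypothetical, and the sketch assumes the serialized form is a string. Check Voy's own documentation for the real `serialize` and `deserialize` signatures.

```typescript
// Hypothetical shape: any index that can turn itself into a string.
interface SerializableIndex {
  serialize(): string;
}

// Minimal storage abstraction, modeled on the Web Storage getItem/setItem shape.
interface StringStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory store used here so the sketch runs anywhere.
class MemoryStore implements StringStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Persist the serialized index under the given key.
function saveIndex(store: StringStore, key: string, index: SerializableIndex): void {
  store.setItem(key, index.serialize());
}

// Restore an index, or return null if nothing was saved under the key.
function loadIndex<T>(
  store: StringStore,
  key: string,
  deserialize: (serialized: string) => T,
): T | null {
  const serialized = store.getItem(key);
  return serialized === null ? null : deserialize(serialized);
}

// Stand-in for a real index (assumption): it only remembers a list of ids.
class FakeIndex implements SerializableIndex {
  ids: string[];
  constructor(ids: string[]) {
    this.ids = ids;
  }
  serialize(): string {
    return JSON.stringify(this.ids);
  }
  static deserialize(serialized: string): FakeIndex {
    return new FakeIndex(JSON.parse(serialized) as string[]);
  }
}

const store = new MemoryStore();
saveIndex(store, "voy-index", new FakeIndex(["a", "b"]));
const restored = loadIndex(store, "voy-index", FakeIndex.deserialize);
const missing = loadIndex(store, "other-key", FakeIndex.deserialize);
```

Keeping the store behind an interface means the same two functions work with whichever storage the extension ends up using. Note that `localStorage` is not available in service workers, so an asynchronous store there would need promise-returning versions of these functions.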

Looking at OpenAI's documentation, the embeddings should work with Voy just fine. I think you're the first to experiment with Voy and OpenAI together. I'm very excited about what you're about to build!