microsoft / synthetic-rag-index

Service to import data from various sources and index it in AI Search. Increases data relevance and reduces final size by 90%+. Useful for RAG scenarios with LLM. Hosted in Azure with serverless architecture.
Apache License 2.0

Vectors are not sent to AI Search #75

Open Bergdoktor opened 2 months ago

Bergdoktor commented 2 months ago

Hi @clemlesne, first of all: thank you for your awesome work. I finally got the call-center-ai sample project all set up, but now I'm having trouble filling the index via synthetic-rag-index.

I created the index "trainings" with the fields as specified. But when I import documents via the application pipeline, the "Vector index size" remains at 0 Bytes. The document count is up to 112, and "Total storage size" is only 289.7 KB.

For comparison, I created another index and filled it with one of the same documents via the Azure web frontend. That index has a document count of 93, a vector index size of 575.66 KB, and 2.8 MB of storage size.

Can you point me in the right direction: is the 0 Bytes expected, or did something go wrong with the embeddings from ada?

Thanks!

clemlesne commented 4 weeks ago

Indeed, vectors are not pushed at all into AI Search. My bad. Out of the box, you should be able to use BM25 from AI Search, which gives you good results even without vectors.

Thank you for noticing this!

We need to:

  1. Create vectors from the QA with OpenAI text embedding model
  2. Add vectors to the vectors field of the JSON object
  3. Send the document to AI Search (already in place)
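
A minimal sketch of what these three steps could look like in Python, not the project's actual code. The field names `question`, `answer` and `vectors`, the helper names `attach_vectors` and `openai_embedder`, and the default model name are all assumptions for illustration; the real schema and pipeline may differ. The embedder is passed in as a function so the logic can be tried without calling any service.

```python
from typing import Callable

# Hypothetical field names; the real index schema may differ.
TEXT_FIELDS = ("question", "answer")
VECTOR_FIELD = "vectors"

# An embedder maps a batch of texts to one vector per text.
Embedder = Callable[[list[str]], list[list[float]]]


def attach_vectors(docs: list[dict], embed: Embedder) -> list[dict]:
    """Return copies of docs with an embedding stored under VECTOR_FIELD."""
    # Step 1: build one text per document from its QA fields, embed as a batch.
    texts = [
        " ".join(str(doc.get(field, "")) for field in TEXT_FIELDS).strip()
        for doc in docs
    ]
    vectors = embed(texts)
    if len(vectors) != len(docs):
        raise ValueError("embedder returned a different number of vectors")
    # Step 2: add the vector to each JSON object without mutating the input.
    return [{**doc, VECTOR_FIELD: vec} for doc, vec in zip(docs, vectors)]


def openai_embedder(client, model: str = "text-embedding-ada-002") -> Embedder:
    """Wrap an openai.OpenAI or openai.AzureOpenAI client as an Embedder.

    With Azure OpenAI, `model` is the deployment name (an assumption here).
    """
    def embed(texts: list[str]) -> list[list[float]]:
        response = client.embeddings.create(model=model, input=texts)
        return [item.embedding for item in response.data]

    return embed
```

Step 3 would then send the enriched documents through the existing upload path. With the `azure-search-documents` SDK that could be something like `search_client.merge_or_upload_documents(documents=attach_vectors(docs, openai_embedder(client)))`, assuming the index defines a vector field with the same name and the embedding model's dimensions.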