ruoccofabrizio / azure-open-ai-embeddings-qna

A simple web application for an OpenAI-enabled document search. This repo uses Azure OpenAI Service to create embedding vectors from documents. To answer a user's question, it retrieves the most relevant document and then uses GPT-3, GPT-3.5 or GPT-4 to extract the matching answer.
https://azure.microsoft.com/en-us/products/cognitive-services/openai-service
MIT License
846 stars 510 forks

embeddings persistence #83

Open sbradford006 opened 1 year ago

sbradford006 commented 1 year ago

I'm attempting to have the embeddings in the Redis (api) container persist across a restart.

Having mounted /data to a dir on localhost, I only ever see two dirs (/data/redis and /data/redisinsight). Neither of these seems to contain any data...

I've played around with adding a --save config in docker-compose, but I am no Docker wizard and it looks like any config passed at compose time nukes the default config set in the container.

Very possible I am misunderstanding how this should all hang together... but any advice would be welcome!
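
For reference, a snapshot policy can also be set on the running server instead of through the compose command, which leaves the container's default config untouched. The following is only a minimal sketch: `enable_snapshots` is a hypothetical helper, and `client` is assumed to be a redis-py style object (for example `redis.Redis(decode_responses=True)`) connected to the container. A runtime change like this is normally lost when the server restarts, so it is a diagnostic aid rather than a fix.

```python
def enable_snapshots(client, seconds=60, changes=1):
    """Ask a running Redis server to snapshot to disk and report where.

    `client` is assumed to expose redis-py style `config_set`,
    `config_get` and `bgsave` methods (hypothetical helper, not part
    of this repo).
    """
    # Same effect as `CONFIG SET save "<seconds> <changes>"`: snapshot when
    # at least `changes` keys changed within `seconds` seconds.
    client.config_set("save", f"{seconds} {changes}")

    # Start a background snapshot now instead of waiting for the policy.
    client.bgsave()

    # Read back the settings that decide where the dump file is written.
    settings = {}
    for name in ("dir", "dbfilename", "save"):
        settings.update(client.config_get(name))
    return settings


# Usage (assumes redis-py is installed and Redis listens on localhost:6379):
#   import redis
#   print(enable_snapshots(redis.Redis(decode_responses=True)))
```

If the returned `dir` is not the path mounted from the host, the dump file would be written somewhere that does not survive the container.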

itmilos commented 1 year ago

@sbradford006 you should have a Redis dump in the storage service

sbradford006 commented 1 year ago

Thanks for your input, itmilos. Do you mean the storage service for the documents?

Currently documents are stored in Azure Blob Storage, but there is a local (containerized) Redis for embeddings. Unfortunately, I don't see anything other than explicitly uploaded documents in the Azure storage.

Have I misunderstood your comment?

itmilos commented 1 year ago

If you delete this and reset Redis, you will remove all embeddings

Screenshot 2023-05-31 at 20 12 12

sbradford006 commented 1 year ago

Thanks again for your help - I did have that share on the storage account and have removed it. I'm currently assuming it was a hangover from a previous iteration of the project where the entire service was built in Azure (before I realized how expensive Redis was going to be!), as that file share hasn't been recreated.

Unfortunately, the local Redis instance still appears to lose all embeddings after a container restart, and rerunning the batch operation "convert all files and embeddings" doesn't regenerate them.

image

Any chance you know where the embeddings might be held on the API container, if it isn't /data?
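
One way to narrow this down is to ask the server itself where it would write its data and whether persistence is switched on at all. This is a rough sketch under assumptions, not something taken from this repo: `persistence_report` is a hypothetical helper, `client` is assumed to be a redis-py style object created with `decode_responses=True`, and the container is assumed to be Linux-based (hence POSIX paths).

```python
import posixpath


def persistence_report(client):
    """Summarise where a Redis server would write its data, and whether it will.

    `client` is assumed to expose redis-py style `config_get` and `dbsize`
    methods and to return strings (decode_responses=True).
    """
    cfg = {}
    for name in ("dir", "dbfilename", "save", "appendonly"):
        cfg.update(client.config_get(name))

    return {
        # Full path of the RDB snapshot file inside the container.
        "rdb_path": posixpath.join(cfg.get("dir", ""), cfg.get("dbfilename", "")),
        # An empty `save` value means no automatic snapshots are taken.
        "rdb_enabled": bool(cfg.get("save", "").strip()),
        # Append-only file persistence is a separate switch.
        "aof_enabled": cfg.get("appendonly", "no") == "yes",
        # Number of keys currently held in the selected database.
        "keys": client.dbsize(),
    }


# Usage (assumes redis-py is installed and Redis listens on localhost:6379):
#   import redis
#   print(persistence_report(redis.Redis(decode_responses=True)))
```

If both `rdb_enabled` and `aof_enabled` come back false, nothing would be written to disk regardless of which directory is mounted. If `rdb_path` points outside the mounted volume, the dump exists but is discarded with the container.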