Closed roldengarm closed 2 weeks ago
Hi @roldengarm, could you provide some more information on the following points?
It seems the VM hosting Postgres might be running out of memory due to the vector index size. Here are a few options to consider:

With text-embedding-3, truncating the vectors can reduce the memory footprint. Keep in mind that this also reduces the precision of relevance scores: test this approach to see whether Search returns incorrect results or Ask produces hallucinations. For more details, see "shortening embeddings" (https://openai.com/index/new-embedding-models-and-api-updates/) and MRL (e.g. https://aniketrege.github.io/blog/2024/mrl/).

Hi @dluc thanks for your reply!
We're using text-embedding-3-large on Azure OpenAI.
Regarding vector size or truncating of vectors: I'm unsure. I've just deployed Kernel Memory as a service with the dotnet setup wizard, so I'm using the default settings, I guess.
Technically, I can increase memory relatively easily as it's on Azure Flexible Postgres, but obviously it comes at a cost. I tried upgrading to 64GB, but the problem only went away after I stopped the ingestion. This is strange, as I run the ingestion with at most 12 documents in parallel.
Search performance is critical as it will be used in a chat interface, so I don't think HNSW is suitable.
Try using text-embedding-3-small; that should cut memory usage in half. The underlying problem is sizing the Postgres infrastructure according to the data being used, including the index size in memory.
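As a rough sanity check on that sizing advice, here is a back-of-envelope estimate (a sketch only: it assumes 4-byte float components, uses the row counts mentioned in this thread, and ignores index and per-row storage overhead, which can be substantial):

```python
# Back-of-envelope memory estimate for stored embedding vectors.
# Assumes float4 (4 bytes) per dimension; real usage is higher once
# the index structure and row overhead are added.
BYTES_PER_FLOAT = 4

def raw_vector_gb(rows: int, dims: int) -> float:
    """Raw vector payload only, in GB (decimal)."""
    return rows * dims * BYTES_PER_FLOAT / 1e9

for model, dims in [("text-embedding-3-large", 3072),
                    ("text-embedding-3-small", 1536)]:
    now = raw_vector_gb(900_000, dims)       # current ingestion
    planned = raw_vector_gb(9_000_000, dims) # planned 9m documents
    print(f"{model} ({dims}d): ~{now:.1f} GB now, ~{planned:.1f} GB at 9m rows")
```

With 3072-dimensional vectors, 900k rows already mean roughly 11 GB of raw vector data before any index overhead, which is consistent with memory pressure on a 16 GB server.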
If cost is a factor, you should test HNSW before discarding the option, to understand the impact on performance and how much you can save in monthly costs.
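For reference, the vector-shortening approach suggested above can be sketched in a few lines. This is plain Python with no dependencies; `truncate_embedding` is a hypothetical helper for illustration, not a Kernel Memory API. (The text-embedding-3 models can also return shortened vectors directly via the embeddings API's `dimensions` request parameter, which avoids doing this client-side.)

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and re-normalize to unit length,
    as in Matryoshka Representation Learning (MRL) truncation."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Example: cut a 3072-d embedding down to 1536-d before indexing.
full = [1.0 / math.sqrt(3072)] * 3072   # stand-in for a unit-norm embedding
short = truncate_embedding(full, 1536)
print(len(short), round(sum(x * x for x in short), 6))  # -> 1536 1.0
```

The re-normalization step matters: cosine similarity assumes unit-length vectors, so truncating without re-normalizing skews relevance scores further.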
Context / Scenario
Running KM as a service, using queues, Azure OpenAI Embedding-3, and PostgresDB as the backend. Ingested about 900k records (~15 GB of data). Consuming it from a console application using the WebClient.
Initially, after a couple of thousand documents, calling SearchAsync / AskAsync worked fine. However, after ~900k records, I'm getting a server error (500) every time. In the logs I can see an Npgsql.NpgsqlException - Timeout.
What happened?
I'm getting a server error every time. The ingestion process is still running, at about 12 documents at a time. I don't see high CPU usage on the App Service or on Postgres (Azure Flexible Server). I tried increasing Postgres to 4 vCores / 16 GB RAM; no difference.
We're planning to ingest a total of 9m documents, so it's concerning that it's already throwing errors at 900k.
Importance
I cannot use Kernel Memory
Platform, Language, Versions
Using C#, KM deployed as a service to Azure App Service, using Azure Postgres Flexible Server, Azure Storage, and Azure OpenAI
This is our second try; we initially tried using Azure AI Search instead of Azure Postgres, but the costs for storing ~9m records were astronomical.
Relevant log output