Issue you'd like to raise.
I'm running the GPT4All-API CPU build on my laptop and modified the "/embeddings" endpoint to connect to a Weaviate vector database. The connection between them works: the API uses the "all-MiniLM-L6-v2-f16.gguf" model for embeddings, and Weaviate calls the GPT4All container to embed incoming data and store it in the database. However, the GPT4All container's memory usage grows by 50-80 MB on every "/embeddings" request. Is there a way to flush or limit that memory, or did I go wrong somewhere?
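As a stopgap while the leak is investigated, the container's memory can at least be capped at the Docker level so the host stays usable. A minimal sketch, assuming the GPT4All container is started via Docker Compose (the service name `gpt4all_api` and the 2g limit are illustrative, not from my actual setup):

```yaml
services:
  gpt4all_api:
    # Hard cap on container memory; if the process exceeds it,
    # Docker OOM-kills the container rather than exhausting the host.
    deploy:
      resources:
        limits:
          memory: 2g
```

With older Compose file versions, the equivalent is a top-level `mem_limit: 2g` on the service. Note this only bounds the damage; combined with a restart policy it keeps the service up, but it doesn't fix the per-request growth itself.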
Suggestion:
No response