Open 130jd opened 1 month ago
It sounds like you're encountering a complex issue with the Ollama x LlamaIndex integration, where resource consumption and timeouts are a significant concern. Here are some targeted insights and potential solutions to address the problems you're facing:
**Machine Specs and Resource Consumption:** Although the Ollama Starter Example worked once on your 16GB RAM machine, the minimum requirement of 32GB RAM suggests that resource constraints might be causing the timeouts and excessive resource consumption you're observing. The initial success might have been due to lower resource demands on the first run or variations in the data being processed.
**Llama-Index Specific Issues:** The in-memory vector store of Llama-Index could indeed be running out of memory on subsequent tries, especially if it's not releasing memory efficiently after each use. This aligns with your observation that it worked the first time but failed thereafter. The `llama-index-llms-ollama` library might have memory management or resource allocation patterns that differ from other setups you've used successfully.
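If part of the problem is that every run rebuilds the in-memory index from scratch, one pattern worth trying (a general LlamaIndex pattern, not something from your logs, and the paths are placeholders) is persisting the index after the first successful run so later runs reload it instead of re-embedding everything. This doesn't lower peak memory on the first run, but it avoids repeating the embedding step afterwards:

```python
import os

from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"  # placeholder path

if not os.path.exists(PERSIST_DIR):
    # First run: load documents, build the index, and write it to disk.
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # Subsequent runs: reload the persisted index instead of re-embedding.
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
```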
**Ollama Configuration and Performance:** Default context window mismatches and the specific configuration of Ollama could also be contributing to the odd behavior and failures. Adjusting the `request_timeout` parameter in the Ollama initialization to accommodate longer processing times might help, as might ensuring that the Ollama server is adequately provisioned and optimized for performance.
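As a rough sketch (the model name and timeout value below are just placeholders), raising the timeout when constructing the LLM looks like this:

```python
# requires: pip install llama-index-llms-ollama
from llama_index.llms.ollama import Ollama

llm = Ollama(
    model="llama3",          # whatever model you pulled with `ollama pull`
    request_timeout=600.0,   # seconds; the default is relatively short
)
```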
**Troubleshooting Steps:**
- Increase the `request_timeout` parameter to see if it alleviates the timeout issues.
- A similar issue involving kernel crashes and CUDA OutOfMemory errors when running embedding models locally suggests that local models cannot run asynchronously because they are compute-bound. Lowering the batch size, for example with `HuggingFaceEmbedding(..., embed_batch_size=1)` (see the sketch after this list), might reduce memory usage and alleviate some of the problems you're experiencing.
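A minimal sketch of the smaller embedding batch size, assuming a local HuggingFace embedding model (the model name here is only an example):

```python
# requires: pip install llama-index-embeddings-huggingface
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# embed_batch_size=1 trades throughput for a lower memory footprint.
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    embed_batch_size=1,
)
```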
Understanding these principles and adjusting your setup accordingly could help mitigate the issues with the Ollama x LlamaIndex integration. If you continue to face challenges, providing more specific error messages or configurations could help in diagnosing and addressing the problem more effectively.
To continue the conversation, mention @dosu.
I've never had or really seen this issue 🤔 You could decrease the context window on the LLM to try to reduce memory usage, if that feels like the cause.
LlamaIndex always tries to use the full context window in order to reduce the number of LLM calls.
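Something like this is roughly what I mean (model name and window size are just examples, pick whatever fits your machine):

```python
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# A smaller context_window caps how much text LlamaIndex packs into each request,
# which in turn caps Ollama's memory use per call.
Settings.llm = Ollama(
    model="llama3",
    context_window=2048,
)
```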
**Question Validation**
**Question**
I've previously used a RAG system built with Ollama x langchain, but decided to try Ollama x llama-index to see if I'd get better results. I started with the Ollama Starter Example, got it to work once, and thereafter it would always time out (despite setting a 10+ minute timeout).
More details:
- In these failures, I noticed that it wouldn't even progress to the embedding phase (where the progress bars load), and looking in Task Manager the `ollama_llama_server.exe` process was consuming a ton of resources, but nothing was happening until the timeout.
- When it timed out, the error message would mention `map_httpcore_exceptions` (similar to the logs here).
- Others have reported issues with the `llama-index-llms-ollama` library; some say using the Ollama library works?

Does any of the above give any clues that help diagnose what's going on? I'm not just looking for a fix, but rather trying to understand the principles of why llama-index may not be working for me. And is there some fundamental problem with Ollama x llama-index (that isn't a problem with Ollama x langchain), such that if I want to use llama-index I'd need to ditch Ollama for OpenAI?
Thanks in advance.