Closed eyurtsev closed 1 month ago
Is it this slow for everyone? Or does it depend on the hardware configuration?
Hi,
I ran the code shared below.
```python
from langchain_community.llms import Ollama
import time

llm = Ollama(base_url='http://localhost:11434', model="llama3:instruct", temperature=0)
start_time = time.time()
response = llm.invoke("Tell me a joke")
print("--- %s seconds ---" % (time.time() - start_time))
print(response)
```
I also noticed the following:
Any comments / thoughts on fixing these issues?
Here are my installed langchain libraries for reference.
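One way to narrow this down is to time the same prompt against the raw Ollama REST API and compare it with the LangChain wrapper: if both are equally slow, the bottleneck is the model or hardware rather than the integration. A minimal sketch, assuming the default local endpoint and the non-streaming `/api/generate` route:

```python
# Time the same prompt against the Ollama HTTP API directly, bypassing LangChain.
# Assumes a local Ollama server on the default port with llama3:instruct pulled.
import time

import requests

start_time = time.time()
r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3:instruct", "prompt": "Tell me a joke", "stream": False},
)
print("--- raw Ollama API: %s seconds ---" % (time.time() - start_time))
print(r.json()["response"])
```

If the raw call is fast but the LangChain call is slow, that points at the integration; if both are slow, it points at the model setup or hardware.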
I came across the same non-stopping issue with llama3 on Ollama.
Try updating your packages. I updated all the langchain packages after llama3 came out and that fixed this issue. But after that I started getting an Ollama 400 error. Let me know if you face the same issue.
I had the same issue using GraphCypherQAChain; it keeps talking to itself forever. These are the libraries and the system info:
```
langchain==0.1.16
langchain-cli==0.0.21
langchain-community==0.0.33
langchain-core==0.1.43
langchain-openai==0.0.8
langchain-text-splitters==0.0.1

platform: linux
python version: 3.11.7
```
It does generate the Cypher query and answer my question, but then it keeps running in a loop and adds markup in JSON format:
```
I think we're all set!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Let's wrap up!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
All done!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
That's it!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
It looks like I'm just waiting for your confirmation before considering our task complete!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Task complete!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
I think we've finished the answer generation!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Yay!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
...
```
```
OS: Linux
OS Version: #1 SMP Thu Jan 11 04:09:03 UTC 2024
Python Version: 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0]

langchain_core: 0.1.45
langchain: 0.1.16
langchain_community: 0.0.34
langsmith: 0.1.49
langchain_chroma: 0.1.0
langchain_cli: 0.0.21
langchain_experimental: 0.0.53
langchain_groq: 0.1.2
langchain_nomic: 0.0.2
langchain_pinecone: 0.0.3
langchain_text_splitters: 0.0.1
langchainhub: 0.1.15
langgraph: 0.0.38
langserve: 0.1.0
```
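If the looping is caused by llama3's end-of-turn token leaking through (as the `<|eot_id|>` markers in the output above suggest), one workaround reported to help in similar threads is to pass it as an explicit stop sequence. A sketch, assuming the `stop` parameter of the community `Ollama` wrapper cuts generation at the first end-of-turn token:

```python
# Hedged workaround: tell Ollama to stop at llama3's end-of-turn token so the
# model does not keep starting new "assistant" turns. Updating ollama and the
# langchain packages (as suggested above) may make this unnecessary.
from langchain_community.llms import Ollama

llm = Ollama(
    model="llama3:instruct",
    temperature=0,
    stop=["<|eot_id|>"],
)
print(llm.invoke("Tell me a joke"))
```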
@SinghJivjot it works! Thank you
Hi, I think I have the same problem with ollama 0.1.33 on Windows:
```
Python 3.12.1
langchain-core 0.1.48
langchain 0.1.17
langchain-community 0.0.36
langchain-chroma 0.1.0
langchain-openai 0.1.5
langchain-text-splitters 0.0.1
langchainhub 0.1.15
langsmith 0.1.52
```
I have the problem that ollama is really slow, taking approx. 10 times as long as last week when using it as the llm in my RAG chain (300s instead of 30s):
```python
# retriever, format_docs, prompt, and llm are defined earlier in my script
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```
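To narrow down whether the retriever or the model call is the slow part, it may help to time each stage of the chain separately. A rough sketch, reusing the `retriever` and `llm` objects defined above; the `timed` helper is only for illustration:

```python
# Time each runnable in isolation to see where the 300s are spent.
import time

def timed(name, runnable, value):
    """Invoke a runnable and print how long it took (illustration only)."""
    start = time.time()
    result = runnable.invoke(value)
    print(f"{name}: {time.time() - start:.1f}s")
    return result

question = "Tell me a joke"
timed("retriever", retriever, question)  # retrieval latency only
timed("llm", llm, question)              # generation latency only
```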
Invoking just the llm works fine today. Yesterday it also took 30-60 seconds instead of 3-4:
```python
from langchain_community.llms import Ollama

llm = Ollama(model="llama3", temperature=0.0)
llm.invoke("Tell me a joke")
```
I am unsure whether this is a langchain problem, an ollama problem, or something on my end. The problem appeared on 03.05.; on that day I updated ollama and reinstalled torch with CUDA. Now I seem to have two different CUDA versions installed.
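As far as I know the Ollama server ships its own GPU runtime and does not use the Python-side torch install, so the torch reinstall should not affect it directly, but it may still be worth checking which CUDA build each side sees and whether the model actually runs on the GPU. A small check, assuming an NVIDIA card with `nvidia-smi` on the PATH:

```python
# Print the CUDA build the Python-side torch was compiled against and the
# driver/GPU state as reported by nvidia-smi (which also lists the ollama
# process if the model is running on the GPU).
import subprocess

import torch

print("torch CUDA build:", torch.version.cuda)
print("CUDA available to torch:", torch.cuda.is_available())
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
```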
We need to investigate whether we have an issue with the Ollama integration, and if so, why.
Discussed in https://github.com/langchain-ai/langchain/discussions/18515