thomassrour opened 3 months ago
Hello,
I would like to stream the answer from my LangChain QA chain. Here is how I'm trying to do it in the pipe method (almost as in #141):
```python
self.local_llm = ChatOllama(
    model="llama3:70b",
    # format="json",
    temperature=0,
    base_url="http://...:11434",
    streaming=True,
    keep_alive=-1,
    callbacks=[StreamingStdOutCallbackHandler()],
)
chain = RetrievalQA.from_chain_type(
    llm=self.local_llm,
    retriever=retriever,
    return_source_documents=False,
    chain_type_kwargs={"prompt": prompt},
    verbose=True,
    input_key="question",
)
for chunk in chain.run(user_message):
    yield chunk
```
However, the words don't appear one by one as they should; instead, I get large chunks of about 50 words at once. Any help would be much appreciated, thank you.
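If I understand the docs correctly, the legacy `chain.run()` only returns after the chain has finished, so the loop above is just iterating over an already-completed string rather than streaming tokens as they are generated. Below is a minimal sketch of the token-by-token behavior I'm after, rewritten as an LCEL pipeline instead of `RetrievalQA`. It assumes the same `retriever`, `prompt` (with `{context}` and `{question}` variables), `self.local_llm`, and `user_message` as the snippet above; `format_docs` is a hypothetical helper I added to join the retrieved documents:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


def format_docs(docs):
    # Hypothetical helper: join the retrieved documents into one context string.
    return "\n\n".join(doc.page_content for doc in docs)


# LCEL pipeline: retrieve -> fill the prompt -> chat model -> plain string.
# `retriever` and `prompt` are the same objects passed to RetrievalQA above.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | self.local_llm
    | StrOutputParser()
)

# .stream() yields incremental string chunks as the model generates them,
# instead of one finished answer, so each yield should be roughly a token.
for chunk in rag_chain.stream(user_message):
    yield chunk
```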