open-webui / pipelines

Pipelines: Versatile, UI-Agnostic OpenAI-Compatible Plugin Framework

Stream LangChain QA chain answer #224

Open thomassrour opened 3 months ago

thomassrour commented 3 months ago

Hello,

I would like to stream the answer of my LangChain QA chain. Here is how I'm trying to do it in the pipe method (almost the same as in #141):

    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
    from langchain.chains import RetrievalQA
    from langchain_community.chat_models import ChatOllama

    # Streaming-enabled chat model pointed at a remote Ollama server
    self.local_llm = ChatOllama(
        model="llama3:70b",
        # format="json",
        temperature=0,
        base_url="http://...:11434",
        streaming=True,
        keep_alive=-1,
        callbacks=[StreamingStdOutCallbackHandler()],
    )

    # Retrieval QA chain that stuffs the retrieved context into the prompt
    chain = RetrievalQA.from_chain_type(
        llm=self.local_llm,
        retriever=retriever,
        return_source_documents=False,
        chain_type_kwargs={"prompt": prompt},
        verbose=True,
        input_key="question",
    )

    # chain.run blocks until the chain finishes and returns the full answer
    # string; iterating over that string yields one character at a time
    for chunk in chain.run(user_message):
        yield chunk

However, the words don't appear one by one as they should; instead, I get large chunks of about 50 words at once. Any help would be much appreciated, thank you.
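
For reference, here is a minimal sketch of one possible workaround: skip `chain.run` and stream from the chat model directly, since `BaseChatModel.stream` yields message chunks as tokens arrive. This assumes the same `retriever` and `prompt` objects as above, and that `prompt` exposes `{context}` and `{question}` input variables (the usual layout for a "stuff" chain prompt):

    # Sketch only: stream tokens from the model directly instead of chain.run.
    # Assumes `retriever` and `prompt` are the objects used above, and that
    # `prompt` takes {context} and {question} as input variables.
    docs = retriever.get_relevant_documents(user_message)
    context = "\n\n".join(doc.page_content for doc in docs)

    # .stream() yields AIMessageChunk objects as tokens arrive, so each
    # yielded piece is a small token-sized string rather than the whole
    # answer at once.
    for chunk in self.local_llm.stream(
        prompt.format(context=context, question=user_message)
    ):
        yield chunk.content

With this approach the `streaming=True` flag and the stdout callback aren't what drives the output; the token-sized pieces come from the `.stream()` iterator itself.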