langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

ChatGoogleGenerativeAI does not support streaming #22802

Open KosPsych opened 3 months ago

KosPsych commented 3 months ago

Example Code

from typing import Any

from langchain.chains import LLMChain
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", streaming=True, max_tokens=2524)

default_chain = LLMChain(
    prompt=DEFAULT_PROMPT,
    llm=llm,
    verbose=False,
)

# Called from within an async function:
await default_chain.ainvoke({"input": rephrased_question["text"]}, config={"callbacks": [callback]})

# on_llm_new_token on the custom callback handler class:
async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
    """Override on_llm_new_token to send each token to the client."""
    await self.send(token)

Error Message and Stack Trace (if applicable)

No response

Description

I have set up a LangChain chain as shown above, where callback is an instance of a class that implements on_llm_new_token. To call the chain I use ainvoke.

If I use the Anyscale LLM class or VLLMOpenAI, the response is streamed correctly; with Google, however, it is not. Is there a bug in my code? Is there some other parameter I should pass to ChatGoogleGenerativeAI, or does Google not support streaming?
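
For reference, a minimal way to check whether the model streams at all outside the chain (a sketch assuming the same llm object as above, streaming synchronously rather than through callbacks):

# Sanity check: stream directly from the model, bypassing the chain and callbacks.
for chunk in llm.stream("Tell me a short joke"):
    # Each chunk is an AIMessageChunk; print its text as it arrives.
    print(chunk.content, end="", flush=True)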

System Info

langchain 0.1.0
langchain-community 0.0.11
langchain-core 0.1.9
langchain-google-genai 1.0.1
langchainhub 0.1.15
langsmith 0.0.92

KosPsych commented 3 months ago

@dosu-bot answer this

dosubot[bot] commented 3 months ago

ChatGoogleGenerativeAI does support synchronous streaming, but it does not support asynchronous streaming. This means that while you can stream responses synchronously, attempting to use asynchronous methods like ainvoke for streaming will not work.

To enable synchronous streaming, you should use the _stream method instead of ainvoke. Here is an example of how to set up synchronous streaming:

llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", streaming=True, max_tokens=2524)

default_chain = LLMChain(
    prompt=DEFAULT_PROMPT, 
    llm=llm,
    verbose=False
)

# Synchronous streaming example
for chunk in default_chain._stream({"input": rephrased_question['text']}, config={"callbacks": [callback]}):
    print(chunk)

If you need to handle the tokens asynchronously, you might need to use a different LLM class that supports asynchronous streaming, such as Anyscale or VLLMOpenAI.
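
For example, a minimal async streaming sketch with VLLMOpenAI (the server URL and model name below are placeholders, not values from this issue):

import asyncio

from langchain_community.llms import VLLMOpenAI

llm = VLLMOpenAI(
    openai_api_base="http://localhost:8000/v1",  # placeholder vLLM server URL
    openai_api_key="EMPTY",                      # vLLM does not check the key
    model_name="my-model",                       # placeholder model name
)

async def main() -> None:
    # astream yields text chunks as the server produces them.
    async for chunk in llm.astream("Tell me a short joke"):
        print(chunk, end="", flush=True)

asyncio.run(main())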

For more details on the limitations of ChatGoogleGenerativeAI, you can refer to the relevant code snippet that indicates the lack of support for asynchronous streaming:

async def _agenerate(
    self,
    messages: List[BaseMessage],
    stop: Optional[List[str]] = None,
    run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> ChatResult:
    if "stream" in kwargs:
        kwargs.pop("stream")
        logger.warning("ChatVertexAI does not currently support async streaming.")

This warning explicitly states that asynchronous streaming is not supported [1].


dharless-eli-lilly commented 1 month ago

Can anyone speak to this? I'm dealing with the same issue: async callbacks are not triggering and I'm not getting a message back, although I'm using ChatVertexAI and VertexAI from langchain-google-vertexai.

dharless-eli-lilly commented 4 weeks ago

Made it work for ChatVertexAI in langchain-google-vertexai by adding llm_kwargs={"stream": True} to the constructor of LLMChain. We're using a deprecated method on the chain, arun, but it eventually invokes ChatVertexAI's _astream() method, which is implemented even though _agenerate() warns that ChatVertexAI doesn't support async streaming.
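
Roughly, a sketch of that setup (DEFAULT_PROMPT, rephrased_question, and the async callback handler are assumed to be defined as in the original example):

from langchain.chains import LLMChain
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model_name="gemini-1.5-pro")

chain = LLMChain(
    prompt=DEFAULT_PROMPT,
    llm=llm,
    llm_kwargs={"stream": True},  # forwarded to the model, forcing the streaming path
)

# arun is deprecated, but it ends up in ChatVertexAI._astream(),
# so on_llm_new_token fires on the async callback handler.
result = await chain.arun(input=rephrased_question["text"], callbacks=[callback])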

KosPsych commented 5 days ago

@dosu-bot This:

# Synchronous streaming example
for chunk in default_chain._stream({"input": rephrased_question['text']}, config={"callbacks": [callback]}):
    print(chunk)

works. Is there a way to pass an image to gemini using this approach?