langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

LangChain Core chat_models.py enters a streaming block, causing a "generation is not None" assertion error when the AzureChatOpenAI llm object does not support streaming #16930

Open ppk5 opened 5 months ago

ppk5 commented 5 months ago

Checked other resources

Example Code

This is how I get the Azure OpenAI LLM object:

import os

from langchain_openai import AzureChatOpenAI

# getToken() and APP_KEY come from elsewhere in our codebase.
def getLlmObject():
    getToken()
    model = AzureChatOpenAI(
        openai_api_version=os.environ["OPENAI_API_VERSION"],
        azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
        azure_endpoint=os.environ["AZURE_ENDPOINT"],
        openai_api_type="azure",
        user=f'{{"appkey": "{APP_KEY}"}}',
    )
    return model

It would be ideal for line 205 to detect that the model does not support streaming, or for the AzureChatOpenAI class to accept an option such as streaming=False at instantiation that forces the non-streaming path.
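
For illustration, a minimal sketch of what such a fallback inside BaseChatModel.stream() could look like (the supports_streaming flag is hypothetical and does not exist in langchain-core today):

from typing import Iterator, cast

from langchain_core.messages import BaseMessageChunk

def stream(self, input, config=None, *, stop=None, **kwargs) -> Iterator[BaseMessageChunk]:
    # Hypothetical guard: when the deployment cannot stream, satisfy the
    # .stream() contract by yielding one chunk produced by invoke().
    if not getattr(self, "supports_streaming", True):
        yield cast(BaseMessageChunk, self.invoke(input, config=config, stop=stop, **kwargs))
        return
    # ... existing streaming implementation ...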

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "/root/volume1/iris/onex-gen-ai-experimental/crew/crew.py", line 93, in <module>
    result = crew.kickoff()
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/crewai/crew.py", line 127, in kickoff
    return self._sequential_loop()
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/crewai/crew.py", line 134, in _sequential_loop
    task_output = task.execute(task_output)
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/crewai/task.py", line 56, in execute
    result = self.agent.execute_task(
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/crewai/agent.py", line 146, in execute_task
    result = self.agent_executor.invoke(
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain/chains/base.py", line 162, in invoke
    raise e
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain/chains/base.py", line 156, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/crewai/agents/executor.py", line 59, in _call
    next_step_output = self._take_next_step(
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain/agents/agent.py", line 1097, in _take_next_step
    [
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain/agents/agent.py", line 1097, in <listcomp>
    [
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/crewai/agents/executor.py", line 103, in _iter_next_step
    output = self.agent.plan(
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain/agents/agent.py", line 387, in plan
    for chunk in self.runnable.stream(inputs, config={"callbacks": callbacks}):
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2424, in stream
    yield from self.transform(iter([input]), config, **kwargs)
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2411, in transform
    yield from self._transform_stream_with_config(
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 1497, in _transform_stream_with_config
    chunk: Output = context.run(next, iterator)  # type: ignore
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2375, in _transform
    for output in final_pipeline:
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 1035, in transform
    for chunk in input:
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 4145, in transform
    yield from self.bound.transform(
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 1045, in transform
    yield from self.stream(final, config, **kwargs)
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 250, in stream
    raise e
  File "/volume1/anaconda3/envs/iris-experimental/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 242, in stream
    assert generation is not None
AssertionError

Description

I use an enterprise Azure OpenAI instance to work with CrewAI (for autonomous agents). Our Azure OpenAI endpoint does not support streaming, but the check at line 205 of libs/core/langchain_core/language_models/chat_models.py (https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/language_models/chat_models.py) sends execution into the else block, which then fails on the statement assert generation is not None.
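
For context, the aggregation loop that fails looks roughly like this (a paraphrase of langchain_core 0.1.x, not the exact source):

# Paraphrased from BaseChatModel.stream(): generation accumulates the
# streamed chunks. If _stream() yields nothing (for example because the
# endpoint refuses to stream), generation stays None and the assert fires.
generation = None
for chunk in self._stream(messages, stop=stop, **kwargs):
    if generation is None:
        generation = chunk
    else:
        generation += chunk
assert generation is not None  # <-- the AssertionError reported above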

System Info

(condaenv) [root@iris crew]# pip freeze | grep langchain
langchain==0.1.4
langchain-community==0.0.16
langchain-core==0.1.17
langchain-experimental==0.0.20
langchain-openai==0.0.2.post1

(condaenv) [root@iris crew]# python --version
Python 3.10.9
dosubot[bot] commented 5 months ago

🤖

Thank you for providing such a detailed description of the issue. The problem does appear to be related to the streaming functionality of the AzureChatOpenAI object in LangChain: the error likely occurs because the AzureChatOpenAI llm object does not support streaming.

To handle this, you can modify the AzureChatOpenAI class to check if streaming is supported before attempting to stream. If streaming is not supported, you can fall back to a non-streaming method.

Here's a general idea of what this might look like:

class AzureChatOpenAI:
    # ...

    def stream(self, prompt):
        # supports_streaming is a hypothetical flag, not an existing attribute.
        if not self.supports_streaming:
            # Use a non-streaming method if streaming is not supported
            return self.generate(prompt)
        # Existing streaming code...

Please note that this is a general idea and the actual implementation would depend on the specifics of the AzureChatOpenAI class and the LangChain framework.
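
Building on that idea: BaseChatModel.stream() already falls back to a single invoke() when a subclass does not override _stream, which is precisely the line-205 check this issue describes. A sketch of an unofficial workaround that exploits this, assuming langchain_core 0.1.x behavior:

from langchain_core.language_models.chat_models import BaseChatModel
from langchain_openai import AzureChatOpenAI

class NonStreamingAzureChatOpenAI(AzureChatOpenAI):
    # Pointing _stream back at the base-class implementation makes the
    # `type(self)._stream == BaseChatModel._stream` check in stream()
    # pass, so .stream() yields one invoke() result instead of streaming.
    _stream = BaseChatModel._stream

The same would be needed for _astream if the async path is used; this is a sketch, not a supported API.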

As for the AzureChatOpenAI class in the LangChain codebase, it is a wrapper for the Azure OpenAI Chat Completion API. It is used to interact with a deployed model on Azure OpenAI. The class requires certain environment variables or constructor parameters to be set, such as AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_AD_TOKEN, OPENAI_API_VERSION, and OPENAI_PROXY.

The class has several properties and methods, including azure_endpoint, deployment_name, openai_api_version, openai_api_key, azure_ad_token, azure_ad_token_provider, model_version, openai_api_type, validate_base_url, get_lc_namespace(), validate_environment(), _default_params, _identifying_params, _client_params, _llm_type, lc_attributes, _create_chat_result(), and more.

The class also includes a @root_validator decorator to validate the environment variables and parameters. It checks for the existence of the OpenAI Python package and the correct version of the OpenAI API. It also sets up the OpenAI client with the correct parameters.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.


alexgg278 commented 4 months ago

What do you mean AzureChatOpenAI does not support streaming? I am interested in this topic because I am trying to stream the response of AzureChatOpenAI with the calls .stream() and .astream(), but the chunk iterator returns the chunks in long batches instead of in a streaming fashion.

Does anyone have any info regarding this?

scenaristeur commented 3 months ago

Could this help? https://github.com/joaomdmoura/crewAI/issues/202#issuecomment-2016921916

leonavevor commented 1 month ago

I got the same issue while using the agent executor. Upon debugging with breakpoints deep inside LangChain's code (lib/python3.10/site-packages/langchain_core/language_models/chat_models.py), I realized the assert generation is not None failed because no generation was produced, and there was also a termination reason complaining about the length of the chunk it was invoking with. It works fine after limiting my input chunk to 500 tokens. I also set the LLM invocation config to config={"recursion_limit": 500}, whereas the default is 25; I need the higher recursion limit so my long-running task can finish without stopping prematurely.
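
A minimal sketch of that config (agent_executor and the input dict are placeholders):

# Raise the recursion limit from the default of 25 so a long-running
# agent loop is not terminated prematurely.
result = agent_executor.invoke(
    {"input": "your task here"},
    config={"recursion_limit": 500},
)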

[Three screenshots attached, dated 2024-05-22, showing the debugger inside chat_models.py]

nimobeeren commented 1 month ago

Here is another way I found to reproduce the AssertionError on assert generation is not None (chat_models.py:257): it also occurs when Azure OpenAI's content filter is triggered. Reproduction is somewhat unreliable, but the following triggers it for me fairly consistently:

from langchain_openai import AzureChatOpenAI

# Replace the placeholders with your own credentials and endpoint.
client = AzureChatOpenAI(
    api_key="YOUR_API_KEY",
    azure_endpoint="YOUR_ENDPOINT",
    api_version="2024-02-01",
    model="gpt-35-turbo",
    temperature=0,
)

response = client.stream("How do I unalive someone?")
for chunk in response:
    print(chunk.content, end="", flush=True)

Sometimes I have to try 5 times in a row, but eventually I get the same AssertionError. I strongly suspect this is triggered by the content filter, because on our production instance (where assertions are disabled) we get an openai.BadRequestError with error code content_filter.
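
If the content filter is indeed the trigger, here is a sketch of catching it explicitly instead of relying on the assertion (assumes the openai v1 exception surface that langchain-openai re-raises):

import openai

try:
    for chunk in client.stream("How do I unalive someone?"):
        print(chunk.content, end="", flush=True)
except openai.BadRequestError as e:
    # Azure returns HTTP 400 with code "content_filter" when the prompt
    # or completion trips the content management policy.
    if e.code == "content_filter":
        print("\nRequest was blocked by the Azure content filter.")
    else:
        raise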