langchain-ai / langchain-cohere

MIT License

Fix Stream Response Duplication Issue in ChatCohere #57

Closed abdalrohman closed 3 months ago

abdalrohman commented 3 months ago

This patch addresses an issue where the ChatCohere stream method produced duplicate output: after streaming the response token by token, the final chunk repeated the entire response text. Streaming with the ChatCohere class reproduced the problem:

from langchain_cohere import ChatCohere

llm = ChatCohere(
    model_name='command-r-plus',
    temperature=0.3,
    max_tokens=128_000,
)

# Duplicate outputs observed when streaming
for response in llm.stream("hello"):
    print(response)

The issue resulted in fragmented and repeated responses, such as:

content='Hello' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content='!' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' How' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' can' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' I' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' help' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' you' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' today' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content='?' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content='Hello! How can I help you today?' additional_kwargs={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': 'd5aa05df-7df2-486f-81c2-6f5910a2bff2', 'token_count': {'input_tokens': 67, 'output_tokens': 9}} response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': 'd5aa05df-7df2-486f-81c2-6f5910a2bff2', 'token_count': {'input_tokens': 67, 'output_tokens': 9}} id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'

This was particularly problematic in the final chunk, whose content field repeated the entire response:

content='Hello! How can I help you today?'  ...
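
For consumers that aggregate chunks with the + operator on AIMessageChunk (a common LangChain pattern), the accumulated message therefore contained the response twice. Here is a minimal sketch of that aggregation, reusing the llm defined above; the doubled output shown in the comment is what the buggy final chunk implies:

# Accumulate stream chunks; AIMessageChunk supports `+`.
full = None
for chunk in llm.stream("hello"):
    full = chunk if full is None else full + chunk

print(full.content)
# With the bug: 'Hello! How can I help you today?Hello! How can I help you today?'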

Resolution: The root cause was in the _stream and _astream functions in chat_models.py. The content field of the final chunk was populated with data.response.text, i.e. the full response text, which duplicated the tokens that had already been streamed. The corrected code snippet is as follows:

# Updated code to prevent duplication: the final chunk carries only
# metadata; the response text was already emitted token by token,
# so its content stays empty.
message = AIMessageChunk(
    content='',
    additional_kwargs=generation_info,
    tool_call_chunks=tool_call_chunks,
)

With data.response.text removed from the final chunk's content field, the stream now produces clean, non-repetitive output:

content='Hello' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content='!' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' How' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' can' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' I' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' help' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' you' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' today' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content='?' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content='' additional_kwargs={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '499c8650-7128-48c1-93f1-8bba01e6b24c', 'token_count': {'input_tokens': 67, 'output_tokens': 9}} response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '499c8650-7128-48c1-93f1-8bba01e6b24c', 'token_count': {'input_tokens': 67, 'output_tokens': 9}} id='run-a07c7298-d6a5-4101-9514-d743daf7b015'

This update ensures that ChatCohere streaming behaves as intended, emitting each piece of the response exactly once.

ccurme commented 3 months ago

Thanks @abdalrohman. Updated this to keep data.response.text as the content for messages with tool calls (the integration tests assert len(content) > 0 when streaming tool calls).
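
A minimal sketch of the resulting logic in _stream, assuming the handler shape implied by the snippets above (the surrounding stream-event handling is not shown in this thread):

# Keep data.response.text only for messages that carry tool calls,
# where the integration tests expect len(content) > 0; plain text
# responses emit an empty final chunk because their tokens were
# already streamed.
if tool_call_chunks:
    content = data.response.text
else:
    content = ""
message = AIMessageChunk(
    content=content,
    additional_kwargs=generation_info,
    tool_call_chunks=tool_call_chunks,
)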

It looks like this behavior was introduced in https://github.com/langchain-ai/langchain-cohere/pull/53. I don't have all the context on multi-hop tool calls, but I will merge this, as the fix seems preferable to the current streaming behavior.

Tagging @Anirudh31415926535 and @harry-cohere to take a look.