langchain-ai / langchain-cohere

MIT License

Fix Stream Response Duplication Issue in ChatCohere #57

Closed abdalrohman closed 3 months ago

abdalrohman commented 3 months ago

This patch addresses an issue where the ChatCohere stream method produced duplicate output: after streaming the response token by token, the final chunk repeated the entire response text. Streaming with the ChatCohere class reproduced the problem:

from langchain_cohere import ChatCohere

llm = ChatCohere(
    model_name='command-r-plus',
    temperature=0.3,
    max_tokens=128_000,
)

# Duplicate outputs observed when streaming
for response in llm.stream("hello"):
    print(response)

The issue resulted in fragmented and repeated responses, such as:

content='Hello' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content='!' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' How' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' can' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' I' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' help' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' you' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content=' today' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content='?' id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'
content='Hello! How can I help you today?' additional_kwargs={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': 'd5aa05df-7df2-486f-81c2-6f5910a2bff2', 'token_count': {'input_tokens': 67, 'output_tokens': 9}} response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': 'd5aa05df-7df2-486f-81c2-6f5910a2bff2', 'token_count': {'input_tokens': 67, 'output_tokens': 9}} id='run-8baca3b2-5648-4942-bdbd-3640cbcbd6df'

This was particularly problematic in the final chunk, whose content field repeated the entire response:

content='Hello! How can I help you today?'  ...
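
For consumers that aggregate chunks with the + operator on AIMessageChunk (a common LangChain pattern), the accumulated message therefore contained the response twice. Here is a minimal sketch of that aggregation, reusing the llm defined above; the doubled output shown in the comment is what the buggy final chunk implies:

# Accumulate stream chunks; AIMessageChunk supports `+`.
full = None
for chunk in llm.stream("hello"):
    full = chunk if full is None else full + chunk

print(full.content)
# With the bug: 'Hello! How can I help you today?Hello! How can I help you today?'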

Resolution: The root cause was in the _stream and _astream functions in chat_models.py. The content field of the final chunk was populated with data.response.text, i.e. the full response text, which duplicated the tokens that had already been streamed. The corrected code snippet is as follows:

# Updated code to prevent duplication: the final chunk carries only
# metadata; the response text was already emitted token by token,
# so its content stays empty.
message = AIMessageChunk(
    content='',
    additional_kwargs=generation_info,
    tool_call_chunks=tool_call_chunks,
)

With data.response.text removed from the final chunk's content field, the stream now produces clean, non-repetitive output:

content='Hello' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content='!' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' How' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' can' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' I' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' help' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' you' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content=' today' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content='?' id='run-a07c7298-d6a5-4101-9514-d743daf7b015'
content='' additional_kwargs={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '499c8650-7128-48c1-93f1-8bba01e6b24c', 'token_count': {'input_tokens': 67, 'output_tokens': 9}} response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '499c8650-7128-48c1-93f1-8bba01e6b24c', 'token_count': {'input_tokens': 67, 'output_tokens': 9}} id='run-a07c7298-d6a5-4101-9514-d743daf7b015'

This update ensures that ChatCohere streaming behaves as intended, emitting each piece of the response exactly once.

ccurme commented 3 months ago

Thanks @abdalrohman. Updated this to keep data.response.text as the content for messages with tool calls (the integration tests assert len(content) > 0 when streaming tool calls).
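
A minimal sketch of the resulting logic in _stream, assuming the handler shape implied by the snippets above (the surrounding stream-event handling is not shown in this thread):

# Keep data.response.text only for messages that carry tool calls,
# where the integration tests expect len(content) > 0; plain text
# responses emit an empty final chunk because their tokens were
# already streamed.
if tool_call_chunks:
    content = data.response.text
else:
    content = ""
message = AIMessageChunk(
    content=content,
    additional_kwargs=generation_info,
    tool_call_chunks=tool_call_chunks,
)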

It looks like this behavior was introduced in https://github.com/langchain-ai/langchain-cohere/pull/53. I don't have all the context on multi-hop tool calls, but I will merge this, as the fix seems preferable to the current streaming behavior.

Tagging @Anirudh31415926535 and @harry-cohere to take a look.