langchain-ai / langchain

πŸ¦œπŸ”— Build context-aware reasoning applications
https://python.langchain.com
MIT License
94.85k stars 15.36k forks

Streamed responses incompatible with multiple choices (`n>1`) #26719

Open sabrenner opened 1 month ago

sabrenner commented 1 month ago

Checked other resources

Example Code

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-3.5-turbo-0125", n=3)

parser = StrOutputParser()
chain = model | parser

for chunk in chain.stream(input="tell me a joke about chickens"):
    print(chunk)

Chunks are sometimes printed in a reasonable order, but the ordering appears nondeterministic across runs.
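To make the observation concrete, here is a minimal, self-contained simulation (no API calls; the per-choice token lists are invented for illustration) of why concatenating chunks from three parallel choices without any index produces interleaved text:

```python
# Simulate streamed chunks from a completion with n=3 choices.
# Without a choice index on each chunk, the consumer cannot tell
# which of the three generations a given piece of text belongs to.
choice_streams = [
    ["Why", " did", " the", " chicken"],        # choice 0
    ["Here's", " a", " classic", " joke"],      # choice 1
    ["Sure", ",", " a", " chicken", " joke"],   # choice 2
]

def interleave(streams):
    """Round-robin the per-choice streams, as a server might."""
    iters = [iter(s) for s in streams]
    done = [False] * len(iters)
    while not all(done):
        for i, it in enumerate(iters):
            if not done[i]:
                try:
                    yield next(it)
                except StopIteration:
                    done[i] = True

merged = "".join(interleave(choice_streams))
print(merged)  # tokens from all three choices run together
```

The consumer sees a single flat stream, so the three generations cannot be separated after the fact.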

Error Message and Stack Trace (if applicable)

No response

Description

I'm trying to determine whether the LangChain streaming API works when multiple choices are specified on the model. I expected each returned chunk to carry some kind of choice index, but I was not able to find or use one. Since chunks appear to be yielded in a nondeterministic order, I'm not sure how to consume a streamed response from a chat model or LLM (OpenAI, for example) configured with n>1 as a general config on the model. Specifically:

  1. In general, is indexing for chunks supported, or planned to be supported? Or, is there another workaround for this?
  2. It does seem that chunks are yielded non-deterministically. Is that accurate?
  3. (langchain_openai specific) I took a look at some source code and noticed that only the first choice is grabbed from each chunk. Is this intentional? If so, is there a reason the other choices are discarded?
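As a toy illustration of point 3 (the chunk dict below is a hypothetical stand-in for a raw OpenAI streaming chunk, not langchain-openai's actual data structure), taking only `choices[0]` silently drops the other generations:

```python
# Hypothetical raw streaming chunk with n=3 choices, each tagged
# with an "index" field as in the OpenAI chat completions API.
raw_chunk = {
    "choices": [
        {"index": 0, "delta": {"content": "Why"}},
        {"index": 1, "delta": {"content": "Sure"}},
        {"index": 2, "delta": {"content": "Here's"}},
    ]
}

# Taking only the first choice keeps one generation and discards
# the rest of the chunk's choices.
first = raw_chunk["choices"][0]
dropped = raw_chunk["choices"][1:]
print(first["delta"]["content"])
print(len(dropped), "choices never surfaced")
```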

I also tried checking whether the result emitted to the on_chat_model_end event had the correct generations, but it doesn't look like it does:

{'event': 'on_chat_model_end', 'data': {'output': AIMessageChunk(content="SureSureSure,,, here's here's here's a a a classic chicken classic one joke chicken for for joke you you for:\n\n:\n\n youWhyWhy:\n\n did didWhy the the did chicken chicken the chicken join a band?\n\n go to the seance?\n\n go to the seance?\n\nToToBecause it talk talk had to to the the the drum other othersticks side side!!! πŸ₯ πŸ”βœ¨πŸ”", additional_kwargs={}, response_metadata={'finish_reason': 'stopstopstop', 'model_name': 'gpt-4o-2024-05-13gpt-4o-2024-05-13gpt-4o-2024-05-13', 'system_fingerprint': 'fp_e375328146fp_e375328146fp_e375328146'}, id='run-5b6e89fc-be08-4b8e-984f-a9a71b974e7a'), 'input': {'messages': [[HumanMessage(content='tell me a joke about chickens', additional_kwargs={}, response_metadata={})]]}}, 'run_id': '5b6e89fc-be08-4b8e-984f-a9a71b974e7a', 'name': 'ChatOpenAI', 'tags': ['seq:step:1'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-4o', 'ls_model_type': 'chat', 'ls_temperature': 0.7, 'ls_max_tokens': 50}, 'parent_ids': ['56a188ee-d914-46e6-ba9d-568bfb53c167']}
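The garbled text above is consistent with the deltas of all n choices being concatenated into a single message. A minimal simulation (the per-step deltas are invented, chosen to match the start of the output above):

```python
# If each streamed step carries one delta per choice and the consumer
# concatenates every delta into a single message, the n parallel
# generations interleave token by token.
per_step_deltas = [
    ["Sure", "Sure", "Sure"],            # step 1: one delta per choice
    [",", ",", ","],                     # step 2
    [" here's", " here's", " here's"],   # step 3
]

merged = "".join(delta for step in per_step_deltas for delta in step)
print(merged)  # "SureSureSure,,, here's here's here's"
```

This reproduces the triple-repeated words seen in the `on_chat_model_end` output, including the tripled `finish_reason` and `model_name` metadata fields.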

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 23.6.0
Python Version: 3.10.13 [Clang 15.0.0 (clang-1500.1.0.2.5)]

Package Information

langchain_core: 0.3.2
langchain: 0.3.0
langsmith: 0.1.125
langchain_input_error: Installed. No version info available.
langchain_openai: 0.2.0
langchain_stream: Installed. No version info available.
langchain_text_splitters: 0.3.0
langchain_tools: Installed. No version info available.
langgraph: Installed. No version info available.

Optional packages not installed

langserve

Other Dependencies

aiohttp: 3.10.5
async-timeout: 4.0.3
httpx: 0.27.0
jsonpatch: 1.33
numpy: 1.26.4
openai: 1.47.0
orjson: 3.10.7
packaging: 23.2
pydantic: 2.9.2
PyYAML: 6.0.2
requests: 2.32.3
SQLAlchemy: 2.0.32
tenacity: 8.5.0
tiktoken: 0.7.0
typing-extensions: 4.12.2

keenborder786 commented 1 month ago
  1. I don't understand what you mean by indexing. Since generation happens non-deterministically on the LLM provider's server side, I doubt we can do indexing.
  2. Yes.
  3. Yes, because the first choice is the best one.
sabrenner commented 1 month ago

Hi @keenborder786, thanks for your answers to these questions. Regarding indexing, specifically for OpenAI, I'm referring to this spec for their streamed responses API, which specifies a choice index for each choice in a given chunk. I'm not sure whether other partner libraries expose this in their APIs, but would it be possible to surface this index in langchain-openai?
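If that index were surfaced on each chunk, consumers could demultiplex the stream into per-choice buffers. A hedged sketch (the `indexed_chunks` stream is simulated; it does not reflect what langchain-openai emits today):

```python
from collections import defaultdict

# Simulated stream where each chunk carries the OpenAI-style choice
# index alongside its content delta, interleaved across 3 choices.
indexed_chunks = [
    (0, "Why"), (1, "Sure"), (2, "Here's"),
    (0, " did"), (1, ","), (2, " a"),
    (0, " the chicken..."), (1, " a joke:"), (2, " classic:"),
]

# Route each delta to its choice's buffer, then join per choice.
buffers = defaultdict(list)
for index, delta in indexed_chunks:
    buffers[index].append(delta)

completions = {i: "".join(parts) for i, parts in buffers.items()}
for i in sorted(completions):
    print(i, completions[i])
```

With the index available, each of the n generations can be reassembled independently instead of collapsing into the interleaved text shown earlier in this thread.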