What did you expect to happen?
The create_stream method should, like create, return usage chunks correctly, showing accurate token counts and maintaining the expected flow of message processing without errors. The OpenAI API and the LiteLLM (proxy) API both support stream_options={"include_usage": True}, but when this is set in the init of OpenAIChatCompletionClient(...), a "No stop reason found" error occurs at the end of token stream handling.
How can we reproduce it (as minimally and precisely as possible)?
Pre-requisites:
OpenAI GPT-4o-mini requires: export OPENAI_API_KEY=xxxxxxxxxxx
For local model usage, install ollama and litellm, then run
ollama pull llama3.2:3b
ollama run llama3.2:3b
litellm --model ollama_chat/llama3.2:3b
Code Example:
# from autogen_core.components.models import OpenAIChatCompletionClient, UserMessage, CreateResult
import asyncio

from autogen_ext.models import OpenAIChatCompletionClient
from autogen_core.components.models import UserMessage, CreateResult

model_client = OpenAIChatCompletionClient(
    # ------- using OpenAI API -----------------
    model="gpt-4o-mini",
    # stream_options={"include_usage": True},
    # -------- for local model use ------------- (see above ollama and litellm config)
    # model="gpt-4o",
    # api_key="NotRequiredSinceWeAreLocal",
    # base_url="http://localhost:4000",  # first run litellm --model ollama_chat/llama3.2:3b
    # stream_options={"include_usage": True},
)


async def main() -> None:
    # Stream the result. Note: extra_create_args is a keyword argument of
    # create_stream, not an entry inside the messages list.
    model_client_result = model_client.create_stream(
        messages=[
            UserMessage(content="What is the capital of France?", source="user"),
        ],
        extra_create_args={"stream_options": {"include_usage": True}},
    )
    try:
        async for chunk in model_client_result:
            print(f"chunk: {type(chunk)}: {chunk}")
            if type(chunk) is CreateResult:
                assert (
                    chunk.usage.prompt_tokens != 0 and chunk.usage.completion_tokens != 0
                ), f"Assert: token counts should not be zero, {chunk.usage}"
    except ValueError as e:
        print(f"❌ a bug (🪲), Exception (ValueError): `{e}`")
    except Exception as e:
        print(f"❌ a bug (🪲), Exception: `{e}`")
    else:
        print(
            f"✅: Finished Normally, last chunk is `{type(chunk).__name__}` with usage `{chunk.usage}`"
        )


asyncio.run(main())
AutoGen version
0.4
Which package was this bug in
autogen_ext, autogen_core.components.models
Model used
OpenAI gpt-4o-mini and llama3.2:3b via ollama and LiteLLM proxy
Python version
3.11.9
Operating system
Ubuntu 22.04
Any additional info you think would be helpful for fixing this bug
I suggest that extra_create_args={"stream_options": {"include_usage": True}} should be the default in create_stream.
I have a proposed fix for this issue, which I will submit as a PR. The fix aims to properly return the usage token counts by handling the stream_options={"include_usage": True} setting in both the OpenAI and LiteLLM contexts without raising the "No stop reason found" error.
I have not been able to verify whether the same issue occurs with AzureOpenAIChatCompletionClient.
What happened?
The create_stream method in BaseOpenAIChatCompletionClient, as used by OpenAIChatCompletionClient in _openai_client.py, returns 0 for prompt and completion token usage counts. The source code currently shows a TODO raised by @jackgerrits in relation to these usage counts: https://github.com/microsoft/autogen/blob/f31ff663685a37f7960c4911b1837d36f1f32a13/python/packages/autogen-ext/src/autogen_ext/models/_openai/_openai_client.py#L661