microsoft / autogen

A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International

[Issue]: Workflow terminates after 2 tokens when using AutogenStudio with LM Studio #2445

Open Freffles opened 2 months ago

Freffles commented 2 months ago

Describe the issue

If I create a model in AutoGen Studio that points to the LM Studio endpoint, then add the model to an agent, then a workflow, etc., the workflow terminates after 2 tokens when I run it. It works perfectly with Ollama, but every time I try LM Studio I have this problem, regardless of the model I use. This is the output from the LM Studio server log.

```
[2024-04-19 16:25:21.195] [INFO] [LM STUDIO SERVER] Processing queued request...
[2024-04-19 16:25:21.197] [INFO] Received POST request to /v1/chat/completions with body: {
  "messages": [
    { "content": "You are a helpful AI assistant", "role": "system" },
    { "content": "what is the federal capital of australia", "role": "user" }
  ],
  "model": "gemma",
  "max_tokens": null,
  "stream": false,
  "temperature": 0.1
}
[2024-04-19 16:25:21.198] [INFO] [LM STUDIO SERVER] Context Overflow Policy is: Rolling Window
[2024-04-19 16:25:21.201] [INFO] [LM STUDIO SERVER] Last message: { role: 'user', content: 'what is the federal capital of australia' } (total messages = 2)
[2024-04-19 16:25:21.696] [INFO] [LM STUDIO SERVER] Accumulating tokens ... (stream = false)
[2024-04-19 16:25:21.698] [INFO] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Accumulated 1 tokens: The
[2024-04-19 16:25:21.799] [INFO] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Accumulated 2 tokens: The Federal
[2024-04-19 16:25:21.911] [INFO] [LM STUDIO SERVER] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Generated prediction: {
  "id": "chatcmpl-zbpbxj8o70qatvsd8fz9ri",
  "object": "chat.completion",
  "created": 1713507921,
  "model": "lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "The Federal" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 2, "completion_tokens": 2, "total_tokens": 4 }
}
```
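For reference, the failing request can be reconstructed outside AutoGen Studio. A minimal sketch (the payload fields are taken from the log above; `build_payload` is just an illustrative helper, not part of any library) showing how leaving `max_tokens` unset ends up as JSON null in the body:

```python
import json
from typing import Optional

# Rebuild the request body from the log above. When max_tokens is left as
# None it is serialized as JSON null, which the log suggests LM Studio
# 0.2.19 treats as a tiny limit rather than "unlimited".
def build_payload(prompt: str, max_tokens: Optional[int] = None) -> dict:
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful AI assistant"},
            {"role": "user", "content": prompt},
        ],
        "model": "gemma",
        "max_tokens": max_tokens,  # becomes null in the JSON body when None
        "stream": False,
        "temperature": 0.1,
    }

broken = build_payload("what is the federal capital of australia")
fixed = build_payload("what is the federal capital of australia", max_tokens=3000)
print('"max_tokens": null' in json.dumps(broken))  # True: the problematic body
print(fixed["max_tokens"])                         # 3000: an explicit limit
```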

Steps to reproduce

I have no issue replicating this on my setup. I have the latest LM Studio and Ollama. Any time I create a model that runs on the LM Studio server I have this problem, but not on the Ollama server.

Screenshots and logs

No response

Additional Information

- autogenstudio v0.0.56
- LM Studio 0.2.19 with lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf
- Ollama 0.1.32 with llama3:latest (8B)

stigtk commented 2 months ago

I have the exact same issue with LM Studio and AutoGen Studio.

castlenthesky commented 2 months ago

I just encountered the same issue as well. My LM Studio logs are essentially identical to what you have shared.

flynntmaverick commented 2 months ago

Same issue here. Rolling back to 0.0.53 works. I'm able to get agents communicating and writing Python, but AutoGen Studio gets stuck after 2 tokens.

brustulim commented 2 months ago

If anyone is interested in a work-around until the fix, you can update the file autogenstudio/datamodel.py, replacing line 95 from:

`max_tokens: Optional[int] = None`

to:

`max_tokens: Optional[int] = 3000`

Then create a new model (exactly like your previous one) and replace the old one in your agents, workflows, etc.
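The work-around boils down to giving the field a concrete default. A simplified sketch of the idea (using a plain dataclass as a stand-in; the real class in autogenstudio/datamodel.py is a pydantic model with more fields):

```python
from dataclasses import dataclass
from typing import Optional

# Simplified stand-in for the model config class in autogenstudio/datamodel.py.
@dataclass
class Model:
    model: str
    api_key: Optional[str] = None
    base_url: Optional[str] = None
    # Work-around: default to 3000 instead of None, so requests to LM Studio
    # carry an explicit token limit rather than JSON null.
    max_tokens: Optional[int] = 3000

m = Model(model="gemma", base_url="http://localhost:1234/v1")
print(m.max_tokens)  # 3000 even when the user never sets it
```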

ekzhu commented 2 months ago

cc @victordibia for awareness

victordibia commented 2 months ago

@all, thanks for flagging this.

This has been fixed in the current dev branch for AGS (PR here). In addition to a default value for max_tokens, we also expose the max_tokens parameter in the UI so users can specify it. You can try this branch out today by running `pip install autogenstudio==0.0.56rc3`.

Should be merged into main soon.
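In the meantime, anyone driving the LM Studio endpoint through the autogen library directly can pass `max_tokens` in the LLM config themselves. A sketch (the base_url is LM Studio's default local endpoint; the api_key value is a placeholder, since LM Studio does not check it):

```python
# Config for an OpenAI-compatible local endpoint (LM Studio's default port).
config_list = [
    {
        "model": "lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf",
        "base_url": "http://localhost:1234/v1",
        "api_key": "lm-studio",  # placeholder; LM Studio ignores the key
    }
]

# Setting max_tokens explicitly avoids sending null to the server.
llm_config = {
    "config_list": config_list,
    "max_tokens": 1024,
    "temperature": 0.1,
}
print(llm_config["max_tokens"])  # 1024
```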

[image]

Freffles commented 2 months ago

@victordibia Thanks!

Freffles commented 2 months ago

> replacing line 95 from: `max_tokens: Optional[int] = None` to: `max_tokens: Optional[int] = 3000`

@brustulim Thanks for the idea. I was going to try something like that but I was not sure if it was the right solution.

victordibia commented 2 months ago

@godblessme

You seem to have mentioned having issues with v0.0.56rc3 and LM Studio in a deleted comment ... can you verify this is the case?

GodBlessingMe commented 2 months ago

> @godblessme
>
> You seem to have mentioned having issues with v0.0.56rc3 and LM Studio in a deleted comment ... can you verify this is the case?

@victordibia Thank you for providing the v0.0.56rc3 version. It fixes my problem with the maximum limit of two tokens when using LM Studio. However, I found that Python code generated by primary-assistant cannot be run. Could you kindly help me figure out what happened? I have attached the multi-turn conversation between userproxy and primary-assistant. Thank you very much.

[8 screenshots of the conversation attached, dated 2024-05-06]

GodBlessingMe commented 2 months ago

@victordibia Thank you. The issue was solved.

MaxAkbar commented 1 month ago

I'm using .NET code with LM Studio and it always fails on the second call. It works as expected when maxTokens is specified during agent creation.