Freffles opened this issue 7 months ago (Open)
I have the exact same issue with LM Studio and AutoGen Studio.
I just encountered the same issue as well. My LM Studio logs are essentially identical to what you have shared.
Rolling back to 0.0.53 works. Same issue here: I can get agents communicating when I write the Python myself, but AutoGen Studio gets stuck after 2 tokens.
If anyone is interested in a workaround until the fix lands, you can update the file:
autogenstudio\datamodel.py
by replacing line 95, from:
max_tokens: Optional[int] = None
to:
max_tokens: Optional[int] = 3000
then create a new model (configured exactly like your previous one) and replace the old one in your agents, workflows, etc.
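For context, the edit above only changes the default of the max_tokens field in the AGS data model, so that requests no longer go out with max_tokens: null. A rough sketch of what the surrounding declaration might look like (the class name and neighbouring fields here are illustrative guesses, not the exact contents of datamodel.py):

```python
# Illustrative sketch only -- the real class in autogenstudio/datamodel.py
# may have a different name and extra fields.
from typing import Optional
from pydantic import BaseModel

class Model(BaseModel):
    model: str                          # e.g. "gemma"
    base_url: Optional[str] = None      # e.g. the LM Studio server endpoint
    api_key: Optional[str] = None
    max_tokens: Optional[int] = 3000    # was None, which LM Studio turns into a 2-token reply
```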
cc @victordibia for awareness
@all, thanks for flagging this.
This has been fixed in the current dev branch for AGS (PR here).
In addition to a default value for max_tokens, we also expose the max_tokens parameter in the UI so users can set it themselves. You can try this branch out today by running pip install autogenstudio==0.0.56rc3.
It should be merged into main soon.
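If you are driving agents from the Python API rather than the AGS UI, the same idea applies: pass an explicit max_tokens for models served by LM Studio. A rough sketch of that setup (the base_url, the placeholder API key, and exactly where autogen accepts max_tokens are assumptions and may vary by version):

```python
# Sketch: creating agents against a local LM Studio endpoint with an explicit max_tokens.
from autogen import AssistantAgent, UserProxyAgent

llm_config = {
    "config_list": [
        {
            "model": "gemma",                        # model loaded in LM Studio
            "base_url": "http://localhost:1234/v1",  # LM Studio's default local server (assumption)
            "api_key": "lm-studio",                  # placeholder; a local server ignores the key
        }
    ],
    "max_tokens": 3000,  # explicit limit, so the request does not send max_tokens: null
}

assistant = AssistantAgent("primary-assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "userproxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=1,
    code_execution_config=False,
)
user_proxy.initiate_chat(assistant, message="what is the federal capital of australia")
```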
@victordibia Thanks!
replacing line 95 from:
max_tokens: Optional[int] = None
to:
max_tokens: Optional[int] = 3000
@brustulim Thanks for the idea. I was going to try something like that but I was not sure if it was the right solution.
@godblessme You seem to have mentioned having issues with v0.0.56rc3 and LM Studio in a deleted comment ... can you verify whether this is the case?
@victordibia Thank you for providing the v0.0.56rc3 version. It fixes my problem with the two-token limit when using LM Studio. However, I found that the Python code generated by the primary-assistant cannot be run. Could you kindly help me figure out what happened? I have included the multi-turn conversation between userproxy and primary-assistant. Thank you very much.
@victordibia Thank you. The issue was solved.
I'm using .NET code with LM Studio and it always fails on the second call. It works as expected when maxTokens is specified during agent creation.
Describe the issue
If I create a model in AutoGen Studio that points to the LM Studio endpoint, add the model to an agent, and then add the agent to a workflow, the workflow terminates after 2 tokens when I run it. It works perfectly with Ollama, but every time I try LM Studio I have this problem, regardless of which model I use. This is the output from the LM Studio server log.
[2024-04-19 16:25:21.195] [INFO] [LM STUDIO SERVER] Processing queued request...
[2024-04-19 16:25:21.197] [INFO] Received POST request to /v1/chat/completions with body: { "messages": [ { "content": "You are a helpful AI assistant", "role": "system" }, { "content": "what is the federal capital of australia", "role": "user" } ], "model": "gemma", "max_tokens": null, "stream": false, "temperature": 0.1 }
[2024-04-19 16:25:21.198] [INFO] [LM STUDIO SERVER] Context Overflow Policy is: Rolling Window
[2024-04-19 16:25:21.201] [INFO] [LM STUDIO SERVER] Last message: { role: 'user', content: 'what is the federal capital of australia' } (total messages = 2)
[2024-04-19 16:25:21.696] [INFO] [LM STUDIO SERVER] Accumulating tokens ... (stream = false)
[2024-04-19 16:25:21.698] [INFO] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Accumulated 1 tokens: The
[2024-04-19 16:25:21.799] [INFO] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Accumulated 2 tokens: The Federal
[2024-04-19 16:25:21.911] [INFO] [LM STUDIO SERVER] [lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf] Generated prediction: { "id": "chatcmpl-zbpbxj8o70qatvsd8fz9ri", "object": "chat.completion", "created": 1713507921, "model": "lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "The Federal" }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 2, "completion_tokens": 2, "total_tokens": 4 } }
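Note the "max_tokens": null in the request body above; LM Studio appears to treat that as an effectively zero budget and stops after two tokens. As a quick sanity check outside AutoGen Studio, a request with an explicit max_tokens can be sent directly to LM Studio's OpenAI-compatible server; the sketch below assumes the default local port and uses a placeholder API key:

```python
# Sketch: the same chat request sent directly to LM Studio, but with an
# explicit max_tokens instead of null.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

response = client.chat.completions.create(
    model="lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "what is the federal capital of australia"},
    ],
    max_tokens=500,   # explicit limit; with null the reply stops at "The Federal"
    temperature=0.1,
    stream=False,
)
print(response.choices[0].message.content)
```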
Steps to reproduce
I can replicate this consistently on my setup. I have the latest LM Studio and Ollama. Any time I create a model that runs on the LM Studio server I have this problem, but not with the Ollama server.
Screenshots and logs
No response
Additional Information
autogenstudio v0.0.56
LM Studio 0.2.19 with lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q8_0.gguf
Ollama 0.1.32 with llama3:latest (8B)