gianx89 opened 4 days ago
@gianx89 thanks for raising the issue. Would you be able to share more about how you are provisioning your local API? Anything that might help with reproducing it?
Hi and thanks, @MohMaz.
I’ve tried two provisioning modes: the hosted API from together.ai, and local models served via Ollama (see the config sketch after the model list below).
I’ve been focusing on various Llama versions and tested them all in different "sizes". Here’s what I’ve tried:
- `meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo` (together.ai)
- `llama3.1` (Ollama)
- `llama3.2` (Ollama)
- `llama3.2-vision` (Ollama)
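For reference, here is roughly how I point AutoGen at the two backends. This is a sketch: both use the standard OpenAI-compatible endpoints, and the environment variable name is just my own convention.

```python
import os

# together.ai hosted API (OpenAI-compatible endpoint).
together_config = {
    "config_list": [{
        "model": "meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
        "base_url": "https://api.together.xyz/v1",
        "api_key": os.environ["TOGETHER_API_KEY"],  # env var name is illustrative
    }]
}

# Local Ollama; it exposes an OpenAI-compatible endpoint on port 11434.
ollama_config = {
    "config_list": [{
        "model": "llama3.2-vision",
        "base_url": "http://localhost:11434/v1",
        "api_key": "ollama",  # Ollama ignores the key, but the client requires one
    }]
}
```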
I’ve also experimented with Mistral and other models in the RAG-only version of my project, but I consistently encountered problems. Now, I’m shifting my focus to the Multi-Agent aspect of the project.
I’ve tried reducing the token count (using an approximate method) and limiting the size of the chat history (roughly as sketched below), but the results remain the same.
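The truncation I mention uses AutoGen's message transforms. A minimal sketch of it (the exact limits in my script differ):

```python
from autogen import AssistantAgent
from autogen.agentchat.contrib.capabilities import transform_messages, transforms

agent = AssistantAgent("S0", llm_config=False)  # llm_config elided for brevity

# Cap both the number of messages kept and the (approximate) token count.
context_handling = transform_messages.TransformMessages(
    transforms=[
        transforms.MessageHistoryLimiter(max_messages=10),
        transforms.MessageTokenLimiter(max_tokens=4000),
    ]
)
context_handling.add_to_agent(agent)
```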
Here’s my `pyproject.toml` (the file that holds the `[tool.poetry]` configuration; I’m experimenting with several libraries in other packages):
```toml
[tool.poetry]
name = "comics"
version = "0.0.1"
description = "Multi Agent RAG System to write a comic book story"
readme = "README.md"
[tool.poetry.dependencies]
python = "^3.11,<3.13"
atomic-agents = "^1.0.10"
autogen = "^0.3.1"
deepeval = "^1.5.0"
instructor = "^1.0.1"
rich = "^13.9.4"
instructorembedding = "^1.0.1"
sentence-transformers = "^3.3.0"
emoji = "^2.14.0"
ipython = "^8.29.0"
markdownify = "^0.13.1"
flaml = {extras = ["automl"], version = "^2.3.2"}
pydantic = "2.9.2"
pydantic-core = "2.23.4"
ollama = "^0.3.3"
httpx = "^0.27.2"
ollama-api = "^0.1.1"
autogen-agentchat = {extras = ["together"], version = "^0.2.39"}
matplotlib = "^3.9.2"
groq = "^0.12.0"
llama-index = "^0.12.1"
llama-index-llms-ollama = "^0.4.1"
chromadb = "^0.5.20"
llama-index-vector-stores-chroma = "^0.4.0"
llama-index-embeddings-ollama = "^0.4.0"
llama-index-embeddings-huggingface = "^0.4.0"
langgraph = "^0.2.53"
pyautogen = "^0.4"
litellm = {extras = ["proxy"], version = "^1.52.16"}
together = "^1.3.5"
langchain-together = "^0.2.0"
tiktoken = "^0.8.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```
This is a link to a Python file to reproduce the problem.
Currently it uses `meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo` from together.ai, but you can comment/uncomment the relevant code to switch to the local `llama3.2-vision` via Ollama.
You must provide some documents in the RAG documents folder; otherwise, you might not be able to reproduce the issue. The problem sometimes worsens when I include context from RAG.
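For completeness, the retrieval side is set up roughly like this. A sketch: the docs path and chunk size are illustrative; the collection name matches the `autogen-docs` collection in the log below.

```python
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

ragproxyagent = RetrieveUserProxyAgent(
    name="P0",
    human_input_mode="NEVER",
    retrieve_config={
        "task": "qa",
        "docs_path": "./rag_documents",  # put your documents here
        "chunk_token_size": 2000,
        "vector_db": "chroma",
        "collection_name": "autogen-docs",
        "get_or_create": True,
    },
)
```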
I think the termination condition was triggered. It could be the message transform that truncates the message and prevents it from being displayed fully. @thinkall @WaelKarkoub what do you think?
It happened even without truncation; I only added the truncation while trying to solve the problem. I’ll post an output without truncation later, or you can try it yourself by disabling truncation in the provided script.
Here it's an output without truncation, the same problem happens.
```
==============================
Starting Chat using model: meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo
==============================
Trying to create collection.
2024-11-29 21:45:19,089 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Use the existing collection `autogen-docs`.
2024-11-29 21:45:19,243 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 36 chunks.
VectorDB returns doc_ids: [['45f5db71']]
Adding content of doc 45f5db71 to context.
P0 (to chat_manager):
You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the
context provided by the user.
If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.
You must give as short an answer as possible.
User's question is:
[INST]As P0, I will initiate the discussion. I request Team S to create a story for a comic that contains exactly 10 panels.
The story must adhere to the following guidelines:
Characters: The story should feature exactly 2 characters:
- Character 1: An experienced superhero comic writer and professor.
- Character 2: A new writer who is learning to create superhero comic book stories.
Story Subject:
- The main character (professor) teaches the new writer the art of crafting superhero comic book stories.
- The plot should reflect the dynamics of this mentor-student relationship and highlight creative and technical aspects of superhero storytelling.
Note: Ensure the story is engaging, concise, and evenly distributed across the 10 panels.
NEXT: S0.[/INST]
Context is: Many authors began their love of storytelling by reading comics as children. Everything from The Beano and Marvel
...
NEXT: S0
--------------------------------------------------------------------------------
Next speaker: S0
S0 (to chat_manager):
I'll start creating the story based on your idea. Here's the first draft:
TITLE: The Art of Superhero Storytelling
ABSTRACT: A seasoned comic book writer and professor teaches a new writer the art of crafting superhero comic book stories.
PROTAGONISTS:
- Professor Thompson: An experienced superhero comic writer and professor.
- Alex Chen: A new writer eager to learn the art of creating superhero comic book stories.
PANEL START 1
IMAGE_DESCRIPTION: Professor Thompson sitting at a desk with various comic books and papers scattered around. Alex Chen entering the room, looking a bit nervous.
IMAGE_DIALOGUES:
Professor Thompson: "Welcome to the world of superhero comics! I'm excited to share my knowledge with you."
Alex Chen: "I've always been a huge fan of comics, but I have no idea where to start."
PANEL END 1
PANEL START 2
IMAGE_DESCRIPTION: Close-up of Professor Thompson's face, smiling and enthusiastic.
IMAGE_DIALOGUES:
Professor Thompson: "Don't worry, Alex. We'll start with the basics. What makes a great superhero story?"
PANEL END 2
PANEL START 3
IMAGE_DESCRIPTION: Alex Chen looking thoughtful, with a speech bubble above their head.
IMAGE_DIALOGUES:
Alex Chen: "I think it's the hero's journey. The struggle between good and evil."
PANEL END 3
PANEL START 4
IMAGE_DESCRIPTION: Professor Thompson nodding in agreement, with a comic book panel in the background showing a classic superhero battle scene.
IMAGE_DIALOGUES:
Professor Thompson: "That's right! The hero's journey is a classic trope in superhero comics. But what about the characters themselves? What makes them relatable and engaging?"
PANEL END 4
PANEL START 5
IMAGE_DESCRIPTION: Alex Chen looking puzzled, with a thought bubble above their head.
IMAGE_DIALOGUES:
Alex Chen: "I'm not sure. I've always just focused on the action and adventure."
PANEL END 5
PANEL START 6
IMAGE_DESCRIPTION: Professor Thompson smiling, with a whiteboard behind them filled with notes and diagrams.
IMAGE_DIALOGUES:
Professor Thompson: "Ah, that's where character development comes in. Let me show you some techniques for creating well-rounded characters."
PANEL END 6
PANEL START 7
IMAGE_DESCRIPTION: Professor Thompson drawing on the whiteboard, with Alex
--------------------------------------------------------------------------------
==============================
Chat Ends
==============================
Process finished with exit code 0
```
Here is the link to the code that reproduces the error.
Hi all! I’m using AutoGen to develop a RAG system, ideally a multi-agent one. I must use open-source, preferably local, models such as Llama 3.1 or Llama 3.2. I’m using ChromaDB as my vector database.
I’m developing a system that can write a comic book story in a specific format. A writer (or a team of writers) writes the story, a critic (or a team of critics) gives advice on how to improve it, and a manager (or a team of managers) incorporates these suggestions. When the story is deemed satisfactory, an agent writes "TERMINATE".
I don't have any issues using OpenAI APIs and models like GPT-3.5 Turbo or GPT-4. However, when working with open-source or local models, I encounter unpredictable behaviors.
The agents start talking, but the chat ends abruptly (without error) before the termination condition is met. Sometimes, very rarely, I get the right result and the chat ends with the "TERMINATE" string.
Any suggestions? I can't share all the code at the moment, but I can reply with snippets of it.
P.S. The `max_round` value is pretty high (100); a sketch of the termination setup follows.
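For context, the termination condition and round limit are wired up roughly like this. A sketch: the critic's name and the system messages are placeholders, and `llm_config` stands in for the backend config shown earlier in the thread.

```python
import autogen

llm_config = {"config_list": [{"model": "llama3.2-vision",
                               "base_url": "http://localhost:11434/v1",
                               "api_key": "ollama"}]}

writer = autogen.AssistantAgent("S0", llm_config=llm_config,
                                system_message="You write the story.")
critic = autogen.AssistantAgent("C0", llm_config=llm_config,
                                system_message="You critique the story. "
                                               "Reply TERMINATE when satisfied.")

def is_termination_msg(message):
    # The chat should only stop when an agent actually writes TERMINATE.
    return (message.get("content") or "").strip().endswith("TERMINATE")

groupchat = autogen.GroupChat(agents=[writer, critic], messages=[], max_round=100)
manager = autogen.GroupChatManager(groupchat=groupchat,
                                   llm_config=llm_config,
                                   is_termination_msg=is_termination_msg)
```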