Closed MarianoMolina closed 6 months ago
I did a run and added some debug prints to ConversableAgent to try to track down the issue, but I still can't figure out what it is:
List 5 roles/positions that benefit strongly from a high EQ, and list your reasons.
--------------------------------------------------------------------------------
>>>>>>>> USING AUTO REPLY...
1. generate_oai_reply for drafter to chat_manager. config: None
2. _generate_oai_reply_from_client for drafter. llm_client: [{'cache_seed': False, 'temperature': 0.5, 'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf'}], cache: None
2.b. _generate_oai_reply_from_client messages: [{'content': '[SYSTEM MESSAGE]"', 'role': 'system'}, {'content': '[PROMPT]', 'name': 'user_proxy_auto', 'role': 'user'}]
3. _generate_oai_reply_from_client response: ChatCompletion(id='chatcmpl-9jjt8tzao65chsce8vo6n', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="[REPLY 1]", role='assistant', function_call=None, tool_calls=None))], created=1713971689, model='TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=340, prompt_tokens=340, total_tokens=680), cost=0, message_retrieval_function=<bound method OpenAIClient.message_retrieval of <autogen.oai.client.OpenAIClient object at 0x00000253849CA350>>, config_id=0, pass_filter=True)
drafter (to chat_manager):
[REPLY 1]
--------------------------------------------------------------------------------
1. Calling reflection_with_llm_as_summary from user_proxy_auto to chat_manager
1.b. llm_config from sender: {'cache_seed': False, 'temperature': 0.5, 'config_list': [{'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', 'base_url': 'http://localhost:1234/v1', 'api_key': 'lm-studio'}], 'timeout': 120} and from recipient: {'cache_seed': False, 'temperature': 0.5, 'config_list': [{'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', 'base_url': 'http://localhost:1234/v1', 'api_key': 'lm-studio'}], 'timeout': 120}
2. _generate_oai_reply_from_client for user_proxy_auto. llm_client: [{'cache_seed': False, 'temperature': 0.5, 'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf'}], cache: None
2.b. _generate_oai_reply_from_client messages: [{'content': '[PROMPT]', 'role': 'user', 'name': 'user_proxy_auto'}, {'content': "[REPLY 1]", 'role': 'user', 'name': 'drafter'}, {'role': 'system', 'content': 'List the final answer to the task.'}]
3. _generate_oai_reply_from_client response: ChatCompletion(id='chatcmpl-iv8jxgvqcydrcavkk0yrm', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='', role='assistant', function_call=None, tool_calls=None))], created=1713971727, model='TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=0, prompt_tokens=1, total_tokens=1), cost=0, message_retrieval_function=<bound method OpenAIClient.message_retrieval of <autogen.oai.client.OpenAIClient object at 0x00000253849F1780>>, config_id=0, pass_filter=True)
response:
chat_result history: [{'content': '[PROMPT]', 'role': 'assistant'}, {'content': "[REPLY 1]", 'name': 'drafter', 'role': 'user'}]
chat_result summary:
From what I can see, even though the two flows (reflection_with_llm and the normal generate_reply) are distinct, I can't tell why _generate_oai_reply_from_client's response is empty in the reflection_with_llm case.
There is clearly an issue: in the reflection flow the _generate_oai_reply_from_client response object reports prompt_tokens as 1, even though the messages object being passed looks correct.
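One way to isolate this from AutoGen would be to replay the same reflection-flow messages directly against the local endpoint with a plain OpenAI client. This is just a sketch: the endpoint, API key, and model are taken from the llm_config printed above, and the placeholder strings stand in for the real prompt and reply.

# Quick isolation check (sketch): send the same reflection-flow messages straight
# to the local server, bypassing AutoGen entirely.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf",
    messages=[
        {"role": "user", "name": "user_proxy_auto", "content": "[PROMPT]"},
        {"role": "user", "name": "drafter", "content": "[REPLY 1]"},
        {"role": "system", "content": "List the final answer to the task."},
    ],
)
print(resp.choices[0].message.content)
print(resp.usage)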
Have you checked the logs on your local model server?
[2024-04-24 22:54:11.733] [INFO] Received POST request to /v1/chat/completions with body: { "messages": [ { "content": "[SYSTEM MESSAGE].", "role": "system" }, { "content": "[PROMPT]", "role": "user", "name": "user_proxy_auto" }, { "content": "[RESPONSE 1]", "role": "user", "name": "hr_expert_drafter" }, { "role": "system", "content": "[PROMPT 2]" } ], "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf", "stream": false }
[2024-04-24 22:54:11.733] [INFO] [LM STUDIO SERVER] Context Overflow Policy is: Truncate Middle
[2024-04-24 22:54:11.735] [INFO] [LM STUDIO SERVER] Last message: { role: 'system', content: '[PROMPT 2]' } (total messages = 4)
[2024-04-24 22:54:12.500] [INFO] [LM STUDIO SERVER] [TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf] Generated prediction: { "id": "chatcmpl-t6yiyywzta91zv0ozsr9r", "object": "chat.completion", "created": 1714010051, "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "" }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 1, "completion_tokens": 0, "total_tokens": 1 } }
Ok, I figured it out. Mistral Instruct appears to return an empty completion when a second "system" message shows up in the middle of the conversation. It looks like a compatibility issue, which is something I should be aware of when using local models with AutoGen, as stated in many places. After repeating the run with that summary message sent under a different role, the response now looks correct. With this in mind, it might be a good idea to allow the role of the reflection_with_llm prompt to be configurable, since right now it's hardcoded here:
def _reflection_with_llm(
    self, prompt, messages, llm_agent: Optional[Agent] = None, cache: Optional[AbstractCache] = None
) -> str:
    """Get a chat summary using reflection with an llm client based on the conversation history.

    Args:
        prompt (str): The prompt (in this method it is used as system prompt) used to get the summary.
        messages (list): The messages generated as part of a chat conversation.
        llm_agent: the agent with an llm client.
        cache (AbstractCache or None): the cache client to be used for this conversation.
    """
    system_msg = [
        {
            "role": "system",  # -> hardcoded
            "content": prompt,  # -> programmatically defined
        }
    ]
    messages = messages + system_msg
    if llm_agent and llm_agent.client is not None:
        llm_client = llm_agent.client
    elif self.client is not None:
        llm_client = self.client
    else:
        raise ValueError("No OpenAIWrapper client is found.")
    response = self._generate_oai_reply_from_client(llm_client=llm_client, messages=messages, cache=cache)
    print("response: ", response)
    return response
It could be as simple as allowing the prompt argument to be a Union[str, dict] that gets handled accordingly in _reflection_with_llm. That wouldn't affect the behavior of current implementations, and it would let this minor issue be solved.
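Something along these lines, as a rough sketch of the idea (the helper name is hypothetical, and this is not necessarily what the PR will look like):

from typing import Union

def build_summary_messages(messages: list, prompt: Union[str, dict]) -> list:
    """Append the reflection prompt to the conversation history, letting callers
    pass a full message dict when they need a role other than "system"."""
    if isinstance(prompt, dict):
        summary_msg = prompt  # e.g. {"role": "user", "content": "List the final answer..."}
    else:
        summary_msg = {"role": "system", "content": prompt}  # current default, unchanged
    return messages + [summary_msg]

# With Mistral Instruct via LM Studio, passing the summary prompt as a "user"
# message avoids the empty completion:
summary_messages = build_summary_messages(
    [{"role": "user", "content": "[PROMPT]"}, {"role": "user", "content": "[REPLY 1]"}],
    {"role": "user", "content": "List the final answer to the task."},
)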
I can do a PR with this if that's ok?
@MarianoMolina perfect. Yes, a PR on this option sounds great! It can be added to the summary_args parameter.
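Usage could then look something like the snippet below; the summary_role key name is an assumption here, so check the merged PR for the actual option name.

# Hypothetical usage once the option is exposed through summary_args
# (the "summary_role" key is an assumed name, not confirmed by the PR).
chat_result = user_proxy_auto.initiate_chat(
    chat_manager,
    message="List 5 roles/positions that benefit strongly from a high EQ, and list your reasons.",
    clear_history=True,
    summary_method="reflection_with_llm",
    summary_args={
        "summary_prompt": "List the final answer to the task.",
        "summary_role": "user",  # avoid a second "system" message mid-conversation
    },
)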
@ekzhu https://github.com/microsoft/autogen/pull/2527 Done
Describe the bug
There seems to be an issue with generating the reflection_with_llm summary when working with locally deployed models.
Below is a simple snippet. When run with my local model, it generates the conversation correctly, but the ChatResult summary is empty. When I run it using gpt-4, it generates the summary correctly.
Steps to reproduce
from autogen import GroupChatManager, GroupChat, config_list_from_json, ConversableAgent, UserProxyAgent

config_list = config_list_from_json(
    env_or_file="OAI_CONFIG_LIST",
    file_location=".",
    filter_dict={
        "model": ["TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf"],
    },
)
llm_config = {
    "cache_seed": False,  # change the cache_seed for different trials
    "temperature": 0,
    "config_list": config_list,
    "timeout": 120,
}

user_proxy_auto = UserProxyAgent(
    name="user_proxy_auto",
    code_execution_config=False,
    llm_config=llm_config,
)

drafter_agent = ConversableAgent(
    name="drafter",
    llm_config=llm_config,
    system_message="You are an assistant in charge of drafting the answer for the task.",
)
reviewer_agent = ConversableAgent(
    name="reviewer",
    llm_config=llm_config,
    system_message=(
        "You are an assistant in charge of reviewing the drafted answer and assess its quality "
        "in terms of tackling the task successfully and effectively. You can make adjustments directly, "
        "request a completely new draft while providing a framework to approach the task more effectively, "
        "or approve the answer as is. If the task is complete, end the task with TERMINATE"
    ),
)
group_chat = GroupChat(
    agents=[drafter_agent, reviewer_agent, user_proxy_auto],
    messages=[],
    max_round=4,
    speaker_selection_method="round_robin",
)
chat_manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
)

chat_result = user_proxy_auto.initiate_chat(
    chat_manager,
    message="List 5 roles/positions that benefit strongly from a high EQ, and list your reasons.",
    clear_history=True,
    summary_args={"summary_prompt": "List the final answer to the task."},
    summary_method="reflection_with_llm",
)

print(f"chat_result history: {chat_result.chat_history}")
print(f"chat_result summary: {chat_result.summary}")
Model Used
TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf -> LM Studio
gpt-4-turbo-2024-04-09 -> OpenAI
Expected Behavior
reflection_with_llm should generate the summary when using a local model, just as it does with gpt-4; currently it returns an empty summary.
Screenshots and logs
No response
Additional Information
Name: pyautogen Version: 0.2.23