microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/

[Bug]: reflection_with_llm issue with local server #2492

Closed MarianoMolina closed 6 months ago

MarianoMolina commented 6 months ago

Describe the bug

There seems to be an issue with generating the reflection_with_llm summary when working with locally deployed models.

Below is a simple snippet. When run with my local model, it generates the conversation correctly, but the ChatResult summary is empty. When I run it using gpt-4, the summary is generated correctly.

Steps to reproduce

from autogen import GroupChatManager, GroupChat, config_list_from_json, ConversableAgent, UserProxyAgent

config_list = config_list_from_json(
    env_or_file="OAI_CONFIG_LIST",
    file_location=".",
    filter_dict={"model": ["TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf"]},
)
llm_config = {
    "cache_seed": False,  # change the cache_seed for different trials
    "temperature": 0,
    "config_list": config_list,
    "timeout": 120,
}

user_proxy_auto = UserProxyAgent(
    name="user_proxy_auto",
    code_execution_config=False,
    llm_config=llm_config,
)

drafter_agent = ConversableAgent(
    name="drafter",
    llm_config=llm_config,
    system_message="You are an assistant in charge of drafting the answer for the task.",
)
reviewer_agent = ConversableAgent(
    name="reviewer",
    llm_config=llm_config,
    system_message="You are an assistant in charge of reviewing the drafted answer and assess its quality in terms of tackling the task successfully and effectively. You can make adjustments directly, request a completely new draft while providing a framework to approach the task more effectively, or approve the answer as is. If the task is complete, end the task with TERMINATE",
)
group_chat = GroupChat(
    agents=[drafter_agent, reviewer_agent, user_proxy_auto],
    messages=[],
    max_round=4,
    speaker_selection_method="round_robin",
)

chat_manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)

chat_result = user_proxy_auto.initiate_chat(
    chat_manager,
    message="List 5 roles/positions that benefit strongly from a high EQ, and list your reasons.",
    clear_history=True,
    summary_args={"summary_prompt": "List the final answer to the task."},
    summary_method="reflection_with_llm",
)

print(f'chat_result history: {chat_result.chat_history}')
print(f'chat_result summary: {chat_result.summary}')
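
For completeness, the OAI_CONFIG_LIST referenced above looks roughly like this on my machine (the base_url and api_key point at the local LM Studio server, as shown in the debug output further down; the gpt-4 entry is only there for the comparison run):

[
    {
        "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf",
        "base_url": "http://localhost:1234/v1",
        "api_key": "lm-studio"
    },
    {
        "model": "gpt-4-turbo-2024-04-09",
        "api_key": "<your OpenAI API key>"
    }
]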

Model Used

TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf -> LMStudio
gpt-4-turbo-2024-04-09 -> OpenAI

Expected Behavior

reflection_with_llm should generate the summary output when using a local model, just as it does with gpt-4; instead the summary comes back empty.

Screenshots and logs

No response

Additional Information

Name: pyautogen
Version: 0.2.23

MarianoMolina commented 6 months ago

I did a run and added some debug output to ConversableAgent to try to track down the issue, but I still can't figure out what it is:

List 5 roles/positions that benefit strongly from a high EQ, and list your reasons.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
1. generate_oai_reply for drafter to chat_manager. config: None
2. _generate_oai_reply_from_client for drafter. llm_client: [{'cache_seed': False, 'temperature': 0.5, 'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf'}], cache: None
2.b. _generate_oai_reply_from_client messages: [{'content': '[SYSTEM MESSAGE]', 'role': 'system'}, {'content': '[PROMPT]', 'name': 'user_proxy_auto', 'role': 'user'}]
3. _generate_oai_reply_from_client response: ChatCompletion(id='chatcmpl-9jjt8tzao65chsce8vo6n', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="[REPLY 1]", role='assistant', function_call=None, tool_calls=None))], created=1713971689, model='TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=340, prompt_tokens=340, total_tokens=680), cost=0, message_retrieval_function=<bound method OpenAIClient.message_retrieval of <autogen.oai.client.OpenAIClient object at 0x00000253849CA350>>, config_id=0, pass_filter=True)
drafter (to chat_manager):
[REPLY 1]

--------------------------------------------------------------------------------
1. Calling reflection_with_llm_as_summary from user_proxy_auto to chat_manager
1.b. llm_config from sender: {'cache_seed': False, 'temperature': 0.5, 'config_list': [{'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', 'base_url': 'http://localhost:1234/v1', 'api_key': 'lm-studio'}], 'timeout': 120} and from recipient: {'cache_seed': False, 'temperature': 0.5, 'config_list': [{'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', 'base_url': 'http://localhost:1234/v1', 'api_key': 'lm-studio'}], 'timeout': 120}
2. _generate_oai_reply_from_client for user_proxy_auto. llm_client: [{'cache_seed': False, 'temperature': 0.5, 'model': 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf'}], cache: None
2.b. _generate_oai_reply_from_client messages: [{'content': '[PROMPT]', 'role': 'user', 'name': 'user_proxy_auto'}, {'content': "[REPLY 1]", 'role': 'user', 'name': 'drafter'}, {'role': 'system', 'content': 'List the final answer to the task.'}]
3. _generate_oai_reply_from_client response: ChatCompletion(id='chatcmpl-iv8jxgvqcydrcavkk0yrm', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='', role='assistant', function_call=None, tool_calls=None))], created=1713971727, model='TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=0, prompt_tokens=1, total_tokens=1), cost=0, message_retrieval_function=<bound method OpenAIClient.message_retrieval of <autogen.oai.client.OpenAIClient object at 0x00000253849F1780>>, config_id=0, pass_filter=True)
response:
chat_result history: [{'content': '[PROMPT]', 'role': 'assistant'}, {'content': "[REPLY 1]", 'name': 'drafter', 'role': 'user'}]
chat_result summary:

From what I can see, even though the two flows (reflection_with_llm and the normal generate_reply) are distinct, I can't tell why _generate_oai_reply_from_client's response is empty for reflection_with_llm.

There's clearly an issue, since the reflection flow's _generate_oai_reply_from_client response object reports prompt_tokens as 1, even though the messages object being passed looks correct.

ekzhu commented 6 months ago

Have you checked the logs on your local model server?

MarianoMolina commented 6 months ago

[2024-04-24 22:54:11.733] [INFO] Received POST request to /v1/chat/completions with body: {
  "messages": [
    { "content": "[SYSTEM MESSAGE].", "role": "system" },
    { "content": "[PROMPT]", "role": "user", "name": "user_proxy_auto" },
    { "content": "[RESPONSE 1]", "role": "user", "name": "hr_expert_drafter" },
    { "role": "system", "content": "[PROMPT 2]" }
  ],
  "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf",
  "stream": false
}
[2024-04-24 22:54:11.733] [INFO] [LM STUDIO SERVER] Context Overflow Policy is: Truncate Middle
[2024-04-24 22:54:11.735] [INFO] [LM STUDIO SERVER] Last message: { role: 'system', content: '[PROMPT 2]' } (total messages = 4)
[2024-04-24 22:54:12.500] [INFO] [LM STUDIO SERVER] [TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf] Generated prediction: {
  "id": "chatcmpl-t6yiyywzta91zv0ozsr9r",
  "object": "chat.completion",
  "created": 1714010051,
  "model": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 1, "completion_tokens": 0, "total_tokens": 1 }
}
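
The same empty completion should be reproducible without AutoGen by hitting the LM Studio endpoint directly with the same message shape (conversation turns followed by a trailing "system" message). A minimal sketch using the openai client, with placeholder message contents and the base_url/api_key from my config:

# Minimal check against the local LM Studio server, bypassing AutoGen.
# The messages mirror what reflection_with_llm sends: user turns followed
# by a trailing "system" message carrying the summary prompt.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q6_K.gguf",
    messages=[
        {"role": "user", "content": "List 5 roles/positions that benefit strongly from a high EQ.", "name": "user_proxy_auto"},
        {"role": "user", "content": "1. Therapist 2. Teacher 3. Sales lead 4. Manager 5. Nurse", "name": "drafter"},
        {"role": "system", "content": "List the final answer to the task."},
    ],
)
# If the model mishandles the trailing system message, content comes back empty.
print(repr(response.choices[0].message.content))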

MarianoMolina commented 6 months ago

Ok, I figured it out. Mistral Instruct returns an empty completion when a second "system" message appears in the middle of the conversation. It looks like a model-compatibility quirk, which is the kind of thing I should be aware of when using local models with AutoGen, as stated in many places. After repeating the process with that message's role changed, the response comes back correctly. With this in mind, it might be a good idea to allow the role of the reflection_with_llm prompt to be configurable, since right now it's hardcoded here:

    def _reflection_with_llm(
        self, prompt, messages, llm_agent: Optional[Agent] = None, cache: Optional[AbstractCache] = None
    ) -> str:
        """Get a chat summary using reflection with an llm client based on the conversation history.

        Args:
            prompt (str): The prompt (in this method it is used as system prompt) used to get the summary.
            messages (list): The messages generated as part of a chat conversation.
            llm_agent: the agent with an llm client.
            cache (AbstractCache or None): the cache client to be used for this conversation.
        """
        system_msg = [
            {
                "role": "system", -> hardcoded
                "content": prompt, -> Programatically defined
            }
        ]

        messages = messages + system_msg
        if llm_agent and llm_agent.client is not None:
            llm_client = llm_agent.client
        elif self.client is not None:
            llm_client = self.client
        else:
            raise ValueError("No OpenAIWrapper client is found.")
        response = self._generate_oai_reply_from_client(llm_client=llm_client, messages=messages, cache=cache)
        print("response: ", response)
        return response

It could be as simple as allowing the prompt argument to be a Union[str, dict] that gets handled accordingly in _reflection_with_llm. It wouldn't affect the behavior of current implementations, and it would let this minor issue be solved. Something along the lines of the sketch below.
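
A rough sketch of the idea (build_reflection_message is just a made-up helper name to illustrate it, not the actual patch):

from typing import Union

def build_reflection_message(prompt: Union[str, dict]) -> dict:
    """Build the message appended for reflection_with_llm, letting the caller pick the role."""
    if isinstance(prompt, dict):
        # caller supplies the full message, e.g. {"role": "user", "content": "List the final answer to the task."}
        return prompt
    # current behavior stays the default: wrap the plain string as a system message
    return {"role": "system", "content": prompt}

# inside _reflection_with_llm, the hardcoded system_msg would then become:
# messages = messages + [build_reflection_message(prompt)]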

I can do a PR with this if that's ok?

ekzhu commented 6 months ago

@MarianoMolina perfect. Yes, a PR on this option sounds great! It can be added to the summary_args parameter.
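
Usage could then look something like this (summary_role is a hypothetical key name here, just to illustrate):

chat_result = user_proxy_auto.initiate_chat(
    chat_manager,
    message="List 5 roles/positions that benefit strongly from a high EQ, and list your reasons.",
    summary_method="reflection_with_llm",
    summary_args={
        "summary_prompt": "List the final answer to the task.",
        "summary_role": "user",  # hypothetical option: send the reflection prompt as a user message
    },
)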

MarianoMolina commented 6 months ago

@ekzhu https://github.com/microsoft/autogen/pull/2527 Done