lastmile-ai / mcp-agent

Build effective agents using Model Context Protocol and simple workflow patterns
Apache License 2.0

Change approach to generate_structured #61

Open · Nicba1010 opened 1 week ago

Nicba1010 commented 1 week ago

The approach to implementing generate_structured seems really problematic: it loses much of the detail produced by the underlying generate_str call.

When calling generate_str: [image]

Same query with a simple wrapper for generate_structured: [image]

Current implementation:

    async def generate_structured(
        self,
        message,
        response_model: Type[ModelT],
        request_params: RequestParams | None = None,
    ) -> ModelT:
        # First we invoke the LLM to generate a string response
        # We need to do this in a two-step process because Instructor doesn't
        # know how to invoke MCP tools via call_tool, so we'll handle all the
        # processing first and then pass the final response through Instructor
        response = await self.generate_str(
            message=message,
            request_params=request_params,
        )

        # Next we pass the text through instructor to extract structured data
        client = instructor.from_openai(
            OpenAI(api_key=self.context.config.openai.api_key, base_url=self.context.config.openai.base_url),
            mode=instructor.Mode.TOOLS_STRICT,
        )

        params = self.get_request_params(request_params)
        model = await self.select_model(params)

        # Extract structured data from natural language
        structured_response = client.chat.completions.create(
            model=model or "gpt-4o",
            response_model=response_model,
            messages=[
                {"role": "user", "content": response},
            ],
        )

        return structured_response
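
For context, here is a minimal usage sketch of that two-step flow; the llm instance and the WeatherReport model are hypothetical placeholders rather than anything from the repo:

    from pydantic import BaseModel

    class WeatherReport(BaseModel):
        city: str
        temperature_c: float
        summary: str

    async def example(llm) -> WeatherReport:
        # Step 1 happens inside generate_structured: generate_str runs the
        # tool-calling conversation and returns plain text. Step 2 then sends
        # only that text through Instructor, so any detail the intermediate
        # summary drops never reaches the structured output.
        return await llm.generate_structured(
            message="What is the weather in Berlin right now?",
            response_model=WeatherReport,
        )
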
Nicba1010 commented 1 week ago

And this is fairly consistent: generate_structured reliably produces worse responses than generate_str for the same query.

MattMorgis commented 1 week ago

Thanks for raising this @Nicba1010. I have also been meaning to suggest updating these.

I think the instructor library is pretty standard, but both OpenAI and Anthropic have native solutions for these that could potentially be used instead:
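
For reference, the OpenAI-native route looks roughly like this; a sketch assuming a recent openai Python SDK, with the model name and the ExtractedAnswer Pydantic model as placeholders:

    from openai import OpenAI
    from pydantic import BaseModel

    class ExtractedAnswer(BaseModel):
        title: str
        key_points: list[str]

    client = OpenAI()

    # The parse helper sends the Pydantic model as a strict JSON schema and
    # returns an already-validated instance on the .parsed attribute.
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "Summarize the release notes."}],
        response_format=ExtractedAnswer,
    )
    answer = completion.choices[0].message.parsed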

saqadri commented 1 week ago

I believe @evalstate has suggested the same thing. This is a good change to make. Any takers? :). Otherwise I'll take it next week.

evalstate commented 1 week ago

Yep, I've seen the same with Instructor: it doubles up calls and generates results inconsistent with the other call. For the places where it is used, I don't think the complexity warrants it. @saqadri, in the fast-agent branch I've also moved to XML tags for those areas. Also of note, since it's a related area: Sonnet 3.7 over-plans and over-evaluates, so model-specific autogenerated prompts/rules may be necessary for a good experience.
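
For illustration, a rough sketch of the XML-tag pattern; the tag name, prompt wording, and parsing here are assumptions, not the actual fast-agent implementation:

    import re

    from pydantic import BaseModel

    class RoutingDecision(BaseModel):
        agent: str
        reason: str

    # Ask the model to wrap its JSON answer in a known tag, then pull the
    # payload out with a regex and validate it with the Pydantic model.
    FORMAT_INSTRUCTION = (
        "Respond with JSON matching the schema, wrapped in "
        "<structured>...</structured> tags."
    )

    def parse_tagged_response(text: str) -> RoutingDecision:
        match = re.search(r"<structured>(.*?)</structured>", text, re.DOTALL)
        if match is None:
            raise ValueError("no <structured> block found in model output")
        return RoutingDecision.model_validate_json(match.group(1))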

evalstate commented 1 week ago

btw, I've already removed it for Anthropic (based on the current outputs, I'm not worried about using the 'formal' structured outputs just yet) and can remove it for OpenAI this morning. This also allows bumping the Anthropic SDK, which Instructor had pinned to an earlier version.

evalstate commented 1 week ago

OK, this is now merged to main in fast-agent and pushed in 0.1.8. Quick summary:

  • Instructor and its dependencies removed; packages updated.
  • The OpenAI endpoint now uses the chat completions parse method to generate structured content.
  • The Anthropic endpoint uses Pydantic with partial JSON parsing.
  • Orchestrator, Eval/Opt and Router now all use these new methods natively.

The performance increase is..... yeah, massive. You definitely want this. Actual performance of the orchestrator in both modes also seems better.

Caveats/future TODOs:

  • Supply the structured schema as a tool definition (and specify the tool) to the Anthropic endpoint. I haven't done this: the prompting seems to be more than adequate for both Haiku and Sonnet, and the generated prompts require very simple JSON.
  • Prompt engineering for the autogenerated Router/Orchestrator/Eval-Optimizer prompts.
  • Have the prompted JSON format be generated from the Pydantic type rather than hard-coded (see the sketch after this comment).
  • Improve the console display of workflow agents (1): use the types to display clearly what is happening rather than leaving the user to peer through XML/JSON.
  • Improve the console display of workflow agents (2): truncate concatenated agent results and apply minor indentation to XML to make the overall content easier to eyeball.
  • I've tested pretty extensively, but there are always edge cases. There was a bit of LLM-generated code that was overly defensive that I've taken out, etc.

Other things in mind:

  • Simplify the type hierarchy for Agents/AugmentedLLMs. These workflow components are really just Agents, and e.g. router_llm doesn't derive from AugmentedLLM while Orchestrator does.
  • For fast-agent, I am moving towards using MCP types internally for the interfaces, and will be revisiting the use of Structured Outputs at the user-facing API level.
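
To make the Anthropic side of the summary above concrete, here is a rough sketch that derives the prompted JSON format from the Pydantic type and uses partial JSON parsing; it assumes pydantic ≥ 2.7 and the anthropic SDK, and the model name, prompt wording, and StepPlan model are placeholders rather than the actual fast-agent code:

    import json

    from anthropic import Anthropic
    from pydantic import BaseModel
    from pydantic_core import from_json

    class StepPlan(BaseModel):
        objective: str
        steps: list[str]

    client = Anthropic()

    # Build the format instructions from the Pydantic type instead of
    # hard-coding the JSON shape in the prompt.
    schema = json.dumps(StepPlan.model_json_schema(), indent=2)
    system = (
        "Reply with a single JSON object matching this schema, and nothing else:\n"
        + schema
    )

    message = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": "Plan the release checklist."}],
    )

    # Tolerate truncated output via partial JSON parsing, then validate.
    raw = from_json(message.content[0].text, allow_partial=True)
    plan = StepPlan.model_validate(raw)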

saqadri commented 1 week ago

@evalstate would it be possible to create a targeted PR for that specifically into mcp_agent? I can also do it. Right now my concern is that there is so much change in fast-agent that it is making it hard to merge a lot of really great stuff in.