RunOrVeith commented 11 months ago

This issue requires a bit of a lengthy explanation, but the overall problem is:

TLDR:

The types of StructuredTool, AgentAction, parsing error handling in AgentExecutor and StructuredChatOutputParser don't fit together, and it only typechecks kind of by accident at the moment. See also the summary at the bottom.

Explanation

I am going to explain the issue with an example. Let's assume we have a StructuredTool that can click on an element on a web page:

import { z } from "zod";
import { StructuredTool } from "langchain/tools";

const clickSchema = z.object({
  selector: z.string().describe("The query seletor to click on."),
});

const ClickSchema = z.infer<typeof clickSchema>;

class ClickTool extends StructuredTool<typeof clickSchema> {
  schema = clickSchema;
  name = "click";
  description =
    "left click on an element on a web page represented by a query selector";

  protected _call(arg: ClickSchema): Promise<string> {
    // We need to know that arg.selector is a thing here
    return this.click(arg.selector);
  }

  private click(selector: string): Promise<string> {

    return Promise.resolve(`Clicked on ${selector}`);
  }
}

When you look at the signature and implementation of StructuredTool.call, it seems like we already know what the input is, but in reality, the validation only happens inside of that function:

async call(
    arg: (z.output<T> extends string ? string : never) | z.input<T>,
    configArg?: Callbacks | RunnableConfig,
    /** @deprecated */
    tags?: string[]
  ): Promise<string> {
    let parsed;
    try {
      parsed = await this.schema.parseAsync(arg); // Only now we know that arg is z.input<T>!
    } catch (e) {
      throw new ToolInputParsingException(
        `Received tool input did not match expected schema`,
        JSON.stringify(arg)
      );
    }
    // ... more code

In our case, our schema can not be string, so this simplifies to the type { selector: string }. The signature reads like we have already made some assumptions about the input, where in reality we are only at a signature that looks like this:

call: async (arg: (z.output<T> extends string ? string : never) | Json, /*...*/) => Promise<String>;

But even that has more issues:

Something must have already decided that the string returned by the model is valid JSON and have parsed it.
In case that z.output<T> extends string, something somewhere must have already decided that string is an acceptable input format for the tool, and we do not need to parse JSON. (A string by itself is not valid JSON). This actually does not happen anywhere, so the case z.output<T> extends string can never be true. Also, z.string() is not a child of z.ZodObject, which is required by the generic in StructuredTool.

This is where the OutputParser comes in. The part that we really care about is the parse method:

This is where the string that the model produces is parsed to JSON, and errors are thrown if it is not valid JSON. A non-json string will throw a syntax error if passed into JSON.parse.

 /**
   * Parses the given text and returns an `AgentAction` or `AgentFinish`
   * object. If an `OutputFixingParser` is provided, it is used for parsing;
   * otherwise, the base parser is used.
   * @param text The text to parse.
   * @param callbacks Optional callbacks for asynchronous operations.
   * @returns A Promise that resolves to an `AgentAction` or `AgentFinish` object.
   */
  async parse(text: string): Promise<AgentAction | AgentFinish> {
    try {
      const regex = /```(?:json)?(.*)(```)/gs;
      const actionMatch = regex.exec(text);
      if (actionMatch === null) {
        throw new OutputParserException(
          `Could not parse an action. The agent action must be within a markdown code block, and "action" must be a provided tool or "Final Answer"`
        );
      }
      const response = JSON.parse(actionMatch[1].trim());
      const { action, action_input } = response;

      if (action === "Final Answer") {
        return { returnValues: { output: action_input }, log: text };
      }
      return { tool: action, toolInput: action_input || {}, log: text };
    } catch (e) {
      throw new OutputParserException(
        `Failed to parse. Text: "${text}". Error: ${e}`
      );
    }
  }

From parsing, we get an AgentAction (we can ignore AgentFinish for now) that looks like this:

export type AgentAction = {
    tool: string;
    toolInput: string;
    log: string;
};

toolInput is typed as string, even though we just used JSON.parse to get it! Consider the positive case where the model produced output that matches our schema, let's say the string "{\"selector\": \"myCoolButton\"}". Using JSON.parse, this will deserialize to an object { selector: "myCoolButton" }, and not a string. But because JSON.parse's return type is any, the typescript compiler has no chance of realizing this.

To understand why this is troublesome, we need to look into the execution loop where the AgentActions are used to actually invoke the tool. This happens here in AgentExecutor._call. I've split the relevant parts of the method into these two smaller methods and simplied a bit to show my point:

  // originally part of AgentExectutor._call
  async safePlanNextStep(previousSteps: AgentStep[]): Promise<AgentAction> {
    try {
      // This will return either an AgentAction or AgentFinish through outputParser.parse
      output = await this.agent.plan(steps, inputs);
    } catch (e) {
      if (e instanceof OutputParserException) {
        let observation;
        const text = e.message;
        if (this.handleParsingErrors === true) {
          observation = "Invalid or incomplete response";
        } else if (typeof this.handleParsingErrors === "string") {
          observation = this.handleParsingErrors;
        } else if (typeof this.handleParsingErrors === "function") {
          observation = this.handleParsingErrors(e);
        } else {
          throw e;
        }
        output = {
          tool: "_Exception",
          toolInput: observation,
          log: text,
        } as AgentAction;
      } else {
        throw e;
      }
    }
  }

This is where the parsing using the OutputParser comes in, and where its exceptions are handled. You can see that in the case of an error, the toolInput field will always be a string (if this.handleParsingErrors is a function, the return type is also string). But we have just seen above, that in the non-error case toolInput will be parsed JSON! This is inconsistent behavior, we never parse the output of handleParsingErrors to JSON, so we are now in a state where toolInput is sometimes a string, and sometimes parsed json.

The next step is to call the selected tool with the given input:

// Also part of AgentExecutor._call in the real code
async safeCallTool(action: AgentAction): Promise<AgentStep> {
  const tool =
    action.tool === "_Exception"
      ? new ExceptionTool()
      : toolsByName[action.tool?.toLowerCase()];
  let observation;
  try {
    observation = tool
      ? await tool.call(action.toolInput)
      : `${action.tool} is not a valid tool, try another one.`;
  } catch (e) {
    if (e instanceof ToolInputParsingException) {
      if (this.handleParsingErrors === true) {
        observation = "Invalid or incomplete tool input. Please try again.";
      } else if (typeof this.handleParsingErrors === "string") {
        observation = this.handleParsingErrors;
      } else if (typeof this.handleParsingErrors === "function") {
        observation = this.handleParsingErrors(e);
      } else {
        throw e;
      }
      observation = await new ExceptionTool().call(
        observation,
        runManager?.getChild()
      );
      return { action, observation: observation ?? "" };
    }
  }

  return { action, observation: observation ?? "" };
}

We only pass the previously computed output on to the tool in tool.call(action.toolInput)! We do not actually have any guarantees for the input types to the tool! And in case this causes another error, we re-use the same function to handle parsing errors that will return a string that is supposed to be the tool output in the error case.

Summary

We parse the model's output to JSON and use that parsed result to call a tool
If the parsing succeeds, we call the tool with any valid JSON
If the parsing fails, we call the tool with a string
The tool parses the input with zod, which will only work in the error case if the schema is just a const stringSchema = z.string()
using const stringSchema = z.string() as the tool schema will not type check at all, since the generic argument of StructuredTool is T extends z.ZodObject<any, any, any, any>, and typeof stringSchema does not fulfill that constraint
The signature of tool.call allows this to type check, since we don't know specifically which tool we have at the moment, so string and any json is potentially valid
The actual type check for this happens at runtime inside this function that has a signature assuming we already have the parsed data
The developer implementing the tool has no idea about this. Since only StrucStep.actionturedTool._call is abstract, you will always get what the schema indicates, but StructuredTool.call will fail, even if you have supplied a function handleParsingErrors.
Whatever the tool gets called with is serialized into AgentAction.toolInput: string, which is not correctly typed, it is actually either string or JSON at the moment
The library user has access to the AgentStepss with wrongly typed AgentActions, since it is possible to request them as a return value of the overall loop using returnIntermediateSteps=true. Whatever the user now does with the AgentSteps is not type safe.

When is that an actual issue?

We noticed that the model often times makes the same errors with the tool input format. Recall our ClickSchema, which is just { selector: string }. In our clicking example it would either generate according to the schema, or { element: string }, or just a string which was the value we want, like "myCoolButton".

So we built and auto-fixer for these common error cases. The fixer basically just checks whether it can fix the input using either of the options above. The earliest we can inject this code without overwriting a lot of the planning logic that LangChain provides is in StructuredTool.call. We can not handle it using handleParsingErrors, since that receives only the error as input, and not the causing text. Once you are overwriting StructuredTool.call, you are relying on the signature of that function to be correct, which we just saw is not the case. It would also be great if the corrected tool input could be serialized in the intermediate steps, which we can only do through some hacks at the moment, because the steps are not part of the error handling process. Separate issue for this: #2711 . At this point, you are stuck having to figure out all of the above to see why you are getting wrongly typed inputs to call and in the resulting intermediateSteps.

Improvement ideas

Unfortunately, anything that really fixes this is a breaking change. Nonetheless, this is what I would propose:

Change the signature of StructuredTool.call, so that the option of this being a string at all is gone and we only get json:
```
call: async (arg: Json, /*...*/) => Promise<String>;
```
and move the parsing of the input inside StructuredTool.call into its own method that can be overwritten in specific implementations of StructuredTool
Either change the type of AgentAction.toolInput to json (probably problematic with the non-structured agents, I have not looked into those. Could be solved by making the toolInput type generic), or keep it as string but use explicit JSON.stringify when creating an AgentAction.
The handleParsingErrors type needs to match the type of toolInput. I want to be able to provide parsed JSON as a solution to a parsing error, so either add a JSON.parse around that or change the type of the callback and string case for that argument to Json.
Extract the methods for error handling similar to above from AgentExecutor._call into their own methods for easier customization and improved readability ( see also my other issue #2711 for some improvement ideas here).

I would be willing to contribute here if we can find a good solution. I have not looked into the python code for this, but assume it has the same problem.

dosubot[bot] commented 11 months ago

🤖

Thank you for your detailed analysis and suggestions on improving type safety and consistency in the LangChainJS framework. I agree with your points and here's how we can address them:

StructuredTool class: We can modify the call function to only accept JSON as input. This will ensure that the input is always valid JSON and will eliminate the need for parsing inside the call function. Here's how you can do it:

async call(
  arg: z.input<T>,
  configArg?: Callbacks | RunnableConfig,
  /** @deprecated */
  tags?: string[]
): Promise<string> {
  let parsed;
  try {
    parsed = await this.schema.parseAsync(arg);
  } catch (e) {
    this.handleError(e);
  }

  // ...
}

AgentAction class: We can change the type of toolInput field to JSON. This will ensure that the toolInput field is always valid JSON and will eliminate the need for parsing it to JSON. Here's how you can do it:

class AgentAction {
  toolInput: z.input<T>;

  // ...
}

AgentActionExecutor class: We can extract the methods for error handling from AgentExecutor._call into their own methods. This will make the error handling more modular and easier to manage. Here's how you can do it:

class AgentActionExecutor {
  // ...

  private handleParsingErrors(e: Error): void {
    // Handle parsing errors here
  }

  private _call(action: AgentAction): void {
    try {
      // ...
    } catch (e) {
      this.handleParsingErrors(e);
    }
  }
}

StructuredChatOutputParser class: We can modify the parse function to only return valid JSON. This will ensure that the output is always valid JSON and will eliminate the need for parsing it to JSON. Here's how you can do it:

class StructuredChatOutputParser {
  // ...

  parse(output: string): z.input<T> {
    // Parse the output to JSON here
  }
}

Please note that these changes will require updating all places where these classes and their methods are used. Also, these changes will make the code more type safe, but they may not cover all edge cases. Therefore, thorough testing is recommended after implementing these changes.

I hope this helps! If you have any further questions or need more clarification, please let me know.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

RunOrVeith commented 11 months ago

@dosu-bot Your recommendations are exactly what I already proposed. You could help by refining them to not be a breaking change.

dosubot[bot] commented 11 months ago

🤖

Thank you for your detailed proposal. Here are my suggestions on how to implement these changes without causing breaking changes:

Modify the call function in StructuredTool class to only accept JSON as input. Instead of directly modifying the call function to only accept JSON, you could create a new method, say callJSON, which accepts only JSON. This method could internally call the existing call method after converting the JSON to the required type. This way, the existing functionality will not be affected.
```
async callJSON(
 arg: JSON,
 configArg?: Callbacks | RunnableConfig,
 tags?: string[]
): Promise<string> {
 const convertedArg = convertJSONToRequiredType(arg); // Implement this function
 return this.call(convertedArg, configArg, tags);
}
```
Change the type of toolInput field in AgentAction class to JSON. Similar to the above suggestion, instead of directly changing the type of toolInput, you could add a new field, say toolInputJSON, which is of type JSON. You could then modify the code to populate both toolInput and toolInputJSON. This way, the existing code which relies on toolInput will not be affected.
```
const toolInputJSON = function_call.arguments
 ? JSON.parse(function_call.arguments)
 : {};
const toolInput = convertJSONToRequiredType(toolInputJSON); // Implement this function
```
Extract the methods for error handling from AgentExecutor._call into their own methods. This change should not cause any breaking changes as long as the extracted methods are called in the same order and with the same arguments as in the original _call method. Make sure to thoroughly test the new implementation to ensure that it behaves exactly as the old one.
Modify the parse function in StructuredChatOutputParser class to only return valid JSON. Instead of modifying the existing parse function, you could create a new method, say parseJSON, which only returns valid JSON. This method could internally call the existing parse method and then convert the result to JSON. This way, the existing functionality will not be affected.
```
parseJSON(output: string): JSON {
 const result = this.parse(output);
 return convertToJSON(result); // Implement this function
}
```

Please note that these are just suggestions and might need to be adapted to your specific use case. Also, remember to thoroughly test all changes to ensure that they do not introduce any new bugs.

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

david1542 commented 9 months ago

I'm also experiencing this issue. Any news on that?

dosubot[bot] commented 6 months ago

Hi, @RunOrVeith,

I'm helping the langchainjs team manage their backlog and am marking this issue as stale. From what I understand, the issue you raised highlighted type inconsistency and parsing errors in the LangChainJS framework. Proposed improvements included modifying the call function in StructuredTool to only accept JSON, changing the type of toolInput field in AgentAction to JSON, and extracting error handling methods for better modularity. Dosubot provided detailed suggestions on implementing these changes without causing breaking changes, and another user, david1542, also expressed experiencing the same issue.

Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you!

langchain-ai / langchainjs

Typing of tool input in AgentAction is broken for StructuredTool, input error handling and StructuredChatOutputParser #2710

TLDR:

Explanation

Summary

When is that an actual issue?

Improvement ideas

Sources

Sources