vercel / ai

Build AI-powered applications with React, Svelte, Vue, and Solid
https://sdk.vercel.ai/docs

Tool function leak on AI answer #1807

Closed · agusmoles closed this issue 3 months ago

agusmoles commented 3 months ago

Description

I don't really have a way for you to reproduce it, but I was testing locally and the streamUI response returned what looked like an eval string of Python code for my tool function definition. It's not too dangerous in the end, but it felt really weird; I don't think this is expected.

[Screenshot of the UI chat: WhatsApp Image 2024-06-03 at 11 19 19]

The first and last AI messages are tool calls.

Code example

src/lib/chat.tsx:

"use server";

import { createAI, getMutableAIState, streamUI } from "ai/rsc";
import { nanoid } from "nanoid";
import Link from "next/link";
import type { ReactNode } from "react";
import { z } from "zod";
import { google } from "@ai-sdk/google";
import { getBanksTNAMetrics, getMetrics } from "./metrics";

export interface ServerMessage {
  role: "user" | "assistant";
  content: string;
}

export interface ClientMessage {
  id: string;
  role: "user" | "assistant";
  display: ReactNode;
}

export async function continueConversation(
  input: string
): Promise<ClientMessage> {
  "use server";
  const history = getMutableAIState();

  const result = await streamUI({
    model: google("models/gemini-1.5-pro-latest"),
    presencePenalty: 0.7, // avoid repetition of already said things in the prompt
    frequencyPenalty: 0.9, // avoid repetition of the same answer
    maxRetries: 0,
    system: "...",
    messages: [...history.get(), { role: "user", content: input }],
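    // Render streamed text; once the stream is done, persist the final
    // assistant message to the AI state.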
    text: ({ content, done }) => {
      if (done) {
        history.done((messages: ServerMessage[]) => [
          ...messages,
          { role: "assistant", content },
        ]);
      }

      return content;
    },
    tools: {
      // ....
      calculateTEA: {
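        // Spanish: "Calculates the TEA (effective annual rate) from the TNA (nominal annual rate)."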
        description: "Calcula la TEA a partir de la TNA",
        parameters: z.object({
          tnaInDecimals: z.number(),
          capitalizationPeriods: z.number(),
        }),
        generate: async ({ tnaInDecimals, capitalizationPeriods }) => {
          // ....

          history.done((messages: ServerMessage[]) => [
            ...messages,
            {
              role: "assistant",
              content: '...',
            },
          ]);

          return (
            <>
              {/* ... */}
            </>
          );
        },
      },
    },
  });

  return {
    id: nanoid(),
    role: "assistant",
    display: result.value,
  };
}

export const AI = createAI<ServerMessage[], ClientMessage[]>({
  actions: {
    continueConversation,
  },
  initialAIState: [],
  initialUIState: [],
});

Additional context

No response

jeremyphilemon commented 3 months ago

@agusmoles Thanks for reporting!

This seems like an issue with the generation, where the model's output is wrong. Were you expecting it to call a tool, but it printed out Python code to perform the calculation instead?

agusmoles commented 3 months ago

@jeremyphilemon Exactly that, yes.

jeremyphilemon commented 3 months ago

@agusmoles Thanks for clarifying!

It seems like an issue with the system prompt, the schema annotations, or the language model itself, so there's very little we can do as an SDK to prevent this, since it's application-specific. Any chance you can share more details on the prompt that could've led the model to generate Python code instead of calling a tool?

agusmoles commented 3 months ago

@jeremyphilemon Yes, I know there is little to be done, since it's pretty difficult to replicate and/or report when the whole context comes from an AI model, so feel free to close this if you think there is nothing Vercel could do to avoid this kind of "leak".

Regarding the prompts, the questions you can see in the chat are pretty straightforward: the first asks for an economic TNA (the equivalent of APR in the US), which returns a tool call; the second is a question about the capitalization of the interest earned from the investment (which is given in the system attribute of the model for context); and the third was intended to execute another tool function that converts the TNA to a TEA (Effective Annual Rate), but it produced that Python code instead.
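For reference, the TNA-to-TEA conversion the tool performs is the standard nominal-to-effective rate formula. A minimal sketch in TypeScript, assuming capitalizationPeriods means the number of compounding periods per year (the helper name teaFromTNA is hypothetical, not from the original code):

// Assumed semantics: TEA = (1 + TNA / m)^m - 1, with m compounding periods per year.
function teaFromTNA(tnaInDecimals: number, capitalizationPeriods: number): number {
  return (1 + tnaInDecimals / capitalizationPeriods) ** capitalizationPeriods - 1;
}

// Example: a 60% TNA compounded monthly gives a TEA of roughly 79.6%.
console.log(teaFromTNA(0.6, 12)); // ≈ 0.7959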

It hadn't happened before with the same tool or any other; it was just that one time, which is also why the probability of replicating it is pretty low, but I wanted to report it just in case. 😄

Thanks!

lgrammel commented 3 months ago

@agusmoles It's a wild guess, but one thing that could throw off the LLM is the mix of English and Spanish. I often got better results by going all in on a single language. This unfortunately means that your property names, e.g. tnaInDecimals, might need to be Spanish, or you could at least add a Spanish description to the variable in the schema. Maybe that helps.
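For example, a minimal sketch of the schema suggestion using zod's .describe() (the Spanish description strings here are illustrative, not from the original code):

import { z } from "zod";

// Same parameters as in the issue, with Spanish descriptions attached via
// zod's .describe(); the wording of the descriptions is illustrative only.
const parameters = z.object({
  tnaInDecimals: z
    .number()
    .describe("La TNA expresada en decimales, por ejemplo 0.6 para una TNA del 60%"),
  capitalizationPeriods: z
    .number()
    .describe("La cantidad de períodos de capitalización por año"),
});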