vercel / ai

Build AI-powered applications with React, Svelte, Vue, and Solid
https://sdk.vercel.ai/docs

streamUI overwrites rather than appends when text-deltas change to tool-call-deltas in same stream. #1976

Open shaded-blue opened 5 months ago

shaded-blue commented 5 months ago

Description

I believe this could be expected behavior and simply a limitation? If so, please feel free to close this out (and please correct my understanding).

Reproduction is rather straightforward. Here's what I've been able to reproduce consistently:

  1. Model set to gpt-4-turbo or gpt-4o
  2. Model provided a tool that produces nonsensical results (e.g. checkWeather, which accepts a location but returns {prop1: 'Hello', prop2: 'World!'} or something like that).

Now we need the model to call that nonsensical or broken function, and then prompt it to describe / explain the results of the function.

  1. Prompt: "Please call checkWeather for Miami"
  2. Model will call the tool, yielding the load state, and then returning the garbage data.
  3. Prompt: "What was the response" | "How's it look" | etc.
  4. Model will begin streaming a preface explaining that something seemed to have gone wrong, and that it will try to call the tool again.
  5. Model will indeed call the tool again, in the same stream.

handleRender seems to detect the new part type correctly and runs the tool generator. However, it does not seem to account for previously streamed text-deltas. In my opinion (and from my understanding), it should use .append() to accommodate this transition, completing the text case and then moving to the generator, rather than overwriting the text.
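To make the difference concrete, here is a minimal sketch of update versus append semantics. `SlotModel` is a hypothetical stand-in for a streamable UI that holds plain strings instead of React nodes; it is not the SDK's implementation, only a model of the behavior described above.

```typescript
// Hypothetical model: each entry in `nodes` is one rendered UI node.
class SlotModel {
  private nodes: string[] = [];

  // update() replaces the node that is currently being streamed (the last one).
  update(node: string): void {
    if (this.nodes.length === 0) {
      this.nodes.push(node);
    } else {
      this.nodes[this.nodes.length - 1] = node;
    }
  }

  // append() leaves the existing nodes untouched and starts a new one after them.
  append(node: string): void {
    this.nodes.push(node);
  }

  snapshot(): string[] {
    return [...this.nodes];
  }
}

// Observed behavior: the tool UI replaces the streamed preface.
const overwritten = new SlotModel();
overwritten.update("Something went wrong, calling the tool again.");
overwritten.update("[checkWeather loading]");

// Suggested behavior: the preface stays and the tool UI follows it.
const appended = new SlotModel();
appended.update("Something went wrong, calling the tool again.");
appended.append("[checkWeather loading]");
```

In this model `overwritten` ends up with a single node (the tool UI), while `appended` keeps the preface as the first node and the tool UI as the second.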

Code example

No response

Additional context

Again there's a good chance I'm simply misunderstanding something about the core and this is currently a limitation or not how things are intended to be used.

I haven't spent a great deal of time looking into potential solutions for the reasons above (no point in spending the time if I'm misunderstanding or this is a known limitation with a planned fix), but I figured I would point out that this can happen, and that users would probably perceive it as a bug.

Perhaps it's as straightforward as handling the transition case in the handleRender() conditional loop (which already detects the transition and performs the overwrite) so that it calls .append() on the streamableUI rather than only .update() followed by .done()? Or perhaps it would be best to wait for the onFinish callback to seal the preface before moving on to the tool-call deltas and the generator?
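A rough sketch of what such transition handling could look like, assuming a simplified stream of parts. The `StreamPart` type and the `renderParts` function are hypothetical and only model the idea with strings; they are not the SDK's handleRender.

```typescript
// Hypothetical, simplified stream parts (not the SDK's real part types).
type StreamPart =
  | { type: "text-delta"; textDelta: string }
  | { type: "tool-call"; toolName: string };

// Returns the list of rendered nodes after consuming all parts.
function renderParts(parts: StreamPart[], appendOnTransition: boolean): string[] {
  const nodes: string[] = ["[initial]"];
  let kind: "initial" | "text" | "tool" = "initial";
  let text = "";

  for (const part of parts) {
    let node: string;
    let nextKind: "text" | "tool";

    if (part.type === "text-delta") {
      // Start a fresh text run if the previous node was not text.
      text = (kind === "text" ? text : "") + part.textDelta;
      node = text;
      nextKind = "text";
    } else {
      node = `[${part.toolName} UI]`;
      nextKind = "tool";
    }

    // A transition is a switch between text and tool output mid-stream.
    const isTransition = kind !== "initial" && kind !== nextKind;
    if (appendOnTransition && isTransition) {
      nodes.push(node); // models .append(): keep what was already rendered
    } else {
      nodes[nodes.length - 1] = node; // models .update(): replace the current node
    }
    kind = nextKind;
  }
  return nodes;
}

const parts: StreamPart[] = [
  { type: "text-delta", textDelta: "Something went wrong. " },
  { type: "text-delta", textDelta: "Calling the tool again." },
  { type: "tool-call", toolName: "checkWeather" },
];

const current = renderParts(parts, false);
const proposed = renderParts(parts, true);
```

With `appendOnTransition` set to false, `current` contains only the tool node, which mirrors the reported overwrite. With it set to true, `proposed` keeps the full preface followed by the tool node.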

Any thoughts or information are greatly appreciated! Apologies if I've glossed over something and filed an uninformed report.

ewantindale commented 4 months ago

Dealing with this issue as well when using Claude 3.5 Sonnet.

It starts off with a text response, then decides to call a function, and overwrites the text response with the function result. I would much rather have both the initial text response and the function result.

ayepRahman commented 4 months ago

I am facing the same issue here, using Claude 3.5 Sonnet. I can see my text streaming, but when the model decides to call the tool, it overwrites my AssistantMessage. From a user's perspective, seeing the streamed text replaced by a re-render with custom UI seems more like a bug than a desirable outcome. It would be great to have fine-grained control to choose between overwriting and appending those messages.

Here is a recorded video session: https://www.loom.com/share/e285e0aa400e47d4ab52a891b0f74fe1

"use server";

import { auth } from "@/auth";
import { AdventureMessage, AssistantMessage } from "@/components/Message";
import { model } from "@/lib/ai";
import type { ClientMessage } from "@/schemas/ClientMessage";
import type { SerperPlace } from "@/schemas/SerperPlace";
import { generateId, type CoreMessage } from "ai";
import { createAI, getAIState, getMutableAIState, streamUI } from "ai/rsc";
import { z } from "zod";
import { getSerperPlacesAction } from "./serper";

export const sleep = (ms: number) =>
  new Promise((resolve) => setTimeout(resolve, ms));

export type ServerMessage =
  | {
      id?: string;
      role: "user";
      content: string;
    }
  | {
      id?: string;
      role: "assistant";
      content: string;
    }
  | {
      id?: string;
      role: "system";
      content: string;
    }
  | {
      id?: string;
      name?: "getAdventure";
      role: "tool";
      content: {
        data: SerperPlace[];
      };
    };

export type AIState = ServerMessage[];
export type UIState = ClientMessage[];

const content = `
  You are a specialized adventure-seeking travel agent, dedicated to helping users discover and plan thrilling adventures. 
`;

export async function sendMessage({
  message,
  threadId,
}: {
  message: string;
  threadId: string;
}): Promise<ClientMessage> {
  const session = await auth();
  if (!session) throw new Error("No session found");

  const history = getMutableAIState<typeof AI>();

  const result = await streamUI({
    model,
    messages: [
      ...history.get(),
      { id: generateId(), role: "user", content: message },
      { role: "system", content, toolInvocations: [] },
    ] as CoreMessage[],
    initial: <AssistantMessage isLoading />,
    text: ({ content, done }) => {
      console.log({
        content,
        done,
      });
      if (done) {
        history.done([
          ...history.get(),
          { id: generateId(), role: "assistant", content },
        ]);
      }
      return <AssistantMessage message={content} />;
    },
    tools: {
      getAdventure: {
        description: `
          Get the recommended or top adventure places only, based on the user's search query and location

          country: string
          countryCode: string
          category: string
          query: string
          `,
        parameters: z.object({
          country: z.string().describe("The country name, e.g., United States"),
          countryCode: z.string().describe("The country code, e.g., US"),
          category: z
            .string()
            .describe(
              "The category of the adventure, e.g., hiking, biking, camping",
            ),
          query: z
            .string()
            .describe(
              "The query to search for, e.g., top 10 campsites in San Francisco, United States",
            ),
        }),
        generate: async function* ({ country, countryCode, category, query }) {
          console.log({ country, countryCode, category, query });

          yield <AdventureMessage isLoading />;

          const places = await getSerperPlacesAction({
            query,
            countryCode,
          });

          history.update([
            ...history.get(),
            {
              id: generateId(),
              role: "tool",
              name: "getAdventure",
              content: { data: places },
            },
          ]);
          return <AdventureMessage data={places} />;
        },
      },
    },
    onFinish(result) {
      console.log(">>>>", result);
    },
  });

  // Finalize the history with all messages in the correct sequence
  history.done([...history.get()]);

  console.log(">>>>", history.get());

  console.log("result", result);

  return {
    id: generateId(),
    role: "assistant",
    display: result.value,
  };
}

// Create the AI provider with the initial states and allowed actions
export const AI = createAI({
  initialAIState: [] as AIState,
  initialUIState: [] as UIState,
  actions: { sendMessage },
  onSetAIState: async ({ state, done }) => {
    "use server";

    console.log("onSetAIState: state", state);
  },
  onGetUIState: async () => {
    "use server";

    const history = await getAIState();

    console.log("onGetUIState: history", history);

    return history;
  },
});

ayepRahman commented 4 months ago

Did you manage to resolve this issue?

rcolepeterson commented 3 months ago

Same issue. Text begins to stream, and then the tool method blows it all away with its response.

rcolepeterson commented 3 months ago

export async function continueConversation(
  input: string
): Promise {
  "use server";

  const history = getMutableAIState();
  // Latest streamed text, kept so the tool UI can render it again.
  let accumulatedContent = "";

  const result = await streamUI({
    model: google("models/gemini-1.5-flash-latest"),
    messages: [...history.get(), { role: "user", content: input }],
    text: ({ content, done }) => {
      accumulatedContent = content;
      if (done) {
        history.done((messages: ServerMessage[]) => [
          ...messages,
          { role: "assistant", content: content },
        ]);
      }
      return (
        <div>
          <ReactMarkdown>{content}</ReactMarkdown>
        </div>
      );
    },
    tools: {
      show_art: {
        description: "Get the painting being discussed",
        parameters: z.object({
          paintingName: z.string().describe("The name of the painting to show"),
          reason: z
            .string()
            .optional()
            .describe("The reason for showing these paintings"),
        }),
        generate: async function* ({ paintingName, reason }) {
          try {
            yield <div>Showing you the painting:</div>;

            // Render the preface text together with the tool UI.
            return (
              <div>
                <p>{accumulatedContent}</p>
                <Art paintingName={paintingName} reason={reason ?? ""} />
              </div>
            );
          } catch (error) {
            console.error("Error fetching products:", error);
            yield (
              <div>
                There was an error fetching the products. Please try again
                later.
              </div>
            );
          }
        },
      },
    },
  });

This seems to work: keep track of the text content (accumulatedContent = content;) and then display it when the tool renders its UI.
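Stripped of the framework, the workaround boils down to a variable shared between the text renderer and the tool renderer. The sketch below uses a hypothetical `createRenderers` helper and plain strings instead of JSX, and assumes that `content` is the full text streamed so far (which is how it is used in the code above).

```typescript
type TextArgs = { content: string; done: boolean };

// Hypothetical helper: both renderers close over the same variable.
function createRenderers() {
  let accumulatedContent = "";

  return {
    // Called for every text update; remembers the latest full text.
    text: ({ content }: TextArgs): string => {
      accumulatedContent = content;
      return content;
    },
    // Called when the tool renders; re-emits the remembered text first.
    tool: (paintingName: string): string => {
      const art = `[Art: ${paintingName}]`;
      return accumulatedContent.length > 0
        ? `${accumulatedContent}\n${art}`
        : art;
    },
  };
}

const renderers = createRenderers();
renderers.text({ content: "Here is", done: false });
renderers.text({ content: "Here is the painting.", done: false });
const toolOutput = renderers.tool("The Starry Night");
```

Here `toolOutput` contains the preface followed by the art placeholder, so the text survives even though the tool output replaces the previously rendered node.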

KeisukeNagakawa commented 2 months ago

@rcolepeterson I checked this code and found that done is not called, so you can remove the if (done) clause like this:

export async function continueConversation(
input: string
): Promise {
"use server";

const history = getMutableAIState();
let accumulatedContent = "";

const result = await streamUI({
model: google("models/gemini-1.5-flash-latest"),
messages: [...history.get(), { role: "user", content: input }],

text: ({ content, done }) => {
  accumulatedContent = content;
  // the if (done) block has been removed here
  return (
    <div>
      <ReactMarkdown>{content}</ReactMarkdown>
    </div>
  );
},
tools: {
  show_art: {
    description: "Get the painting being discussed",
    parameters: z.object({
      paintingName: z.string().describe("The name of the painting to show"),
      reason: z
        .string()
        .optional()
        .describe("The reason for showing these paintings"),
    }),
    generate: async function* ({ paintingName, reason }) {
      try {
        yield <div>Showing you the painting:</div>;

        return (
          <div>
            <p>{accumulatedContent}</p>
            <Art paintingName={paintingName} reason={reason ?? ""} />
          </div>
        );
      } catch (error) {
        console.error("Error fetching products:", error);
        yield (
          <div>
            There was an error fetching the products. Please try again
            later.
          </div>
        );
      }
    },
  },
},
});

polesapart commented 2 months ago

@rcolepeterson I checked this code and found out done is not called, so you can remove if(done) clause like this:

When the conversation "ends" with the tool called, it doesn't call done. Otherwise, it does. This seems weird. Also, streamUI doesn't seem to be able to accumulate tool calls in any form, i.e. if you need to call a tool to get information which you need to pass to another tool.
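If done really is skipped when the stream ends with a tool call, one way to avoid losing the preface from the history would be to flush it from the tool path as well. This is only a sketch of that idea with a hypothetical `createHistoryRecorder` helper and a plain array as history; it does not use the SDK's mutable AI state.

```typescript
type Message = { role: "assistant" | "tool"; content: string };

// Hypothetical helper that records messages in the order they were produced.
function createHistoryRecorder() {
  const history: Message[] = [];
  let pendingText = "";

  // Writes the pending text to the history at most once.
  const flushText = (): void => {
    if (pendingText.length > 0) {
      history.push({ role: "assistant", content: pendingText });
      pendingText = "";
    }
  };

  return {
    history,
    // Mirrors the text callback: content is the full text so far.
    onText(content: string, done: boolean): void {
      pendingText = content;
      if (done) flushText();
    },
    // Mirrors the end of a tool generator: flush the preface first.
    onToolResult(toolName: string, result: string): void {
      flushText();
      history.push({ role: "tool", content: `${toolName}: ${result}` });
    },
  };
}

// Stream that ends with a tool call: done never arrives for the text.
const recorder = createHistoryRecorder();
recorder.onText("Let me try that again.", false);
recorder.onToolResult("checkWeather", "sunny");
```

In this run the history holds the assistant preface first and the tool result second, even though the text callback never saw done set to true.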

mrasoahaingo commented 1 month ago

I have the same issue. The workaround is the same as mentioned above: save the streamed text in a variable, then display it in the tool.