langchain-ai / langchainjs


Streaming Tool Calls not supported in ChatVertexAI, ChatGoogleGenerativeAI and ChatGroq #6099

Open DevDeepakBhattarai opened 1 month ago

DevDeepakBhattarai commented 1 month ago


Example Code

import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";
import { ChatVertexAI } from "@langchain/google-vertexai";
import { ChatGroq } from "@langchain/groq";

// This code selects between different chat models
switch (model) {
    case "gpt": {
      modelObject = new ChatOpenAI({
        model: modelName,
        apiKey: env.OPENAI_API_KEY,
        streaming: stream,
      })
      break;
    }

    case "claude": {
      modelObject = new ChatAnthropic({
        model: "claude-3-5-sonnet-20240620",
        apiKey: env.ANTHROPIC_API_KEY,
        streaming: true,
      });
      break;
    }

    case "gemini": {
      modelObject = new ChatVertexAI({
        model: "gemini-1.5-flash-001",
        authOptions: {
          credentials: {
            auth_provider_x509_cert_url: env.GOOGLE_AUTH_PROVIDER_X509_CERT_URL,
            auth_uri: env.GOOGLE_AUTH_URI,
            client_email: env.GOOGLE_CLIENT_EMAIL,
            client_id: env.GOOGLE_VERTEX_CLIENT_ID,
            client_x509_cert_url: env.GOOGLE_CLIENT_X509_CERT_URL,
            private_key: env.GOOGLE_PRIVATE_KEY,
            private_key_id: env.GOOGLE_PRIVATE_KEY_ID,
            project_id: env.GOOGLE_PROJECT_ID,
            token_uri: env.GOOGLE_TOKEN_URI,
            type: "service_account",
          },
        },
        // There is no option to pass streaming in ChatVertexAI
        temperature: 0,
      });
      break;
    }
    case "groq": {
      modelObject = new ChatGroq({
        apiKey: env.GROQ_API_KEY,
        streaming: true,
        model: "llama3-70b-8192",
        temperature: 0.7,
      });
      break;
    }
  }

Whenever we pass any tools to a model other than ChatOpenAI, streaming stops working.

import type { RunnableConfig } from "@langchain/core/runnables";
import type { AIMessageChunk } from "@langchain/core/messages";

const invokeModel = async (
  state: AgentExecutorState,
  config?: RunnableConfig,
): Promise<Partial<AgentExecutorState>> => {

  const initialPrompt =
    state.model !== "groq" ? promptWithImage : promptWithoutImages;

  const tools = [search_tool, weatherTool, crypto_tool];
  const llm = modelPicker(state.model, true).bindTools(tools, {
    tool_choice: "auto",
  });
  const chain = initialPrompt.pipe(llm);

  let result: AIMessageChunk | undefined = undefined;
  result = await chain
    .withConfig({ runName: OUTPUT_MODEL })
    .invoke(state, config);

  if (!result) {
    return {
      result: "Sorry there was an error",
    };
  }

  if (
    state.model === "gemini" &&
    result.additional_kwargs.tool_calls &&
    result.additional_kwargs.tool_calls.length > 0
  ) {
    const tool_call = result.additional_kwargs.tool_calls[0]!;
    const toolCall = {
      name: tool_call.function.name,
      parameters: safeJsonParse(tool_call.function.arguments)!,
      id: tool_call.id ?? "",
    };
    return {
      toolCall,
      chat_history: [result],
    };
  }

  if (result.tool_calls && result.tool_calls.length > 0) {
    const toolCall = {
      name: result.tool_calls[0]!.name,
      parameters: result.tool_calls[0]!.args,
      id: result.tool_calls[0]!.id ?? "",
    };
    return {
      toolCall,
      chat_history: [result],
    };
  }

  return {
    result: result.content as string,
    chat_history: [result],
    toolCall: undefined,
  };
};

The models are invoked here, and the LangGraph agent is streamed. OpenAI works as I expect, giving me on_llm_stream events, but whenever any other model is used only the on_llm_start and on_llm_end events are fired.
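
For reference, here is roughly how I consume the stream on my end (a simplified sketch; `app` stands for my compiled LangGraph executor and the input fields are illustrative):

// Simplified sketch of how the agent is streamed; `app` is the compiled LangGraph executor
const eventStream = app.streamEvents(
  { input: "What is the weather in Kathmandu?", model: "gemini" }, // illustrative state fields
  { version: "v1" },
);

for await (const event of eventStream) {
  if (event.event === "on_llm_stream") {
    // Fires per token with ChatOpenAI; never fires with the other providers
    console.log(event.data?.chunk);
  }
}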

Also, while I am at it: ChatVertexAI does not have any option to pass streaming / stream when I instantiate it.

new ChatVertexAI({
  model: "gemini-1.5-flash-001",
  authOptions: {
    credentials: {
      // ...same service-account credentials as in the snippet above
    },
  },
  temperature: 0,
  // no option to pass stream / streaming
});

Error Message and Stack Trace (if applicable)

No response

Description

I am trying to build an LLM app with support for multiple models. ChatOpenAI works well, streaming for a good user experience, but the other providers (mainly ChatVertexAI, ChatGoogleGenerativeAI, and ChatGroq) do not stream data.

System Info

pnpm version 9.4, Windows 10

dosubot[bot] commented 1 month ago

Streaming support for models like ChatVertexAI, ChatGoogleGenerativeAI, and ChatGroq is indeed a work in progress in LangChain JS. Here are some details and workarounds you can consider:

ChatVertexAI

For ChatVertexAI, streaming support is implemented via the internal _streamResponseChunks method and exposed through the .stream() method. Here is an example:

import { ChatVertexAI } from "@langchain/google-vertexai";

const model = new ChatVertexAI({
  temperature: 0.7,
  model: "gemini-1.5-flash-001",
});

const stream = await model.stream([
  ["system", "You are a funny assistant that answers in pirate language."],
  ["human", "What is your favorite food?"],
]);

for await (const chunk of stream) {
  console.log(chunk);
}

ChatGoogleGenerativeAI

For ChatGoogleGenerativeAI, you can take a similar approach to ChatVertexAI. Streaming support is being added, and you can use the .stream() method to handle streaming responses, as in the sketch below.
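
A minimal sketch (the model name and the GOOGLE_API_KEY environment variable are placeholders):

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

const model = new ChatGoogleGenerativeAI({
  apiKey: process.env.GOOGLE_API_KEY, // placeholder env var
  model: "gemini-1.5-flash",
  temperature: 0,
});

// .stream() yields AIMessageChunk objects as they arrive
const stream = await model.stream([
  ["system", "You are a helpful assistant."],
  ["human", "What is your favorite food?"],
]);

for await (const chunk of stream) {
  console.log(chunk);
}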

ChatGroq

For ChatGroq, you can enable streaming by setting the streaming parameter to true in the model configuration. Here is an example:

import { ChatGroq } from "@langchain/groq";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const model = new ChatGroq({
  apiKey: process.env.GROQ_API_KEY,
  streaming: true,
  model: "llama3-70b-8192",
  temperature: 0.7,
});

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a helpful assistant"],
  ["human", "{input}"],
]);

const outputParser = new StringOutputParser();
const chain = prompt.pipe(model).pipe(outputParser);

const response = await chain.stream({
  input: "Hello",
});

let res = "";
for await (const item of response) {
  res += item;
  console.log("stream:", res);
}

Workarounds for Non-Native Streaming Support

For models that do not natively support streaming, you can use the .stream() method, which will yield a single chunk. This allows you to use the same general pattern when calling them.

import { ChatAnthropic } from "@langchain/anthropic";
import { concat } from "@langchain/core/utils/stream";
import type { AIMessageChunk } from "@langchain/core/messages";

const model = new ChatAnthropic({ model: "claude-3-sonnet-20240229" });

const stream = await model.stream("what color is the sky?");

let gathered: AIMessageChunk | undefined = undefined;

for await (const chunk of stream) {
  console.log(chunk);
  if (gathered === undefined) {
    gathered = chunk;
  } else {
    gathered = concat(gathered, chunk);
  }
}

console.log(gathered);

Event Handling

The on_llm_start and on_llm_end events are fired for all models, but the on_llm_stream event is only fired for models that support token-by-token streaming. For models that do not support streaming, you will only see on_llm_start and on_llm_end events.
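
For example, you can observe which events actually fire by consuming streamEvents directly (a minimal sketch; the model and prompt here are placeholders, and any chat model can be swapped in to compare):

import { ChatOpenAI } from "@langchain/openai";

// Swap in ChatVertexAI, ChatGroq, etc. to compare event behavior
const model = new ChatOpenAI({ model: "gpt-4o" });

const eventStream = model.streamEvents("Tell me a short joke", {
  version: "v1",
});

for await (const event of eventStream) {
  // Models with token-by-token streaming emit many on_llm_stream events;
  // models without it only emit on_llm_start and on_llm_end
  if (event.event.startsWith("on_llm")) {
    console.log(event.event);
  }
}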

Conclusion

While streaming support is being added for various models, you can use the .stream() method and handle the responses in a consistent manner. For models that do not support streaming natively, the entire response will be returned as a single chunk.

For more details, you can refer to the LangChain JS documentation and examples.

To continue talking to Dosu, mention @dosu.