Streaming Tool Calls not supported in ChatVertexAI, ChatGoogleGenerativeAI and ChatGroq

langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗

MIT License

12.18k stars 2.05k forks source link

// This code is for selecting different models switch (model) { case "gpt": { modelObject = new ChatOpenAI({ model: modelName, apiKey: env.OPENAI_API_KEY, streaming: stream, }) break; } case "claude": { modelObject = new ChatAnthropic({ model: "claude-3-5-sonnet-20240620", apiKey: env.ANTHROPIC_API_KEY, streaming: true, }); break; } case "gemini": { modelObject = new ChatVertexAI({ model: "gemini-1.5-flash-001", authOptions: { credentials: { auth_provider_x509_cert_url: env.GOOGLE_AUTH_PROVIDER_X509_CERT_URL, auth_uri: env.GOOGLE_AUTH_URI, client_email: env.GOOGLE_CLIENT_EMAIL, client_id: env.GOOGLE_VERTEX_CLIENT_ID, client_x509_cert_url: env.GOOGLE_CLIENT_X509_CERT_URL, private_key: env.GOOGLE_PRIVATE_KEY, private_key_id: env.GOOGLE_PRIVATE_KEY_ID, project_id: env.GOOGLE_PROJECT_ID, token_uri: env.GOOGLE_TOKEN_URI, type: "service_account", }, }, // There is no option to pass streaming in ChatVertexAI temperature: 0, }); break; } case "groq": { modelObject = new ChatGroq({ apiKey: env.GROQ_API_KEY, streaming: true, model: "llama3-70b-8192", temperature: 0.7, }); } }

new ChatVertexAI({ model: "gemini-1.5-flash-001", authOptions: { credentials: { auth_provider_x509_cert_url: env.GOOGLE_AUTH_PROVIDER_X509_CERT_URL, auth_uri: env.GOOGLE_AUTH_URI, client_email: env.GOOGLE_CLIENT_EMAIL, client_id: env.GOOGLE_VERTEX_CLIENT_ID, client_x509_cert_url: env.GOOGLE_CLIENT_X509_CERT_URL, private_key: env.GOOGLE_PRIVATE_KEY, private_key_id: env.GOOGLE_PRIVATE_KEY_ID, project_id: env.GOOGLE_PROJECT_ID, token_uri: env.GOOGLE_TOKEN_URI, type: "service_account", }, }, temperature: 0, // no option to pass stream / streaming });

Streaming support for models like ChatVertexAI, ChatGoogleGenerativeAI, and ChatGroq is indeed a work in progress in LangChain JS. Here are some details and workarounds you can consider:

ChatVertexAI

For ChatVertexAI, streaming support is being implemented. You can use the _streamResponseChunks method to handle streaming responses. Here is an example:

import { ChatVertexAI } from "@langchain/google-vertexai";

const model = new ChatVertexAI({
  temperature: 0.7,
  model: "gemini-1.5-flash-001",
});

const stream = await model.stream([
  ["system", "You are a funny assistant that answers in pirate language."],
  ["human", "What is your favorite food?"],
]);

for await (const chunk of stream) {
  console.log(chunk);
}

ChatGoogleGenerativeAI

For ChatGoogleGenerativeAI, you can use a similar approach as ChatVertexAI. The streaming support is being added, and you can use the .stream() method to handle streaming responses.

ChatGroq

For ChatGroq, you can enable streaming by setting the streaming parameter to true in the model configuration. Here is an example:

import { ChatGroq } from "@langchain/groq";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const model = new ChatGroq({
  apiKey: process.env.GROQ_API_KEY,
  streaming: true,
  model: "llama3-70b-8192",
  temperature: 0.7,
});

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a helpful assistant"],
  ["human", "{input}"],
]);

const outputParser = new StringOutputParser();
const chain = prompt.pipe(model).pipe(outputParser);

const response = await chain.stream({
  input: "Hello",
});

let res = "";
for await (const item of response) {
  res += item;
  console.log("stream:", res);
}

Workarounds for Non-Native Streaming Support

For models that do not natively support streaming, you can use the .stream() method, which will yield a single chunk. This allows you to use the same general pattern when calling them.

import { ChatAnthropic } from "@langchain/anthropic";
import { concat } from "@langchain/core/utils/stream";
import type { AIMessageChunk } from "@langchain/core/messages";

const model = new ChatAnthropic({ model: "claude-3-sonnet-20240229" });

const stream = await model.stream("what color is the sky?");

let gathered: AIMessageChunk | undefined = undefined;

for await (const chunk of stream) {
  console.log(chunk);
  if (gathered === undefined) {
    gathered = chunk;
  } else {
    gathered = concat(gathered, chunk);
  }
}

console.log(gathered);

Event Handling

The on_llm_start and on_llm_end events are fired for all models, but the on_llm_stream event is only fired for models that support token-by-token streaming. For models that do not support streaming, you will only see on_llm_start and on_llm_end events.

Conclusion

While streaming support is being added for various models, you can use the .stream() method and handle the responses in a consistent manner. For models that do not support streaming natively, the entire response will be returned as a single chunk.

For more details, you can refer to the LangChain JS documentation and examples:

To continue talking to Dosu, mention @dosu.

langchain-ai / langchainjs