langchain-ai / langchainjs

πŸ¦œπŸ”— Build context-aware reasoning applications πŸ¦œπŸ”—
https://js.langchain.com/docs/
MIT License

ChatVertexAI example always returns a truncated response #4968

Open svengau opened 3 months ago

svengau commented 3 months ago

Checked other resources

Example Code

import { ChatVertexAI } from "@langchain/google-vertexai";
// Or, if using the web entrypoint:
// import { ChatVertexAI } from "@langchain/google-vertexai-web";

const model = new ChatVertexAI({
  temperature: 0.7,
  modelName: "gemini-1.0-pro",
});

const response = await model.invoke("Why is the ocean blue?");
console.log(response);
/*
AIMessageChunk {
  content: [{ type: 'text', text: 'The ocean' }], // <===== ONLY FIRST CHUNK
  additional_kwargs: {},
  response_metadata: {}
}
 */

Error Message and Stack Trace (if applicable)

N/A

Description

I've tried the example provided in the docs, and the invoke method always returns only the first chunk of what appears to be an underlying stream.

As a workaround, I use:

const stream = await (llm as any).stream("...");
let response = "";
for await (const chunk of stream) {
  console.log(chunk.lc_kwargs.content);
  response += chunk?.lc_kwargs?.content?.toString();
}
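
A slightly cleaner variant of the same workaround avoids reaching into lc_kwargs by reading chunk.content directly and merging chunks with concat(). This is just a sketch using the model instance from the example above, and it assumes AIMessageChunk.concat() from @langchain/core is available in this version:

import { AIMessageChunk } from "@langchain/core/messages";

// Accumulate the streamed chunks into a single AIMessageChunk, then read
// the combined content once the stream is done.
const stream = await model.stream("Why is the ocean blue?");
let combined: AIMessageChunk | undefined;
for await (const chunk of stream) {
  combined = combined ? combined.concat(chunk) : chunk;
}
console.log(combined?.content);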

System Info

langchain: 0.1.30
@langchain/community: 0.0.44
@langchain/core: 0.1.53
@langchain/google-vertexai: 0.0.2
@langchain/openai: 0.0.25
MacOS Sonoma 14.2.1 / Apple M1 Pro
node: 20.10.0
yarn: 1.22.1

dosubot[bot] commented 3 months ago

Hey @svengau,

Thanks for bringing this to our attention and for the detailed report! It seems like you've encountered an issue where the ChatVertexAI class from @langchain/google-vertexai is only returning the first chunk of the expected response. This behavior is indeed not intended for most use cases, as you'd typically want the full response to your query.

Your workaround using the streaming approach is a smart move and effectively bypasses the issue by manually concatenating the response chunks. This method ensures that you receive the complete response as intended. However, I understand that having to resort to such workarounds is not ideal.

I've taken a closer look at the issue, and it appears to be related to how the invoke method handles response chunks. The method should ideally aggregate all chunks into a single response before returning, but it seems like it's currently stopping after the first chunk. This is something that needs to be addressed in the library's code.
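
Conceptually, the missing aggregation would look something like the sketch below. This is only an illustration of the intent, not the actual @langchain/google-vertexai internals; the aggregateChunks name is hypothetical, and the sketch assumes the AIMessageChunk.concat() helper from @langchain/core:

import { AIMessage, AIMessageChunk } from "@langchain/core/messages";

// Illustrative only: fold every streamed chunk into one chunk, then surface
// the combined content as a plain AIMessage instead of a single chunk.
async function aggregateChunks(
  chunks: AsyncIterable<AIMessageChunk>
): Promise<AIMessage> {
  let combined: AIMessageChunk | undefined;
  for await (const chunk of chunks) {
    combined = combined ? combined.concat(chunk) : chunk;
  }
  return new AIMessage({ content: combined?.content ?? "" });
}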

While I cannot directly create a pull request or issue due to my limitations, I highly recommend reporting this behavior as a bug in the @langchain/google-vertexai repository. When reporting, please include the details you've shared here, along with your temporary workaround. This will help the maintainers understand the issue fully and expedite a fix.

In the meantime, continue using your streaming workaround for a complete response. Keep an eye on the repository for updates and potential fixes to this issue. The LangChain.js community is quite responsive, so I'm hopeful a solution will be available soon.

If you have any more questions or run into other issues, feel free to reach out. I'm here to help!

Best, Dosu

#### About Dosu

This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
afirstenberg commented 3 months ago

Hmmm... that almost seems consistent, since Vertex only has streaming, but it should have combined things. And it probably shouldn't be returning a chunk.

@jacoblee93 - Please assign this to me.