langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

ChatModel forced to stream if called in streaming context regardless of config or support #6946

Open airhorns opened 2 weeks ago

airhorns commented 2 weeks ago

Example Code

import { ChatGroq } from "@langchain/groq";
import { HumanMessage } from "@langchain/core/messages";

export const model = new ChatGroq({
  model: "mixtral-8x7b-32768",
  streaming: false,
});

// Works fine outside of a LangGraph or LangChain streaming context, but fails
// if run inside an outer `.streamEvents` call.
await model
  .bind({ response_format: { type: "json_object" } })
  .invoke([new HumanMessage("generate some example JSON")]);
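
For concreteness, a minimal sketch of the failing setup (the `RunnableLambda` wrapper is illustrative; any outer runnable consumed via `.streamEvents` reproduces it):

import { RunnableLambda } from "@langchain/core/runnables";

// Illustrative outer runnable that calls the non-streaming model internally.
const chain = RunnableLambda.from(async (input: string) =>
  model
    .bind({ response_format: { type: "json_object" } })
    .invoke([new HumanMessage(input)])
);

// Consuming the chain via streamEvents puts the inner invoke() in a streaming
// context, which triggers the 400 below even though streaming: false is set
// on the model itself.
for await (const event of chain.streamEvents("generate some example JSON", {
  version: "v2",
})) {
  // inspect event.event / event.data here
}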

Error Message and Stack Trace (if applicable)

400 {"error":{"message":"response_format` does not support streaming","type":"invalid_request_error"}} Error: 400 {"error":{"message":"response_format` does not support streaming","type":"invalid_request_error"}} at Function.generate (/Users/airhorns/Code/gadget/node_modules/.pnpm/groq-sdk@0.5.0/node_modules/groq-sdk/src/error.ts:58:14) at Groq.makeStatusError (/Users/airhorns/Code/gadget/node_modules/.pnpm/groq-sdk@0.5.0/node_modules/groq-sdk/src/core.ts:397:21) at Groq.makeRequest (/Users/airhorns/Code/gadget/node_modules/.pnpm/groq-sdk@0.5.0/node_modules/groq-sdk/src/core.ts:460:24) at processTicksAndRejections (node:internal/process/task_queues:95:5) at async RetryOperation._fn (/Users/airhorns/Code/gadget/node_modules/.pnpm/p-retry@4.5.0/node_modules/p-retry/index.js:50:12)

Description

I don't think it's safe to assume the model can stream just because the ambient context wants it to. If specific params passed to the invocation say not to stream, it shouldn't. Without a way to override this, I can't use Groq's JSON support at all from within LangGraph, or from any larger LangChain pipeline that is streaming tokens from other models!
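
As a possible escape hatch, newer @langchain/core releases expose a `disableStreaming` field on chat models that pins them to the non-streaming code path regardless of the ambient context. A minimal sketch, assuming your installed core version has this field (check before relying on it):

import { ChatGroq } from "@langchain/groq";

// Assumed API: `disableStreaming` (if present in your @langchain/core
// version) forces the non-streaming request path even when an outer
// `.streamEvents` call would otherwise switch the model to streaming.
const jsonModel = new ChatGroq({
  model: "mixtral-8x7b-32768",
  disableStreaming: true,
});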

System Info

airhorns@hugs ~/C/gadget (genny-groq)> pnpm info langchain

langchain@0.3.2 | MIT | deps: 12 | versions: 297
Typescript bindings for langchain
https://github.com/langchain-ai/langchainjs/tree/main/langchain/

keywords: llm, ai, gpt3, chain, prompt, prompt engineering, chatgpt, machine learning, ml, openai, embeddings, vectorstores

dist
.tarball: https://registry.npmjs.org/langchain/-/langchain-0.3.2.tgz
.shasum: aec3e679d3d6c36f469448380affa475c92fbd86
.integrity: sha512-kd2kz1cS/PIVrLEDFlrZsAasQfPLbY1UqCZbRKa3/QcpB33/n6xPDvXSMfBuKhvNj0bjW6MXDR9HZTduXjJBgg==
.unpackedSize: 2.9 MB

dependencies:
@langchain/openai: >=0.1.0 <0.4.0        jsonpointer: ^5.0.1                      uuid: ^10.0.0
@langchain/textsplitters: >=0.0.0 <0.2.0 langsmith: ^0.1.56-rc.1                  yaml: ^2.2.1
js-tiktoken: ^1.0.12                     openapi-types: ^12.1.3                   zod-to-json-schema: ^3.22.3
js-yaml: ^4.1.0                          p-retry: 4                               zod: ^3.22.4

maintainers:
- nfcampos <nuno@boringbits.io>
- jacoblee93 <jacoblee93@gmail.com>
- andrewnguonly <andrewnguonly@gmail.com>
- davidduong <david@duong.cz>
- hwchase17 <hw.chase.17@gmail.com>
- basproul <braceasproul@gmail.com>

dist-tags:
latest: 0.3.2                              next: 0.3.2-rc.0                           tag-for-publishing-older-releases: 0.2.20
dosubot[bot] commented 2 weeks ago

To address the issue where the ChatGroq model defaults to streaming in a LangChain streaming context, you need to ensure that the streaming property is explicitly set to false in the model configuration. This should override the default behavior even in a streaming context.

Here's how you can configure it:

import { ChatGroq } from '@langchain/groq';
import { HumanMessage } from '@langchain/core/messages';

const model = new ChatGroq({
  model: "mixtral-8x7b-32768",
  streaming: false, // Explicitly disable streaming
});

// Bind the response format to JSON
await model
  .bind({ response_format: { type: "json_object" } })
  .invoke([new HumanMessage("generate some example JSON")]);

The issue arises because the LangChain context might force streaming due to its internal logic, which checks for streaming conditions like signals or timeouts [1]. By setting streaming: false, you should be able to prevent this behavior and use the JSON response format without encountering errors.
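
To make that mechanism concrete, here is a simplified, hypothetical sketch of the decision being described. This is illustrative TypeScript, not the actual langchainjs source; `ambientContextWantsStreaming` and the declared `model` are stand-ins:

import type { AIMessageChunk, BaseMessage } from "@langchain/core/messages";
import type { RunnableConfig } from "@langchain/core/runnables";
import type { ChatGroq } from "@langchain/groq";

// Hypothetical stand-ins: the model from the issue above, and the library's
// internal "is anyone consuming a stream?" check.
declare const model: ChatGroq;
declare function ambientContextWantsStreaming(config?: RunnableConfig): boolean;

async function invokeWithAmbientStreaming(
  messages: BaseMessage[],
  config?: RunnableConfig
) {
  if (ambientContextWantsStreaming(config)) {
    // Forced streaming path: chunks are streamed and merged back together.
    // This is what breaks provider params that reject stream=true, such as
    // Groq's response_format: { type: "json_object" }.
    let full: AIMessageChunk | undefined;
    for await (const chunk of await model.stream(messages, config)) {
      full = full === undefined ? chunk : full.concat(chunk);
    }
    return full;
  }
  // Non-streaming path: a single request, honoring streaming: false.
  return await model.invoke(messages, config);
}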


andrewdoro commented 1 week ago

having the same issue

kodychik commented 1 week ago

If no one else is working on this right now, I'd like to take a look at this issue. Thanks