Prince-Mendiratta opened 7 months ago
Hello again, @Prince-Mendiratta! Great to see you diving deep into langchainjs. Hope you're enjoying the experience as much as we enjoy seeing you here. Let's look into this issue.
Based on the information available, it seems that the `ChatOpenAI` class in the langchainjs codebase does support caching. However, the provided context does not specify whether caching is supported when used with the 'gpt-4-1106-preview' model specifically. The tests provided in the context show the usage of caching with the 'gpt-3.5-turbo' model. Here is the supporting evidence from the context:
```typescript
test("ChatOpenAI can cache generations", async () => {
  const memoryCache = new InMemoryCache();
  const lookupSpy = jest.spyOn(memoryCache, "lookup");
  const updateSpy = jest.spyOn(memoryCache, "update");
  const chat = new ChatOpenAI({
    modelName: "gpt-3.5-turbo",
    maxTokens: 10,
    n: 2,
    cache: memoryCache,
  });
  const message = new HumanMessage("Hello");
  const res = await chat.generate([[message], [message]]);
  expect(res.generations.length).toBe(2);
  expect(lookupSpy).toHaveBeenCalledTimes(2);
  expect(updateSpy).toHaveBeenCalledTimes(2);
  lookupSpy.mockRestore();
  updateSpy.mockRestore();
});
```
In this test, an instance of `ChatOpenAI` is created with a `memoryCache` passed as the `cache` option. The `generate` method is then called twice with the same input, and the test checks that the cache's `lookup` and `update` methods are called the expected number of times. This indicates that the `ChatOpenAI` class does support caching. However, without specific information about the 'gpt-4-1106-preview' model, it's not possible to confirm whether caching is supported for that model. You can find this test in the `langchain/src/chat_models/tests/chatopenai.int.test.ts` file.
Regarding your question about `response_format` set to `json_object`, I wasn't able to find specific information about this within the repository. It's possible that the caching mechanism behaves differently when `response_format` is set to `json_object`, but without further information or context, it's hard to say for sure.
If you could provide more details about the issue you're experiencing, such as any error messages or unexpected behavior, it would be helpful in diagnosing the problem. Also, if you could share more about your use case or the specific requirements for using the 'gpt-4-1106-preview' model with `response_format` set to `json_object`, it might help in finding a solution or workaround.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a π if this is helpful and π if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
What do you mean "doesn't work"? It throws an error or it doesn't cache the response?
Ah, sorry for not being descriptive enough. By saying it doesn't work, I meant that the response isn't cached. With the same prompt, a new request is always sent to the LLM and no caching takes place. No error is thrown.
Got it - I think I know what's causing it. Will try to dig in today.
CC @Prince-Mendiratta I looked into this in #3754 and wasn't able to repro with the in memory cache - can you try the following?
```typescript
import { HumanMessage } from "@langchain/core/messages";
import { InMemoryCache } from "@langchain/core/caches";
import { ChatOpenAI } from "@langchain/openai";

const memoryCache = new InMemoryCache();
const chat = new ChatOpenAI({
  modelName: "gpt-3.5-turbo-1106",
  temperature: 1,
  cache: memoryCache,
}).bind({
  response_format: {
    type: "json_object",
  },
});
const message = new HumanMessage(
  "Respond with a JSON object containing arbitrary fields."
);
const res = await chat.invoke([message]);
console.log(res);
const res2 = await chat.invoke([message]);
console.log(res2);
```
@jacoblee93 Thanks for looking into this! The example you shared does work; I think the issue lies in using LLMChain. Please try this example, where I was able to reproduce the issue. I'm using the date log to determine whether the response was cached:
```typescript
import { InMemoryCache } from 'langchain/cache';
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { LLMChain } from 'langchain/chains';
import { PromptTemplate } from 'langchain/prompts';

(async () => {
  const memoryCache = new InMemoryCache();
  const chat = new ChatOpenAI({
    modelName: 'gpt-3.5-turbo-1106',
    temperature: 1,
    cache: memoryCache,
  }).bind({
    response_format: {
      type: 'json_object',
    },
  });
  const prompt_template = new PromptTemplate({
    template: 'Respond with a JSON object containing arbitrary {app}.',
    inputVariables: ['app'],
  });
  const chain = new LLMChain({
    llm: chat,
    prompt: prompt_template,
    outputKey: 'res',
    verbose: true,
  });
  console.log(new Date());
  await chain.call({ app: 'fields' });
  console.log(new Date());
  await chain.call({ app: 'fields' });
  console.log(new Date());
})();
```
Ah. We're moving towards deprecating LLMChain in favor of a prompt -> LLM -> output parser runnable:
https://js.langchain.com/docs/expression_language/cookbook/prompt_llm_parser
Would you be up for switching over to that?
I'm wondering if I'm hitting the same issue here, or if it's different?
```typescript
const promptTemplate = PromptTemplate.fromTemplate(botPrompt);
const outputParser = new JsonOutputFunctionsParser();
const model = new ChatOpenAI({
  openAIApiKey: service_token,
  modelName: 'gpt-4-turbo-preview',
  verbose: true,
  cache: llm_cache,
  timeout: 6000,
});
const chain = RunnableSequence.from([
  promptTemplate,
  model.bind({ functions: [functionSchema], function_call: { name: 'extractor' } }),
  outputParser,
]);
const outcome = (await chain.invoke({ message }, { metadata: metaData })) as AIResponse;
```
In the above, I'm using a `JsonOutputFunctionsParser` along with the following `functionSchema`:
```typescript
const functionSchema = {
  name: 'extractor',
  description: 'Extracts the relevant moderation decision based on the input',
  parameters: {
    type: 'object',
    properties: {
      decision: {
        type: 'string',
        enum: ['D', 'K', 'R'],
        description: 'The overall decision made',
      },
    },
    // `required` belongs inside `parameters` for a valid JSON Schema;
    // in the original snippet it sat at the top level of functionSchema.
    required: ['decision'],
  },
};
```
In Redis I can see the key being created with the following content:
However, even though the result is cached, tokens are still being used, I assume to execute the extractor function. I would have assumed this whole process would result in a cache hit and simply return the stored content. For additional context, here are the LangSmith traces for the exact same runs. There is definitely some caching, but I'd have expected a near-instant response and zero token usage.
Hi, @Prince-Mendiratta,
I'm helping the langchainjs team manage their backlog and am marking this issue as stale. From what I understand, you reported an issue with the Redis cache not working with JSON mode on the `gpt-4-1106-preview` model. There was a discussion with jacoblee93 about potential causes, code examples were shared, and there was an agreement to deprecate LLMChain in favor of a prompt -> LLM -> output parser runnable. ImTheDeveloper also shared a similar issue with using a `JsonOutputFunctionsParser` and provided additional context and screenshots.
Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
This is still an issue!
Hi! I've noticed that the cache does not work with JSON mode.
Cache works well here:
This does not: