vercel / ai

Build AI-powered applications with React, Svelte, Vue, and Solid
https://sdk.vercel.ai/docs

Help: how to convert LangChain streamedResponse to StreamingTextResponse #335

Closed anhhtca closed 8 months ago

anhhtca commented 1 year ago

Hi,

I have a backend API with the code for abc.com/api/chat as follows:

let streamedResponse = "";

const nonStreamingModel = new ChatOpenAI({
  modelName: "gpt-3.5-turbo",
}, configuration);

const streamingModel = new ChatOpenAI({
  streaming: true,
  callbacks: [
    {
      handleLLMNewToken(token) {
        streamedResponse += token;
      },
    },
  ],
}, configuration);

const embedding = new OpenAIEmbeddings({}, configuration);

try {
  const vectorStore = await HNSWLib.load(`${baseDirectoryPath}/docs/index/data/`, embedding);

  const chain = ConversationalRetrievalQAChain.fromLLM(
    streamingModel,
    vectorStore.asRetriever(),
    {
      memory: new BufferMemory({
        memoryKey: "chat_history",
        returnMessages: true,
      }),
      questionGeneratorChainOptions: {
        llm: nonStreamingModel,
        template: await CONDENSE_PROMPT.format({ question: userContent, chat_history }),
      },
      qaChainOptions: {
        type: "stuff",
        prompt: QA_PROMPT,
      },
    }
  );

  await chain.call({ question: userContent });
  return streamedResponse;
} catch (error) {
  // error handling omitted
}

The front-end I am using is vercel/ai-chatbot hosted on xyz.com/chat. The chat page retrieves a StreamingTextResponse and binds it to the UI.

I am really unsure how to convert the text that LangChain streams into a StreamingTextResponse for the front-end.

Thank you for your help!

wb-ts commented 1 year ago

To convert the StreamingTextResponse to the format expected by the front-end, you can follow these steps:

  1. Extract the chat messages from the StreamingTextResponse:

     const chatMessages = streamedResponse.messages.map((message) => message.content);

  2. Format the chat messages in a way that suits your front-end UI requirements. For example, you can concatenate the messages into a single string with appropriate line breaks:

     const formattedChatMessages = chatMessages.join('\n');

  3. Pass the formattedChatMessages to your front-end UI for rendering.

Here's an updated version of your code with the changes mentioned above:

const nonStreamingModel = new ChatOpenAI({
  modelName: "gpt-3.5-turbo",
}, configuration);

const streamingModel = new ChatOpenAI({
  streaming: true,
  callbacks: [
    {
      handleLLMNewToken(token) {
        streamedResponse += token;
      },
    },
  ],
}, configuration);

const embedding = new OpenAIEmbeddings({}, configuration);

try {
  const vectorStore = await HNSWLib.load(`${baseDirectoryPath}/docs/index/data/`, embedding);

  const chain = ConversationalRetrievalQAChain.fromLLM(
    streamingModel,
    vectorStore.asRetriever(),
    {
      memory: new BufferMemory({
        memoryKey: "chat_history",
        returnMessages: true,
      }),
      questionGeneratorChainOptions: {
        llm: nonStreamingModel,
        template: await CONDENSE_PROMPT.format({ question: userContent, chat_history }),
      },
      qaChainOptions: {
        type: "stuff",
        prompt: QA_PROMPT,
      },
    }
  );

  await chain.call({ question: userContent });

  const chatMessages = streamedResponse.messages.map((message) => message.content);
  const formattedChatMessages = chatMessages.join('\n');

  return formattedChatMessages;
} catch (error) {
  // Handle errors appropriately
}

Make sure to integrate the updated code with your backend API (abc.com/api/chat) and pass the formattedChatMessages to the front-end for rendering.

Let me know if you have any further questions!

anhhtca commented 1 year ago

> To convert the StreamingTextResponse to the format expected by the front-end, you can follow these steps:
>
> 1. Extract the chat messages from the `StreamingTextResponse`:
> const chatMessages = streamedResponse.messages.map((message) => message.content);

First of all, I sincerely thank you for your response. I am still very new to the AI field; I have been working on this chat app for two months but have not achieved the expected results.

Next, I apologize for the unclear question. In my case, streamedResponse is a plain string, built by following the instructions on the LangChain page (https://js.langchain.com/docs/modules/chains/index_related_chains/conversational_retrieval):

// Imports as in the LangChain docs example (paths may differ between versions)
import * as fs from "fs";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { HNSWLib } from "langchain/vectorstores/hnswlib";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { BufferMemory } from "langchain/memory";

export const run = async () => {
  const text = fs.readFileSync("state_of_the_union.txt", "utf8");
  const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
  const docs = await textSplitter.createDocuments([text]);
  const vectorStore = await HNSWLib.fromDocuments(docs, new OpenAIEmbeddings());
  let streamedResponse = "";
  const streamingModel = new ChatOpenAI({
    streaming: true,
    callbacks: [
      {
        handleLLMNewToken(token) {
          streamedResponse += token;
        },
      },
    ],
  });
  const nonStreamingModel = new ChatOpenAI({});
  const chain = ConversationalRetrievalQAChain.fromLLM(
    streamingModel,
    vectorStore.asRetriever(),
    {
      returnSourceDocuments: true,
      memory: new BufferMemory({
        memoryKey: "chat_history",
        inputKey: "question", // The key for the input to the chain
        outputKey: "text", // The key for the final conversational output of the chain
        returnMessages: true, // If using with a chat model
      }),
      questionGeneratorChainOptions: {
        llm: nonStreamingModel,
      },
    }
  );
  /* Ask it a question */
  const question = "What did the president say about Justice Breyer?";
  const res = await chain.call({ question });
  console.log({ streamedResponse });
  /*
    {
      streamedResponse: 'President Biden thanked Justice Breyer for his service, and honored him as an Army veteran, Constitutional scholar and retiring Justice of the United States Supreme Court.'
    }
  */
};

On the backend I am not using the Vercel AI SDK, only LangChainJS.

On the front-end I am using vercel-labs/ai-chatbot and the Vercel AI SDK.

anhhtca commented 1 year ago

I have dug around and found a way to transfer the stream from Node.js to a vercel-labs/ai stream; however, on the UI the bot's response does not stream but appears all at once. I am still looking for a solution.

import { Readable } from 'stream';
import { streamToResponse } from 'ai';

// ...

await chain.call({ question });

// Push the fully accumulated answer into a Node Readable (all at once)
const responseStream = new Readable({ read() {} });
responseStream.push(streamedResponse);
responseStream.push(null);

// Bridge the Node stream into a web ReadableStream
const readableStream = new ReadableStream({
    start(controller) {
        responseStream.on('data', (chunk) => {
            controller.enqueue(chunk);
        })

        responseStream.on('end', () => {
            controller.close();
        })

        responseStream.on('error', (err) => {
            controller.error(err);
        })
    }
});

streamToResponse(readableStream, res);

wb-ts commented 1 year ago

To properly stream the response content from Node.js to the vercel-labs/ai stream, you need to modify your code to use the appropriate streaming techniques. Here's an example of how you can achieve this:

await chain.call({ question });

// Create a readable stream
const responseStream = new Readable({
  read() {}
});

// Push chunks of data into the stream
responseStream.push(streamedResponse);
responseStream.push(null);

// Set appropriate headers to indicate streaming response
res.setHeader('Content-Type', 'application/octet-stream'); // Adjust the content type as necessary
res.setHeader('Transfer-Encoding', 'chunked');

// Stream the response to the client
responseStream.pipe(res);

In this code snippet, the key difference is that we are using the pipe() method to stream the response content directly to the res object. This will ensure that the data is sent to the client in chunks and will be displayed in real-time on the UI.

Make sure to set the appropriate content type and headers based on the format of the streamed response. Adjust the 'application/octet-stream' content type to match the actual data being streamed, such as 'text/plain' for plain text or 'application/json' for JSON data.

By using the pipe() method, you can avoid manually handling the ReadableStream and its associated events, reducing complexity and improving stream handling.
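
If the tokens should show up on the UI incrementally, another option is to wire the LangChain callbacks directly into a stream instead of accumulating the whole string and pushing it in one go. The sketch below is only an illustration, not tested against your setup: it assumes the LangChainStream helper from the ai package can be combined with ConversationalRetrievalQAChain the same way the SDK docs combine it with a bare chat model, and handleChat is just a hypothetical wrapper around your existing variables (question, vectorStore, nonStreamingModel, res).

import { LangChainStream, streamToResponse } from 'ai';

// Hypothetical wrapper: `question`, `vectorStore` and `nonStreamingModel`
// are assumed to come from your existing setup above.
async function handleChat(question, res) {
  const { stream, handlers } = LangChainStream();

  const streamingModel = new ChatOpenAI({
    streaming: true,
    callbacks: [handlers], // every new token is written into `stream`
  });

  const chain = ConversationalRetrievalQAChain.fromLLM(
    streamingModel,
    vectorStore.asRetriever(),
    { questionGeneratorChainOptions: { llm: nonStreamingModel } }
  );

  // Intentionally not awaited: the chain keeps producing tokens
  // while the response is already being streamed to the client.
  chain.call({ question }).catch(console.error);

  // Pipe the token stream into the Node.js response.
  streamToResponse(stream, res);
}

The important part is that chain.call is not awaited before the response starts, so each token can reach the client as soon as handleLLMNewToken fires, rather than only after the whole answer is finished.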

anhhtca commented 1 year ago

I tried your code, but the result is the same as mine.

Do you think there might be a problem with the front-end?

import { OpenAIStream, StreamingTextResponse } from 'ai'
import { Configuration, OpenAIApi } from 'openai-edge'

import { nanoid } from '@/lib/utils'

export const runtime = 'edge'

const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY
})

const openai = new OpenAIApi(configuration)

export async function POST(req: Request) {
  const json = await req.json()
  const { messages, previewToken } = json

  const res = await openai.createChatCompletion({
    model: 'gpt-3.5-turbo',
    messages,
    temperature: 0.7,
    stream: true
  })

  const stream = OpenAIStream(res)

  return new StreamingTextResponse(stream)
}

'use client'

import { useChat, type Message } from 'ai/react'

import { cn } from '@/lib/utils'
import { ChatList } from '@/components/chat-list'
import { ChatPanel } from '@/components/chat-panel'
import { EmptyScreen } from '@/components/empty-screen'
import { ChatScrollAnchor } from '@/components/chat-scroll-anchor'
import { useLocalStorage } from '@/lib/hooks/use-local-storage'
import { useState } from 'react'
import { Button } from './ui/button'
import { Input } from './ui/input'
import { toast } from 'react-hot-toast'

export interface ChatProps extends React.ComponentProps<'div'> {
  initialMessages?: Message[]
  id?: string
}

export function Chat({ id, initialMessages, className }: ChatProps) {
  const { messages, append, reload, stop, isLoading, input, setInput } =
    useChat({
      api: 'http://external_api/api/chat', // use the external backend API instead of the Next.js API route
      initialMessages,
      id,
      body: {
        id,
        previewToken // previewToken is not defined in this excerpt
      },
      onResponse(response) {
        if (response.status === 401) {
          toast.error(response.statusText)
        }
      }
    })

  return (
    <>
      <div className={cn('pb-[200px] pt-4 md:pt-10', className)}>
        {messages.length ? (
          <>
            <ChatList messages={messages} />
            <ChatScrollAnchor trackVisibility={isLoading} />
          </>
        ) : (
          <EmptyScreen setInput={setInput} />
        )}
      </div>
      <ChatPanel
        id={id}
        isLoading={isLoading}
        stop={stop}
        append={append}
        reload={reload}
        messages={messages}
        input={input}
        setInput={setInput}
      />
    </>
  )
}

I am not sure if the front-end can "understand" our response from the backend.

wb-ts commented 1 year ago

You can check whether the front-end can "understand" your response from the backend by inspecting the network traffic with the browser's developer tools (Inspect), or by testing with a simple project that just renders the raw response from your backend.
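
For example, a quick way to check is to call the endpoint directly and log each chunk as it arrives. This is only a rough sketch: it assumes the backend is reachable at http://external_api/api/chat as in your snippet, and that it accepts the same JSON body that useChat sends.

// Minimal check: log every chunk as it arrives.
// If everything prints as a single chunk, the backend is not flushing incrementally.
const res = await fetch('http://external_api/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages: [{ role: 'user', content: 'Hello' }] })
});

const reader = res.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log('chunk:', decoder.decode(value));
}

If the whole answer arrives as one chunk, the problem is on the backend (the string is only pushed after chain.call resolves); if it arrives piece by piece but the UI still renders it all at once, the problem is on the front-end.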