vercel / ai

Build AI-powered applications with React, Svelte, Vue, and Solid
https://sdk.vercel.ai/docs

Usage with langchain's ConversationalRetrievalQAChain #246

Closed jaocfilho closed 1 year ago

jaocfilho commented 1 year ago

I've been trying to implement a chatbot that uses contexts from files.

I already have all the backend necessary to embed my files, but I'm struggling to make the last part work. Has anyone successfully implemented a chain with useChat and ConversationalRetrievalQAChain?

joaopcm commented 1 year ago

I'm facing the same challenge

ianmcfall commented 1 year ago

@jaocfilho While it has its share of bugs I'm still trying to overcome, this is the base vercel-labs/ai-chatbot with useChat working against a Pinecone vector DB using ConversationalRetrievalQAChain.

New to this, so any feedback is greatly appreciated. Cheers.

app/api/chat/route.ts

import { StreamingTextResponse, LangChainStream, Message } from 'ai'
import { ChatOpenAI } from 'langchain/chat_models/openai'
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { BufferMemory } from "langchain/memory";
import { PineconeStore } from "langchain/vectorstores/pinecone";
import { PineconeClient } from "@pinecone-database/pinecone";
import { CallbackManager } from "langchain/callbacks";

export const runtime = 'edge'

export async function POST(req: Request) {
  // init req and langchain stream
  const json = await req.json()
  const { messages } = json
  const { stream, handlers } = LangChainStream()

  // init chat model
  const chat = new ChatOpenAI({
    modelName: "gpt-3.5-turbo-0613",
    temperature: 0.8,
    streaming: true,
    callbackManager: CallbackManager.fromHandlers(handlers),
  });

  // init pinecone vars and log err and exit if not set
  const apiKey = process.env.PINECONE_API_KEY;
  const environment = process.env.PINECONE_ENVIRONMENT;
  const namespace = process.env.PINECONE_NAMESPACE;
  const index = process.env.PINECONE_INDEX;
  if (!apiKey || !environment || !index || !namespace) {
    console.log("One or more Pinecone environment variables (PINECONE_API_KEY, PINECONE_ENVIRONMENT, PINECONE_INDEX, PINECONE_NAMESPACE) are not set.");
    // process.exit is not available in the edge runtime, so return an error response instead
    return new Response("Missing Pinecone configuration", { status: 500 });
  }

  // Initialize Pinecone Client and Index
  const pinecone = new PineconeClient();
  await pinecone.init({
    apiKey: apiKey,
    environment: environment
  });
  const pineconeIndex = pinecone.Index(index);

  // Initialize Pinecone Vector Store.
  const vectorStore = await PineconeStore.fromExistingIndex(
    new OpenAIEmbeddings({ batchSize: 100 }),
    { pineconeIndex }
  );

  // Set the namespace inside Pinecone Vector Store
  vectorStore.namespace = namespace;

  // Create a chain that uses the OpenAI LLM and Pinecone vector store.
  const chain = ConversationalRetrievalQAChain.fromLLM(
    chat, 
    vectorStore.asRetriever(),
    {
      memory: new BufferMemory({
        humanPrefix: "I want you to act as a document that I am having a conversation with. You will provide me with answers from the given info. If the answer is not included, search for an answer and return it. Never break character.",
        memoryKey: "chat_history",
        inputKey: "question",
        returnMessages: true
      }),
      returnSourceDocuments: false,
      verbose: false,
    },
  );

  // call chain with latest user input from messages array, catch errors
  chain
    .call({ question: json.messages[json.messages.length - 1].content })
    .catch(console.error)

  // return LangChainStream
  return new StreamingTextResponse(stream)
}
jaocfilho commented 1 year ago

This is what I have so far. Document retrieval and answering work fine, but I don't think the history implementation is correct.

import { NextRequest } from 'next/server';

import { StreamingTextResponse, LangChainStream, Message } from 'ai';

import { CallbackManager } from 'langchain/callbacks';
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { AIChatMessage, HumanChatMessage } from 'langchain/schema';
import { ConversationalRetrievalQAChain } from 'langchain/chains';

import { getSupabaseVectorStore } from '@/modules/documents/vector_stores';

type ChatbotChatParams = {
  params: { id: string };
};

type ChatApiBodyParams = {
  messages: Message[];
};

export const runtime = 'edge';

export async function POST(
  request: NextRequest,
  { params }: ChatbotChatParams
) {
  const { id: chatbotId } = params;
  const { messages }: ChatApiBodyParams = await request.json();

  const { stream, handlers } = LangChainStream();

  const llm = new ChatOpenAI({
    streaming: true,
    callbacks: CallbackManager.fromHandlers(handlers),
  });

  const questionLlm = new ChatOpenAI({});
  const vectorStore = getSupabaseVectorStore();

  const chatHistory = ConversationalRetrievalQAChain.getChatHistoryString(
    messages.map((m) => {
      if (m.role == 'user') {
        return new HumanChatMessage(m.content);
      }

      return new AIChatMessage(m.content);
    })
  );

  const chain = ConversationalRetrievalQAChain.fromLLM(
    llm,
    vectorStore.asRetriever(1, {
      essential: { chatbotId },
    }),
    {
      questionGeneratorChainOptions: {
        llm: questionLlm,
      },
    }
  );

  const question = messages[messages.length - 1].content;

  chain
    .call({
      question,
      chat_history: chatHistory,
    })
    .catch(console.error)
    .finally(() => {
      handlers.handleChainEnd();
    });

  return new StreamingTextResponse(stream);
}
ErwinAI commented 1 year ago

Had a similar question earlier (https://github.com/vercel-labs/ai/issues/169). It wasn't very conclusive, unfortunately. Still looking into how to use this properly.

aranlucas commented 1 year ago

Here's my working code that uses ConversationalRetrievalQAChain

From my understanding, you don't need BufferMemory as long as you pass the chat history in the call method: https://js.langchain.com/docs/modules/chains/index_related_chains/conversational_retrieval#externally-managed-memory

You can verify it's working by setting verbose: true when defining the chain, then looking for the following prompt:

https://github.com/hwchase17/langchainjs/blob/5510b44/langchain/src/chains/conversational_retrieval_chain.ts#L17C39-L22

Note: I'm still not sure of the best way to initialize PineconeClient, but it might be worth initializing it outside of the handler scope so that it can be reused by other invocations: https://docs.aws.amazon.com/lambda/latest/operatorguide/static-initialization.html

import { PineconeClient } from "@pinecone-database/pinecone";
import { LangChainStream, StreamingTextResponse } from "ai";
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import {
  AIChatMessage,
  HumanChatMessage,
  SystemChatMessage,
} from "langchain/schema";
import { PineconeStore } from "langchain/vectorstores/pinecone";
import z from "zod";

const ChatSchema = z.object({
  messages: z.array(
    z.object({
      role: z.enum(["system", "user", "assistant"]),
      content: z.string(),
      id: z.string().optional(),
      createdAt: z.date().optional(),
    })
  ),
});

export const runtime = "edge";

let pinecone: PineconeClient | null = null;

const initPineconeClient = async () => {
  pinecone = new PineconeClient();
  await pinecone.init({
    apiKey: process.env.PINECONE_API_KEY!,
    environment: process.env.PINECONE_ENVIRONMENT!,
  });
};

export async function POST(req: Request) {
  const body = await req.json();

  try {
    const { messages } = ChatSchema.parse(body);

    if (pinecone == null) {
      await initPineconeClient();
    }

    const pineconeIndex = pinecone!.Index(process.env.PINECONE_INDEX_NAME!);

    const vectorStore = await PineconeStore.fromExistingIndex(
      new OpenAIEmbeddings(),
      { pineconeIndex }
    );

    const pastMessages = messages.map((m) => {
      if (m.role === "user") {
        return new HumanChatMessage(m.content);
      }
      if (m.role === "system") {
        return new SystemChatMessage(m.content);
      }
      return new AIChatMessage(m.content);
    });

    const { stream, handlers } = LangChainStream();

    const model = new ChatOpenAI({
      streaming: true,
    });

    const questionModel = new ChatOpenAI({});

    const chain = ConversationalRetrievalQAChain.fromLLM(
      model,
      vectorStore.asRetriever(),
      {
        verbose: true,
        questionGeneratorChainOptions: {
          llm: questionModel,
        },
      }
    );

    const question = messages[messages.length - 1].content;

    chain
      .call(
        {
          question,
          chat_history: pastMessages,
        },
        [handlers]
      )
      .catch((e) => {
        console.error(e.message);
      });

    return new StreamingTextResponse(stream);
  } catch (error) {
    if (error instanceof z.ZodError) {
      return new Response(JSON.stringify(error.issues), { status: 422 });
    }

    return new Response(null, { status: 500 });
  }
}
jaocfilho commented 1 year ago

@aranlucas I tried to implement chat history like you did (and in many other ways) and none of them seemed to work. Is your chat history working properly?

carlosveloso commented 1 year ago

I'm using the BufferMemory like this for the ConversationalRetrievalQAChain, and the history seems to work correctly:

  createChatHistory = (messages: any[]) => {
    const lcChatMessageHistory = new ChatMessageHistory(
      this.mapOpenAiMessagesToChatMessages(messages),
    );
    return lcChatMessageHistory;
  };

  getMemory = (messages: any[]) => {
    const chatHistory = this.createChatHistory(messages);
    const memory = new BufferMemory({
      memoryKey: 'chat_history',
      inputKey: 'chat_history',
      outputKey: 'chat_history',
      returnMessages: true,
      chatHistory: chatHistory, //pass it here
    });
    return memory;
  };

  mapOpenAiMessagesToChatMessages(messages: Message[]): BaseChatMessage[] {
    return messages.map((message) => {
      switch (message.role) {
        case 'user':
          return new HumanChatMessage(message.content);
        case 'assistant':
          return new AIChatMessage(message.content);
        case 'system':
          return new SystemChatMessage(message.content);
        default:
          return new AIChatMessage(message.content);
      }
    });
  }

Just make sure you are using the right prompt and not forgetting the {chat_history} tag. Here is the example that I'm using to test it out: https://js.langchain.com/docs/modules/chains/index_related_chains/conversational_retrieval#prompt-customization
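
For completeness, here is a minimal sketch of how these helpers might be wired into the chain (hypothetical wiring, written as standalone functions; assumes an existing llm, a non-streaming nonStreamingLlm, and a vectorStore):

// Hypothetical wiring of the helpers above (adapted from class methods to standalone calls)
const memory = getMemory(messages.slice(0, -1)); // prior turns only; the last message is the new question

const chain = ConversationalRetrievalQAChain.fromLLM(
  llm,
  vectorStore.asRetriever(),
  {
    memory, // BufferMemory seeded with the ChatMessageHistory built above
    questionGeneratorChainOptions: { llm: nonStreamingLlm }, // assumed separate, non-streaming model
  }
);

// With memory attached, only the new question is passed; chat_history comes from the memory
const question = messages[messages.length - 1].content;
chain.call({ question }).catch(console.error);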

aranlucas commented 1 year ago

@jaocfilho can you explain what you mean by not working correctly? How are you testing?

Is the chat history not being passed, or are you getting a weird answer?

If it isn't working as you're expecting, you can potentially work on the prompt or change the temperature of the question LLM.

jaocfilho commented 1 year ago

I'm trying to make the chain remember the last question I asked it.

In the example given by the vercel/ai docs, using the vanilla ChatOpenAI, it correctly remembers my chat history, so if I ask something like "What was my last question?" or "What was my first question?", it gives me the correct answer.

However, when I ask the same questions to a chain with ConversationalRetrievalQAChain, it says it is incapable of remembering past messages.

aranlucas commented 1 year ago

This is a limitation of ConversationalRetrievalQAChain. It focuses more on the documents than on previous chat history. In other words, the behavior you're seeing is expected, but it can be customized.

By default, the only input to the QA chain is the standalone question generated from the question generation chain. This poses a challenge when asking meta questions about information in previous interactions from the chat history.

Take a look at https://js.langchain.com/docs/modules/chains/index_related_chains/conversational_retrieval#prompt-customization for customizing the prompt to take into account previous messages.

Keep in mind that adding more context to the prompt in this way may distract the LLM from other relevant retrieved information.
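
For example, here is a hedged sketch of a customized question-generator prompt (the template wording is illustrative; {chat_history} and {question} are the inputs the question generator actually receives):

// Illustrative only: try to keep meta-questions about the conversation intact instead of
// collapsing them into a purely document-oriented standalone question.
const CUSTOM_QUESTION_GENERATOR_TEMPLATE = `Given the following conversation and a follow up input,
rephrase the follow up input to be a standalone question.
If the follow up input asks about the conversation itself (e.g. "what was my last question?"),
answer it directly from the chat history instead.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:`;

const chain = ConversationalRetrievalQAChain.fromLLM(
  streamingLlm,
  vectorStore.asRetriever(),
  {
    questionGeneratorChainOptions: {
      llm: nonStreamingLlm, // assumed separate, non-streaming model
      template: CUSTOM_QUESTION_GENERATOR_TEMPLATE,
    },
  }
);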

brerdem commented 1 year ago

Based on the posts here, I've managed to get the chain to work; at least the verbose output shows the correct response. However, I can't display the AI responses, only the human messages. Here's my code. Is there something I'm missing?

import { pinecone } from "@/lib/pinecone-client";
import { LangChainStream, Message, StreamingTextResponse } from "ai";
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { BufferMemory, ChatMessageHistory } from "langchain/memory";
import {
    AIChatMessage,
    HumanChatMessage,
    SystemChatMessage
} from "langchain/schema";
import { PineconeStore } from "langchain/vectorstores/pinecone";

//export const runtime = "edge";

export async function POST(req: Request) {
  const { messages } = await req.json();
  console.log("messages ->", JSON.stringify(messages, null, 2));

  const { stream, handlers } = LangChainStream();

  const llm = new ChatOpenAI({
    //modelName: "gpt-3.5-turbo-0301",
    streaming: true,
    openAIApiKey: process.env.OPENAI_API_KEY,
    temperature: 0,
    callbacks: [handlers],
    //verbose: true,
  });

  const chatHistory = new ChatMessageHistory(
    messages.map((m: Message) => {
      if (m.role === "user") {
        return new HumanChatMessage(m.content);
      }
      if (m.role === "system") {
        return new SystemChatMessage(m.content);
      }
      return new AIChatMessage(m.content);
    })
  );

  const index = pinecone.Index("pdf-test");

  /* create vectorstore*/
  const vectorStore = await PineconeStore.fromExistingIndex(
    new OpenAIEmbeddings(),
    {
      pineconeIndex: index,
      textKey: "text",
    }
  );

  const chain = ConversationalRetrievalQAChain.fromLLM(
    llm,
    vectorStore.asRetriever(),
    {
      verbose: true,
      returnSourceDocuments: false,
      callbacks: [handlers],
      memory: new BufferMemory({
        memoryKey: "chat_history",
        humanPrefix:
          "You are a good assistant that answers question based on the document info you have. If you don't have any information just say I don't know. Answer question with the same language of the question",
        inputKey: "question", // The key for the input to the chain
        outputKey: "text",
        returnMessages: true, // If using with a chat model
        chatHistory: chatHistory,
      }),
    }
  );

  const lastMessage: Message = (messages as Message[]).at(-1)!;
  await chain.call({ question: lastMessage.content }).catch(console.error);

  return new StreamingTextResponse(stream);
}
aranlucas commented 1 year ago

There are a couple of things wrong with the code you posted. Take a look at the code above to see how to implement it properly.

In short (see the sketch after this list):

  1. Don't attach the handlers to the LLM constructor (https://js.langchain.com/docs/production/callbacks/#constructor-callbacks), but pass them to the call method (https://js.langchain.com/docs/production/callbacks/#request-callbacks). Internally, ConversationalRetrievalQAChain calls the LLM twice, and if you attach the handlers in the constructor, the stream will be closed before the chain is completed.
  2. Do not await chain.call. This will disable streaming.
  3. You should have a streaming LLM and a separate question LLM, otherwise you'll stream the result of the question LLM rephrasing the question.
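
A minimal sketch of those three fixes applied together (names are illustrative; assumes an existing vectorStore and externally managed chat history as in the earlier snippets):

const { stream, handlers } = LangChainStream();

// 1. no constructor callbacks on the streaming model
const streamingModel = new ChatOpenAI({ streaming: true });

// 3. a separate, non-streaming model for the internal question-rephrasing step
const questionModel = new ChatOpenAI({});

const chain = ConversationalRetrievalQAChain.fromLLM(
  streamingModel,
  vectorStore.asRetriever(),
  { questionGeneratorChainOptions: { llm: questionModel } }
);

// 1. handlers are passed per request, on the call itself
// 2. no await, so the stream can be returned right away
chain
  .call({ question, chat_history: pastMessages }, [handlers])
  .catch(console.error);

return new StreamingTextResponse(stream);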
brerdem commented 1 year ago

Thank you very much @aranlucas, it worked. Though I didn't get why we need an extra LLM for questions; I thought a single LLM for responses would suffice.

I also noticed a strange thing. If you provide chatHistory to BufferMemory via LangChain's ChatMessageHistory, it always switches to English no matter which language the questions are asked in. But if you use ConversationalRetrievalQAChain.getChatHistoryString, it preserves the language. Or, better, don't use it at all and just provide memoryKey; that works too.

brerdem commented 1 year ago

Do you guys know how to initiate the chat with a welcoming/onboarding message from the AI? I've managed to do it by appending a hidden human message to messages at the beginning, but is there a more practical way to do it?

vpatel85 commented 1 year ago

@brerdem You can use the ~initialInput~ initialMessages param if you are using useChat from ai/react

Does anyone know how to display sources metadata on the FE once the stream is complete?

justinlettau commented 1 year ago

@vpatel85 Hopefully there is a better (more official) way, but the following worked for me (short term):

  1. On the back-end, make a copy of LangChainStream.
  2. Adjust the handleChainEnd method like this:
handleChainEnd: async (_outputs: any, runId: string) => {
+  const docs = _outputs['sourceDocuments'] as Document[] | undefined;
+
+  if (docs != null) {
+     const meta = JSON.stringify(docs.map((x) => x.metadata));
+     await writer.write(`\n##SOURCE_DOCUMENTS##${meta}`);
+  }

  await handleEnd(runId);
},
  3. On the front-end, split the streamed message and parse the source documents. Something like this:

import type { Message } from 'ai';

interface DocumentMetadata {
  source: string;
  page: number;
}

export function parseMessage({ content }: Message) {
  let answer = content;
  let documents: DocumentMetadata[] = [];

  const separator = '##SOURCE_DOCUMENTS##';
  const index = content.indexOf(separator);

  if (index !== -1) {
    const start = index + separator.length;
    const meta = content.substring(start);

    answer = answer.substring(0, index);

    try {
      documents = JSON.parse(meta);
    } catch (error) {
      // do nothing
    }
  }

  return {
    answer,
    documents,
  };
}


  4. Instead of rendering `message.content` directly, use the parsed values:

import type { Message } from 'ai';

export interface MyMessageProps {
  message: Message;
}

export function MyMessage({ message }: MyMessageProps) {
  const { answer, documents } = parseMessage(message);

  return (
    <div>
      <div>{answer}</div>
      <ul>
        {documents.map((x) => (
          <li key={x.source}>{x.source}, page {x.page}</li>
        ))}
      </ul>
    </div>
  );
}
vpatel85 commented 1 year ago

Thank you @justinlettau that worked perfectly! I really appreciate the help! 💯

I was originally hoping to avoid overriding the whole function and just override the handler on LangChainStream() by doing something like the code below, but I couldn't figure out a good way to pass both the stream and the additional content to the FE. I looked into streamToResponse as well, but it wasn't really what I needed.

handlers.handleChainEnd = async (response) => {
    if (response.sourceDocuments) {
        response.sourceDocuments.map((doc) => {
            console.log(doc.metadata)
        })
    }
}
aranlucas commented 1 year ago

https://github.com/vercel-labs/ai-chatbot/issues/82 is also an interesting use case that requires a custom LangChainStream. It might be worth updating LangChainStream to allow "hooks" to fit these different use cases.

brerdem commented 1 year ago

@brerdem You can use the ~initialInput~ initialMessages param if you are using useChat from ai/react

Does anyone know how to display sources metadata on the FE once the stream is complete?

@vpatel85 thanks, but none of these initialX options worked because they don't trigger a request. However, the append function from useChat does. I ended up appending an initial system message at the beginning when messages are empty.
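
For reference, a minimal sketch of that append-based approach (hypothetical component; assumes useChat from ai/react):

import { useEffect } from "react";
import { useChat } from "ai/react";

export function Chat() {
  const { messages, append, input, handleInputChange, handleSubmit } = useChat();

  // Seed the conversation once: unlike initialMessages/initialInput, append() triggers
  // a request, so the assistant streams back an actual welcome message.
  useEffect(() => {
    if (messages.length === 0) {
      append({ role: "system", content: "Greet the user and explain what you can help with." });
    }
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, []);

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <input value={input} onChange={handleInputChange} />
    </form>
  );
}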

lucasquinteiro commented 1 year ago

@justinlettau @vpatel85 This is the approach I took without having to modify LangChainStream. After all the tokens are streamed, and before calling handlers.handleChainEnd(), we can call handlers.handleLLMNewToken like this:

In api/chat/route.ts:

...
const chain = ConversationalRetrievalQAChain.fromLLM(
    llm,
    vectorDB.asRetriever(),
    {
      returnSourceDocuments: true,
      ...theRestOfYourConfig
    }
  );

const stringifySources = (docs: Document[] | undefined) => {
    if (docs) {
      const stringifiedSources = JSON.stringify(docs.map((x) => x.metadata));
      return stringifiedSources;
    }
    return "";
  };

chain
    .call({ question, runId })
    .then((response) => {
      sources = stringifySources(response.sourceDocuments);
      return response;
    })
    .catch(console.error)
    .finally(async () => {
      console.log("sources", sources);
      if (sources) {
        await handlers.handleLLMNewToken(`##SOURCE_DOCUMENTS##${sources}`);
      }

      handlers.handleChainEnd();
    });

   return new StreamingTextResponse(stream);

I am only showing the chain part; let me know if you need more context. I am taking the same approach as you did, returning the sources as a string and parsing them on the frontend.

DanielhCarranza commented 1 year ago

(quoting @lucasquinteiro's comment above)

Can you share the entire file? Where did you get runId, and where did you declare sources? I got an error from handlers.handleChainEnd():

Expected 2 arguments, but got 0.ts(2554)
index.d.ts(106, 26): An argument for '_outputs' was not provided.
(property) handleChainEnd: (_outputs: any, runId: string) => Promise<void>
anhhtca commented 1 year ago


Hi,

I am not using the API routes in Next.js directly. Instead, I have built an external API at backend.com/api/chat. Here is the code:

import { ChatOpenAI } from 'langchain/chat_models/openai';
import { VectorDBQAChain } from 'langchain/chains';
import { HNSWLib } from 'langchain/vectorstores/hnswlib';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { ConfigurationParameters } from 'openai';

const configuration: ConfigurationParameters = {
    apiKey: process.env.OPENAI_API_KEY,
}

export async function postTest(req: Request, res: Response) {
    let streamedResponse = '';
    const embedding = new OpenAIEmbeddings({}, configuration)
    const vectorStore = await HNSWLib.load(`${baseDirectoryPath}/documents/index/data/`, embedding);

    const streamingModel = new ChatOpenAI({
        streaming: true,
        callbacks: [{
            handleLLMNewToken(token) {
                streamedResponse += token;
            }
        }]
    });

    const chain = VectorDBQAChain.fromLLM(streamingModel, vectorStore);

    const question = "What did the president say about Justice Breyer?";

    chain.call({ query: question });

    console.log({ streamedResponse });

    res.status(200).send({ streamedResponse });
}

My NextJS front-end, which uses vercel-labs/ai-chatbot, receives the streamedResponse but doesn't display it in a streaming manner on the chat UI.

I have tried searching for a solution, but I haven't had any luck. I hope you can assist me. Thank you so much!

lucasquinteiro commented 1 year ago

Hi @anhhtca, even though you are not using Next.js API routes, you can use the helpers from vercel-labs/ai package to handle the streams so that they are displayed in a streaming manner on the chat UI.

Here is an example of my implementation using vercel-labs/ai package and langchain.

First, a couple of functions to build the chat memory based on the previous messages.

import {
  BaseChatMessage,
  AIChatMessage,
  HumanChatMessage,
} from "langchain/schema";
import { BufferMemory, ChatMessageHistory } from "langchain/memory";

type ChatMessage = {
  role: "user" | "assistant";
  content: string;
};

const getChatMessages = (history: ChatMessage[]): BaseChatMessage[] => {
  return history.map((message) =>
    message.role === "user"
      ? new HumanChatMessage(message.content)
      : new AIChatMessage(message.content)
  );
};

const extractLastQuestion = (messages: ChatMessage[]) => {
  const question =
    messages.length > 0 ? messages[messages.length - 1].content : "";
  const previousMessages = messages.slice(0, messages.length - 1);

  return { question, previousMessages };
};

const getMemory = (messages: ChatMessage[]) => {
  const { question, previousMessages } = extractLastQuestion(messages);

  const messageHistory = getChatMessages(previousMessages);

  const memory = new BufferMemory({
    memoryKey: "chat_history",
    inputKey: "input",
    outputKey: "output",
    chatHistory: new ChatMessageHistory(messageHistory),
  });

  return { memory, question };
};

export { getMemory };

The /chat route. This is the implementation I did using Next.js API routes, but it should work with any Node.js backend:

export async function POST(req: NextRequest) {
  const body = await req.json();
  const { messages } = body;
  const { stream, handlers } = LangChainStream();
  const { memory, question } = getMemory(messages); // TODO: handle memory through a database like redis
  const vectorStore = await initializeVectorStore(); // you can use any Langchain vector store here

  const llm = new ChatOpenAI({
    streaming: true,
    temperature: 0,
    callbacks: [handlers],
    // verbose: true,
  });

  const nonStreamingModel = new ChatOpenAI({
    // verbose: true
  }); // you can use a different LLM here

// Customize the prompt to your needs
  const template = `You are a chatbot helping customers with their questions.

  {context}

  Human: {question}
  Assistant:
  `;

  const prompt = new PromptTemplate({
    template,
    inputVariables: ["question", "context"],
  });

 // You can also customize the condense_question_template
  const CONDENSE_QUESTION_TEMPLATE = `Given the following conversation and a follow up input, if it is a question rephrase it to be a standalone question.
  If it is not a question, just summarize the message.

  Chat history:
  {chat_history}
  Follow up input: {question}
  Standalone input:
  `;

  const chain = ConversationalRetrievalQAChain.fromLLM(
    llm,
    vectorStore.asRetriever(),
    {
      memory,
      returnSourceDocuments: true,
      qaChainOptions: {
        type: "stuff",
        prompt: prompt,
      },
      questionGeneratorChainOptions: {
        llm: nonStreamingModel,
        template: CONDENSE_QUESTION_TEMPLATE
      },
    }
  );

  chain.call({ question });

 // This is what you need to return from your endpoint to stream the answer to the frontend
  return new StreamingTextResponse(stream);
}

Let me know if this works for you.

EmilioJD commented 1 year ago

(quoting @lucasquinteiro's comment above)

I haven't been able to figure out how to pass both the SOURCE_DOCUMENTS and the stream to the frontend, and I get the same error as @DanielhCarranza:

Expected 2 arguments, but got 0.ts(2554) index.d.ts(106, 26): An argument for '_outputs' was not provided. (property) handleChainEnd: (_outputs: any, runId: string) => Promise<void>

Super stuck on this; it would be useful if you included more context on how you parse this in the frontend, @lucasquinteiro, and what you do on the backend with runId and the outputs. Thank you!

lucasquinteiro commented 1 year ago

Hi @EmilioJD @DanielhCarranza, sorry for the late response. You are right; at the time I posted that comment I was using an older version of the package, and when I later upgraded it I got the same errors as you.

This is the workaround I applied:

  1. Not passing the handleChainEnd handler to ChatOpenAI, so that I can control when the chain ends.
  2. Overriding the handleLLMStart handler just to capture the run id so that I can end the chain later.
  3. Getting the sources from the chain.call promise.
  4. Streaming the sources as a string through the handleLLMNewToken handler.
  5. Calling handleChainEnd with the id.
  6. In the frontend I wrote a function to extract the sources from the streamed message. It basically identifies the ##SOURCE_DOCUMENTS## pattern and extracts the stringified JSON.

Here is the code of the POST:

export async function POST(req: NextRequest) {
  const body = await req.json();
  const { messages } = body;

  const {
    stream,
    handlers: {
      handleChainEnd,
      handleLLMStart,
      handleLLMNewToken,
      handleLLMError,
      handleChainStart,
      handleChainError,
      handleToolStart,
      handleToolError,
    },
  } = LangChainStream();
  const { memory, question } = getMemory(messages); 
  const vectorDB = await getPineconeVectorDB();
  let id = "";

  const handlers = {
    handleLLMStart: (llm: any, prompts: string[], runId: string) => {
      id = runId;
      return handleLLMStart(llm, prompts, runId);
    },
    handleLLMNewToken,
    handleLLMError,
    handleChainStart,
    handleChainError,
    handleToolStart,
    handleToolError,
  };

  const llm = new ChatOpenAI({
    streaming: true,
    temperature: 0,
    callbacks: [handlers],
  });
  const nonStreamingModel = new ChatOpenAI({});

  const chain = ConversationalRetrievalQAChain.fromLLM(
    llm,
    vectorDB.asRetriever(),
    {
      memory,
      returnSourceDocuments: true,
      qaChainOptions: {
        type: "stuff",
        prompt: getPrompt(),
      },
      questionGeneratorChainOptions: {
        llm: nonStreamingModel,
        template: CONDENSE_QUESTION_TEMPLATE,
      },
    }
  );

  chain.call({ question }).then(async (response) => {
    const sources = JSON.stringify(
      response.sourceDocuments.map((document: any) => document.metadata.source)
    );
    await handleLLMNewToken(`##SOURCE_DOCUMENTS##${sources}`);
    await handleChainEnd(null, id);
  });

  return new StreamingTextResponse(stream);
}

You can find the definition of the getMemory function in my comment above.

Also, getPrompt() basically returns a PromptTemplate with {context} and {question} as inputVariables.
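
In case it helps, a minimal sketch of what getPrompt() might look like (the template wording is illustrative):

import { PromptTemplate } from "langchain/prompts";

const getPrompt = () =>
  new PromptTemplate({
    template: `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know.

{context}

Question: {question}
Helpful answer:`,
    inputVariables: ["context", "question"],
  });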

There is probably an easier way to achieve this, but this is what I applied at the time without having to modify the code of the package internally. Hope this helps! Let me know if you have any doubts.

EmilioJD commented 1 year ago

@lucasquinteiro That worked perfectly! Thank you so much. Was stuck on this for a while.

Typhon0130 commented 1 year ago

@ianmcfall Thank you for your code, but my code doesn't accept the property called memory.

what's the error? :(

EmilioJD commented 1 year ago

@ianmcfall Thank you for your code, but my code doesn't accept the property called memory.

what's the error? :(


@ianmcfall Not able to reproduce that error on my end. Any chance you can hover over memory and see what it says? Also, are you on the latest version of langchain?

Typhon0130 commented 1 year ago

Thank you for your reply, I have just fixed it. But I have one problem. As you know, this chatbot can only answer based on its specific data, so it can't answer my custom questions based on chat history. For example:

I: Hi, my name is Honzo
AI: Hi Honzo, how can I assist you today?
I: What's my name?
AI: Your name is Richard (this comes from the specific data I used to train the chatbot)

How can I fix it?

ianmcfall commented 1 year ago

@Typhon0130 Glad you could fix it. I'm still racking my brain trying to get conversational memory based on the conversation working too - I've been stumped for a few weeks.

Seems to be a very common limitation/challenge that people are facing with ConversationalRetrievalQAChain

jacoblee93 commented 1 year ago

Hey @Typhon0130 and @ianmcfall!

That's a common one - has to do with how the user's question gets dereferenced and rephrased as a "standalone" question for vector store queries.

We have a newer, more experimental approach using a retrieval-focused agent that handles meta-questions better; you can check it out here:

https://js.langchain.com/docs/use_cases/question_answering/conversational_retrieval_agents
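
A hedged sketch of that agent-based approach, following the linked docs (helper names and import paths are taken from that page and may change between langchain versions):

import { ChatOpenAI } from "langchain/chat_models/openai";
import {
  createRetrieverTool,
  createConversationalRetrievalAgent,
} from "langchain/agents/toolkits";

// Wrap an existing retriever (e.g. vectorStore.asRetriever()) as a tool the agent can call
const tool = createRetrieverTool(retriever, {
  name: "search_docs",
  description: "Searches and returns documents relevant to the user's question.",
});

const executor = await createConversationalRetrievalAgent(
  new ChatOpenAI({ temperature: 0 }),
  [tool],
  { verbose: true }
);

// The agent keeps its own message history, so meta-questions like
// "what was my last question?" can be answered without a retrieval call.
const result = await executor.call({ input: "Hi, my name is Honzo!" });
console.log(result.output);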

And some folks have had some success with customizing the prompt (although there are tradeoffs there):

https://js.langchain.com/docs/modules/chains/popular/chat_vector_db#prompt-customization

We're also working on a template showing off a few of these common use-cases specifically in Next.js - will update here when it's ready!

justinlettau commented 1 year ago

@vpatel85 @lucasquinteiro Here is a fun way to do some custom logic in a single handler without copying/overwriting the whole function 🤓

import { LangChainStream, StreamingTextResponse } from 'ai';
import { CallbackManager } from 'langchain/callbacks';
import type { Document } from 'langchain/document';

const { stream, handlers } = LangChainStream();

const originalChainEnd = handlers.handleChainEnd;
const overrideChainEnd = async (_outputs: any, runId: string) => {
  const docs = _outputs['sourceDocuments'] as Document[] | undefined;

  if (docs != null) {
    const meta = JSON.stringify(docs.map((x) => x.metadata));
    await handlers.handleLLMNewToken(`\n##SOURCE_DOCUMENTS##${meta}`);
  }

  return originalChainEnd(_outputs, runId);
};

handlers.handleChainEnd = overrideChainEnd;

const callbacks = CallbackManager.fromHandlers(handlers);
// use callbacks like normal ...
jacoblee93 commented 1 year ago

Some more polish on docs to go but here's a new official template with some example use cases around retrieval using some of the latest features and techniques, including streaming and retrieval focused agents!

https://github.com/langchain-ai/langchain-nextjs-template

Typhon0130 commented 1 year ago

Thank you

Typhon0130 commented 1 year ago

And I have one question: what is the best choice for chunkSize and chunkOverlap when I split a document? The document is someone's resume. This is my code:

const text_splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200, });

But the embedding quality is not good :( How can I fix it?

jacoblee93 commented 1 year ago

There's not one answer to that - it'll take experimentation. For a resume, I would tend towards smaller values (100?) since it's very information dense.
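
For example, a hedged sketch of trying smaller chunks (the exact numbers are something to experiment with):

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Smaller chunks for information-dense documents like resumes; tune and compare retrieval quality.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 100,
  chunkOverlap: 20,
});

const docs = await splitter.createDocuments([resumeText]); // resumeText: the raw resume string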

Typhon0130 commented 1 year ago

There's not one answer to that - it'll take experimentation. For a resume, I would tend towards smaller values (100?) since it's very information dense.

so you mean chunkSize is 100? 🤔

jacoblee93 commented 1 year ago

There's not one answer to that - it'll take experimentation. For a resume, I would tend towards smaller values (100?) since it's very information dense.

so you mean chunkSize is 100? 🤔

You can try out different values here:

https://share.streamlit.io/app/langchain-text-splitter/