langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

DOC: Conversational RAG example skips document retrieval, if `chat_history` is not provided #6829

Open anorderh opened 1 week ago

anorderh commented 1 week ago


Issue with current documentation:

https://github.com/langchain-ai/langchainjs/blob/666dee7b6519877df094b3ef49cbc6f84078e8bf/docs/core_docs/docs/how_to/qa_chat_history_how_to.ipynb Lines 323 to 341

The code below is from documentation on how to implement conversational RAG at https://js.langchain.com/docs/tutorials/qa_chat_history/#contextualizing-the-question. This specific section is related to managing chat history.

Just for reference:

const contextualizeQSystemPrompt = `Given a chat history and the latest user question
which might reference context in the chat history, formulate a standalone question
which can be understood without the chat history. Do NOT answer the question,
just reformulate it if needed and otherwise return it as is.`;

const contextualizeQPrompt = ChatPromptTemplate.fromMessages([
  ["system", contextualizeQSystemPrompt],
  new MessagesPlaceholder("chat_history"),
  ["human", "{question}"],
]);
const contextualizeQChain = contextualizeQPrompt
  .pipe(llm)
  .pipe(new StringOutputParser());

const qaSystemPrompt = `You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.

{context}`;

const qaPrompt = ChatPromptTemplate.fromMessages([
  ["system", qaSystemPrompt],
  new MessagesPlaceholder("chat_history"),
  ["human", "{question}"],
]);

The issue:

const contextualizedQuestion = (input: Record<string, unknown>) => {
  if ("chat_history" in input) {
    return contextualizeQChain;
  }
  return input.question;
};

const ragChain = RunnableSequence.from([
  RunnablePassthrough.assign({
    context: async (input: Record<string, unknown>) => {
      if ("chat_history" in input) {
        const chain = contextualizedQuestion(input);
        return chain.pipe(retriever).pipe(formatDocumentsAsString);
      }
      return "";
    },
  }),
  qaPrompt,
  llm
]);

Note the duplicated `chat_history` checks: one inside `contextualizedQuestion` and another in the `context` assignment inside the `RunnableSequence`. `chain.pipe(retriever)` is what pulls the Documents used for RAG, but those documents are only retrieved when `chat_history` is provided.

This seems incorrect, because even if the input does not include any chat history, I would still expect the chain to use RAG and pull documents from the retriever. As written, when there is no history, no retrieval is orchestrated and the model answers against an empty `{context}`.
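To make the gap concrete, here is a dependency-free sketch of the branching logic above (plain TypeScript; `retrieve` is a stub standing in for `retriever.pipe(formatDocumentsAsString)`, not the real LangChain API):

```typescript
// Simplified model of the `context` assignment in the ragChain above.
type Input = { question: string; chat_history?: string[] };

// Stub for retriever.pipe(formatDocumentsAsString)
const retrieve = (query: string): string => `docs for: ${query}`;

const assignContext = (input: Input): string => {
  if ("chat_history" in input) {
    return retrieve(input.question); // retrieval happens only here
  }
  return ""; // no history -> empty context, retrieval is skipped entirely
};

console.log(assignContext({ question: "What is task decomposition?" }));
// -> "" : first-turn questions get no documents at all
```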

Idea or request for content:

Change the `RunnablePassthrough.assign` call so that `context` is at least assigned the most relevant documents, even when `chat_history` is not present.

anorderh commented 1 week ago

This is my attempt at a fix: let `input.question` be piped into the retriever directly and remove the embedded `chat_history` check.

// requires: import { RunnableLambda } from "@langchain/core/runnables";
const contextualizedQuestion = (input: Record<string, unknown>) => {
  if ("chat_history" in input) {
    return contextualizeQChain;
  }
  // No history: wrap the raw question in a runnable so it can still be
  // piped into the retriever below.
  return RunnableLambda.from(() => input.question as string);
};

const ragChain = RunnableSequence.from([
  RunnablePassthrough.assign({
    context: async (input: Record<string, unknown>) => {
      return contextualizedQuestion(input)
         .pipe(retriever)
         .pipe(formatDocumentsAsString)
    },
  }),
  qaPrompt,
  llm
]);
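For illustration, here is a dependency-free sketch of how the proposed chain behaves (plain TypeScript; `retrieve` and `contextualize` are hypothetical stubs standing in for the real LangChain runnables): retrieval now runs in both cases, and the contextualize step is only engaged when history exists.

```typescript
// Dependency-free model of the fixed `context` assignment.
type Input = { question: string; chat_history?: string[] };

// Stubs for retriever.pipe(formatDocumentsAsString) and contextualizeQChain
const retrieve = (query: string): string => `docs for: ${query}`;
const contextualize = (input: Input): string =>
  `standalone form of: ${input.question}`;

const assignContextFixed = (input: Input): string => {
  // Rephrase the question only when history exists; retrieve either way.
  const query =
    "chat_history" in input ? contextualize(input) : input.question;
  return retrieve(query);
};

console.log(assignContextFixed({ question: "What is task decomposition?" }));
// -> "docs for: What is task decomposition?" : documents are now
//    retrieved even on the first turn, with no chat history.
```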