langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

Always getting "undefined" for the source document #1980

Closed: hajsf closed this issue 1 year ago

hajsf commented 1 year ago

I'm using LangChain.js with OpenAI to query my local documents. I wrote the code below, but the source of the answer I get is always undefined:

import { OpenAI } from "langchain/llms/openai";
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { Chroma } from "langchain/vectorstores/chroma";
import { ChatOpenAI } from "langchain/chat_models/openai";
import readline from 'readline';

const rl = readline.createInterface({
    input: process.stdin,
});

/* Initialize the LLM to use to answer the question */
const model = new OpenAI({
    openAIApiKey: Bun.env.OPENAI_API_KEY,
    streaming: true,
    callbacks: [
        {
            handleLLMNewToken(token) {
                process.stdout.write(token.replace(/^\n/, ""));
            },
        },
    ],
});

const GPTchat = new ChatOpenAI({
    openAIApiKey: Bun.env.OPENAI_API_KEY,
    modelName: "gpt-3.5-turbo-0613",
    temperature: 0.9,
});

/* Load the vector database of the embeddings */
const vectorStore = await Chroma.fromExistingCollection(
    new OpenAIEmbeddings(),
    { collectionName: "data" }
);

/* Create the chain */
const chain = ConversationalRetrievalQAChain.fromLLM(
    model,
    vectorStore.asRetriever(),
);

async function converse(prompt, chatHistory, senderId) {
    const followUpRes = await chain.call({
        question: prompt,
        chat_history: chatHistory,
        search: false,
        returnSourceDocuments: true
    });
    chatHistory = `${prompt}${followUpRes.text}`;
    console.log(followUpRes);
    return { text: followUpRes.text, source: followUpRes.source };
}

let firstRun = true;
let chatHistory = [];

async function chat() {
    process.stdout.write(firstRun ? "User: " : "\nUser: ");
    firstRun=false;
    const prompt = await new Promise((resolve) => rl.question("", resolve));
    process.stdout.write("AI GPT: ");
    const { text, source } = await converse(prompt, chatHistory, 1);
    if (source) {
        console.log(`\nSource: ${source}`);
    }
    chat(chatHistory);
}

chat();

Sample output:

```bash
hajsf@DESKTOP-JS1NVNB:~/wa$ bun qa.js
[0.02ms] ".env"
User: Hi
AI GPT: Hi there! What can I help you with?{ text: " Hi there! What can I help you with?", __run: undefined }

User: When can I take my annual leave
AI GPT: You must complete 12 months of service to be entitled to take your annual leave and you must submit your request in the ESS Portal to your department manager at least 3 months earlier.{ text: " You must complete 12 months of service to be entitled to take your annual leave and you must submit your request in the ESS Portal to your department manager at least 3 months earlier.", __run: undefined }
```

dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

I understand that you're having trouble accessing the source document from the response of your LangChain application. You're expecting the source document to be available in the source property of the response, but it's coming back as undefined.

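The issue seems to be twofold. First, the chain returns the retrieved documents in a sourceDocuments property, not source, so followUpRes.source will always be undefined. Second, sourceDocuments only appears in the output when the chain itself is created with returnSourceDocuments: true, passed in the options object that is the third argument to ConversationalRetrievalQAChain.fromLLM. Passing returnSourceDocuments as an input to chain.call() has no effect, which is why your logged response contains only text and __run.

Here's how you can create the chain and modify your converse function to return the source documents:

/* Create the chain; returnSourceDocuments must be set here, not in chain.call() */
const chain = ConversationalRetrievalQAChain.fromLLM(
    model,
    vectorStore.asRetriever(),
    { returnSourceDocuments: true }
);

async function converse(prompt, chatHistory, senderId) {
    const followUpRes = await chain.call({
        question: prompt,
        chat_history: chatHistory,
        search: false,
    });
    console.log(followUpRes);
    // sourceDocuments is an array of Document objects ({ pageContent, metadata })
    return { text: followUpRes.text, source: followUpRes.sourceDocuments };
}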
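Separately, the original converse function assigns chatHistory = `${prompt}${followUpRes.text}`, but that only rebinds the local parameter: the top-level chatHistory array never changes, so every question is sent with an empty history. A minimal sketch of one way to carry the history forward, assuming the chain accepts chat_history as a plain string (as the question's code intends). The appendTurn name and the Human:/Assistant: labels are hypothetical choices, not LangChain APIs:

```javascript
// Hypothetical helper (not a LangChain API): appends one question/answer turn
// to a plain-string chat history and returns the new history string.
function appendTurn(chatHistory, question, answer) {
  // The streamed answer text starts with a leading space, so trim it.
  const turn = `Human: ${question}\nAssistant: ${answer.trim()}`;
  // Treat an empty string (or the question's initial empty array) as "no history yet".
  const hasHistory = chatHistory && chatHistory.length > 0;
  return hasHistory ? `${chatHistory}\n${turn}` : turn;
}
```

In chat(), you would then declare let chatHistory = ""; at the top level and, after each call to converse, assign chatHistory = appendTurn(chatHistory, prompt, text); so the next question sees the earlier turns.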

In your chat function, you should also adjust the console log to handle the source as an array, as sourceDocuments is an array of source documents:

const { text, source } = await converse(prompt, chatHistory, 1);
if (source && source.length > 0) {
    console.log(`\nSource: ${source[0].pageContent}`);
}

This will print the pageContent of the first source document. If you want to print all source documents, you can loop over the source array.
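A sketch of that loop, assuming each entry in sourceDocuments is a LangChain Document with pageContent and metadata fields. metadata.source is only populated if the loader used to build the Chroma collection stored it (file loaders such as TextLoader typically put the file path there). formatSources and maxChars are hypothetical names:

```javascript
// Hypothetical helper: turns a sourceDocuments array into printable lines.
// Each element is assumed to look like a LangChain Document: { pageContent, metadata }.
function formatSources(sourceDocuments, maxChars = 80) {
  // The chain omits sourceDocuments unless returnSourceDocuments is enabled.
  if (!Array.isArray(sourceDocuments)) return [];
  return sourceDocuments.map((doc, i) => {
    // metadata.source is only present if the loader that built the collection set it.
    const origin = doc.metadata?.source ?? "unknown source";
    // Collapse whitespace and truncate the chunk text to a short preview.
    const preview = String(doc.pageContent ?? "")
      .replace(/\s+/g, " ")
      .trim()
      .slice(0, maxChars);
    return `[${i + 1}] ${origin}: ${preview}`;
  });
}
```

In chat(), the single console.log could then become for (const line of formatSources(source)) console.log(line); which prints one line per retrieved chunk, showing where it came from alongside a short excerpt.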

This solution should resolve your issue based on the information you've provided. However, if you're still experiencing problems, it might be worth checking if you're using the latest version of LangChain, as this behavior might have been updated in a more recent release.

If you find this solution to be correct and believe this is a common issue that could impact other users, feel free to make a pull request to improve the documentation or the error messaging in the code. Your contribution would be greatly appreciated by the LangChain community.


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

jacoblee93 commented 1 year ago

I believe it should be sourceDocuments, not source, as the bot comment above states. Can you try that?