Answer generated by a 🤖
From your description, it appears that the issue lies in the JavaScript version of your code, where the context is not being populated correctly. This is likely because you're not storing the filtered documents after performing the similarity search.
To address this, you should modify your JavaScript code to store the filtered documents in a variable, similar to what you've done in your Python code. Here's how you can do it:
// Search
const question = "How do I configure a VLAN?"
const relevantDocs = await vectorStore.similaritySearchWithScore(question, 4);
const filteredDocs = relevantDocs.filter((_, score) => score < 1.3)
// Run
const template = `
You are an AI assistant that answers questions only about Talos Linux.
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
In addition to giving an answer, also include a brief explanation of the fields used in the configuration in bullet points.
Begin!
Context:
---------
{context}
---------
Question: {question}
Helpful Answer:`;
const prompt = new PromptTemplate({
template: template,
inputVariables: ["context", "question"]
})
const llm = new OpenAI({ modelName: "gpt-3.5-turbo", temperature: 0 });
const chain = loadQAStuffChain(llm, { prompt: prompt, verbose: true });
const res = await chain.call({
input_documents: filteredDocs,
question: question
});
console.log(res);
In this revised code, the filtered documents are stored in the filteredDocs variable, which is then used as the input for the chain.call() function. This should ensure that the context is correctly populated with the relevant documents.
Please note that this is one potential solution based on the information provided. If this doesn't resolve your issue, it might be worth checking other parts of your code to ensure they are functioning as expected.
If you find this solution to be correct and you believe this is a bug that could impact other users, we encourage you to make a pull request.
If the problem persists even after trying this solution, it might be helpful to ensure you're using the latest version of LangChain. Sometimes, issues can be resolved simply by updating to the latest version.
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
I updated the code with a fix for what the bot found. That was a mistake on my part, introduced while editing just before reporting, but applying it doesn't fix the issue.
Here are the reported input_documents:
"input_documents": [
[
{
"pageContent": "Configure a VLAN.",
"metadata": {
"filetype": "text/markdown",
"page_number": 1,
"filename": "vlan.md",
"category": "Title",
"loc": {
"lines": {
"from": 1,
"to": 1
}
}
}
},
0.9523001491866929
],
[
{
"pageContent": "VLAN.",
"metadata": {
"filetype": "text/markdown",
"page_number": 1,
"filename": "vlan.md",
"category": "UncategorizedText",
"loc": {
"lines": {
"from": 1,
"to": 1
}
}
}
},
0.8790187749908752
]
],
"question": "How do I configure a VLAN?"
}
Answer generated by a 🤖
I understand that you're having trouble with the LangChain JavaScript/TypeScript framework. Specifically, the context is not being populated correctly when you're trying to implement a retrieval QA from documentation written in markdown. From your code, it appears that the input_documents are being passed directly to the call method of the chain object, but the pageContent from each document, which should form the context, is not being extracted.
Here's how you can modify your code to extract the pageContent from each document and join them together to form the context:
// Extract pageContent from each document and join them together to form the context
const context = filteredDocs.map(doc => doc[0].pageContent).join('\n');
const res = await chain.call({
context: context,
question: question
});
This modification should ensure that the context is populated correctly with the content of the documents.
Additionally, if you have a chat history that you want to include in the context, you can pass it as a string to the call method:
const res = await chain.call({
context: context,
question: question,
chat_history: "your chat history here"
});
This will include the chat history in the context when generating the response.
Please note that the ConversationalRetrievalQAChain class from the LangChain framework, which you're using, has an inputKey property set to "question" and a chatHistoryKey property set to "chat_history". These properties are used as keys when passing inputs to the call method. Therefore, ensure that the keys in your input object match these properties.
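For illustration, here's a minimal sketch of how those keys line up when calling a ConversationalRetrievalQAChain (the retriever setup and import paths here are assumptions, not taken from your code):
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { OpenAI } from "langchain/llms/openai";

const llm = new OpenAI({ modelName: "gpt-3.5-turbo", temperature: 0 });
// Build the chain from the LLM and a retriever over the vector store
const chain = ConversationalRetrievalQAChain.fromLLM(llm, vectorStore.asRetriever());
// The keys here must match inputKey ("question") and chatHistoryKey ("chat_history")
const res = await chain.call({
  question: "How do I configure a VLAN?",
  chat_history: ""
});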
This solution should address your issue. However, if you believe this is a bug that could impact other users, feel free to make a pull request with your proposed changes. If the issue persists after trying this solution, I would recommend checking if there's a newer version of LangChain available that might have addressed this issue.
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
Passing in the context explicitly doesn't seem to solve this issue. I notice, however, that the pageContent from the debug output for each file is only a line or two of the original document. It seems something may be off with how I am loading the documents.
Ah! So it seems MemoryVectorStore has a bug. The docs have the original pageContent before being loaded up by MemoryVectorStore.fromDocuments. After being loaded in, they get truncated.
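One rough way to check that would be to compare pageContent before and after loading (a sketch only; docs stands for the documents from the loader, and the OpenAIEmbeddings setup and import paths are assumptions):
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";

// Load the documents, then read them back with a scored similarity search
const store = await MemoryVectorStore.fromDocuments(docs, new OpenAIEmbeddings());
const results = await store.similaritySearchWithScore("How do I configure a VLAN?", 4);

for (const [doc, score] of results) {
  // If the store truncated the documents, these lengths would be shorter than in the original docs
  console.log(score, doc.pageContent.length);
}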
Maybe not. Seeing the same issue with FaissStore.
After moving to the Notion loader and looking more closely at the debug output, it seems that the pageContent is lost somewhere between [chain/start] [1:chain:StuffDocumentsChain] and [chain/start] [1:chain:StuffDocumentsChain > 2:chain:LLMChain]:
[chain/start] [1:chain:StuffDocumentsChain] Entering Chain run with input: {
"input_documents": [
[
{
"pageContent": "<!-- THIS FILE IS AUTOGENERATED. DO NOT EDIT! -->\n\nVLAN.\n\nConfigure a VLAN.\n\n## Example\n\n```yaml\nmachine:\n network:\n interfaces:\n - interface: eth0\n vlans:\n - vlanId: 1a2b3c\n addresses:\n - 192.168.1.2\n metric: 1024\n mtu: 1500\n```\n\n\n## References",
"metadata": {
"source": "/docs/vlan.md",
"loc": {
"lines": {
"from": 1,
"to": 23
}
}
}
},
0.827355178324773
],
[
{
"pageContent": "<!-- THIS FILE IS AUTOGENERATED. DO NOT EDIT! -->\n\nDHCP configuration.\n\nEnable or disable DHCP. Configure DHCP options.\n\n## Example\n\n```yaml\nmachine:\n network:\n interfaces:\n - interface: eth0\n dhcp: true\n dhcpOptions:\n \trouteMetric: 100\n ipv4: true\n ipv6: true\n duidv6: 00-01-00-01-17-96-F9-3A-28-92-4A-3F-6C-47\n```\n\n\n## References",
"metadata": {
"source": "/docs/dhcp.md",
"loc": {
"lines": {
"from": 1,
"to": 23
}
}
}
},
0.7615031007043638
]
],
"question": "How do I configure a VLAN?"
}
[chain/start] [1:chain:StuffDocumentsChain > 2:chain:LLMChain] Entering Chain run with input: {
"question": "How do I configure a VLAN?",
"context": "\n\n"
}
[chain/start] [1:chain:LLMChain] Entering Chain run with input: {
"question": "How do I configure a VLAN?",
"context": "\n\n"
}
[llm/start] [1:chain:StuffDocumentsChain > 2:chain:LLMChain > 3:llm:OpenAIChat] Entering LLM run with input: {
"prompts": [
"You are an AI assistant that answers questions only about Talos Linux.\n\nUse the following pieces of context to answer the question at the end.\nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nIn addition to giving an answer, also include a brief explanation of the fields used in the configuration in bullet points.\n\nBegin!\n\nContext:\n---------\n\n\n\n---------\nQuestion: How do I configure a VLAN?\nHelpful Answer:"
]
}
[llm/start] [1:chain:LLMChain > 2:llm:OpenAIChat] Entering LLM run with input: {
"prompts": [
"You are an AI assistant that answers questions only about Talos Linux.\n\nUse the following pieces of context to answer the question at the end.\nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nIn addition to giving an answer, also include a brief explanation of the fields used in the configuration in bullet points.\n\nBegin!\n\nContext:\n---------\n\n\n\n---------\nQuestion: How do I configure a VLAN?\nHelpful Answer:"
]
}
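That empty context is consistent with the documents still being passed as [Document, score] tuples: a stuff-documents chain essentially joins each item's pageContent with a blank-line separator, and on a tuple that property is undefined, so only the separators survive. A simplified sketch of that behavior (not the actual LangChain source, just an illustration):
// Simplified illustration of how a stuff-documents chain builds {context}
const inputDocuments = filteredDocs; // here: [Document, score] tuples instead of plain Documents
const context = inputDocuments
  .map((doc) => doc.pageContent) // undefined for a tuple, the real text for a Document
  .join("\n\n"); // joining undefined values leaves only the separators, e.g. "\n\n" for two items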
Managed to get this working. Either I am using loadQAStuffChain wrong or there is a bug. Instead of using that I am now using:
const chain = new LLMChain({ llm, prompt });
const context = relevantDocs.map(doc => doc[0].pageContent).join('\n');
const res = await chain.call({
context: context,
question: question
})
I am now getting the expected results.
@andrewrynhard similaritySearchWithScore returns an array of [Document, score] tuples, not the docs directly. So when you do:
const relevantDocs = await vectorStore.similaritySearchWithScore(question, 4);
const filteredDocs = relevantDocs.filter((_, score) => score < 1.3);
filteredDocs is actually an array of [Document, number] pairs, which you're then passing into the chain.
The proper way would be something like:
const filteredDocs = relevantDocs.filter(([_, score]) => score < 1.3).map((tuple) => tuple[0]);
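Putting that together with the earlier snippet, a minimal sketch of the corrected flow might look like this (reusing the llm, prompt, and vectorStore defined above):
// similaritySearchWithScore returns [Document, score] tuples
const question = "How do I configure a VLAN?";
const scoredDocs = await vectorStore.similaritySearchWithScore(question, 4);

// Keep only sufficiently relevant results, then unwrap the tuples into plain Documents
const filteredDocs = scoredDocs
  .filter(([_, score]) => score < 1.3)
  .map(([doc]) => doc);

// loadQAStuffChain expects Documents under input_documents and fills {context} from their pageContent
const chain = loadQAStuffChain(llm, { prompt: prompt, verbose: true });
const res = await chain.call({
  input_documents: filteredDocs,
  question: question
});
console.log(res);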
Hello,
I created a Python script to do a retrieval QA from documentation written in markdown. Here is the working program:
I can see in the debug output that the context is populated with the retrieved documents, and this outputs a correct answer.
However, with the following equivalent (to the best of my knowledge) JS program I get an unexpected response:
I see the following in the debug output:
Why is the context set to "\n\n\n\n\n\n"?