Closed jaocfilho closed 1 year ago
I'm facing the same challenge
@jaocfilho While it has its share of bugs I'm still trying to overcome, this is the base vercel-labs/ai-chatbot with useChat, working with a Pinecone VectorDB using ConversationalRetrievalQAChain.
New to this, so any feedback is greatly appreciated. Cheers.
app/api/chat/route.ts
import { StreamingTextResponse, LangChainStream, Message } from 'ai'
import { ChatOpenAI } from 'langchain/chat_models/openai'
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { BufferMemory } from "langchain/memory";
import { PineconeStore } from "langchain/vectorstores/pinecone";
import { PineconeClient } from "@pinecone-database/pinecone";
import { CallbackManager } from "langchain/callbacks";
export const runtime = 'edge'
export async function POST(req: Request) {
// init req and langchain stream
const json = await req.json()
const { messages } = json
const { stream, handlers } = LangChainStream()
// init chat model
const chat = new ChatOpenAI({
modelName: "gpt-3.5-turbo-0613",
temperature: 0.8,
streaming: true,
callbackManager: CallbackManager.fromHandlers(handlers),
});
// init pinecone vars and log err and exit if not set
const apiKey = process.env.PINECONE_API_KEY;
const environment = process.env.PINECONE_ENVIRONMENT;
const namespace = process.env.PINECONE_NAMESPACE;
const index = process.env.PINECONE_INDEX;
if (!apiKey || !environment || !index || !namespace) {
console.error("Missing one or more Pinecone environment variables: PINECONE_API_KEY, PINECONE_ENVIRONMENT, PINECONE_INDEX, PINECONE_NAMESPACE.");
// process.exit() is not available in the edge runtime; return an error response instead
return new Response("Missing Pinecone configuration", { status: 500 });
}
// Initialize Pinecone Client and Index
const pinecone = new PineconeClient();
await pinecone.init({
apiKey: apiKey,
environment: environment
});
const pineconeIndex = pinecone.Index(index);
// Initialize Pinecone Vector Store.
const vectorStore = await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings(
{ batchSize: 100 }
),
{pineconeIndex},
);
// Set the namespace inside Pinecone Vector Store
vectorStore.namespace = namespace;
// Create a chain that uses the OpenAI LLM and Pinecone vector store.
const chain = ConversationalRetrievalQAChain.fromLLM(
chat,
vectorStore.asRetriever(),
{
memory: new BufferMemory({
humanPrefix: "I want you to act as a document that I am having a conversation with. You will provide me with answers from the given info. If the answer is not included, search for an answer and return it. Never break character.",
memoryKey: "chat_history",
inputKey: "question",
returnMessages: true
}),
returnSourceDocuments: false,
verbose: false,
},
);
// call chain with latest user input from messages array, catch errors
chain
.call({ question: json.messages[json.messages.length - 1].content })
.catch(console.error)
// return LangChainStream
return new StreamingTextResponse(stream)
}
This is what I get so far; document retrieval and answering work fine, but I don't think the history implementation is correct.
import { NextRequest } from 'next/server';
import { StreamingTextResponse, LangChainStream, Message } from 'ai';
import { CallbackManager } from 'langchain/callbacks';
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { AIChatMessage, HumanChatMessage } from 'langchain/schema';
import { ConversationalRetrievalQAChain } from 'langchain/chains';
import { getSupabaseVectorStore } from '@/modules/documents/vector_stores';
type ChatbotChatParams = {
params: { id: string };
};
type ChatApiBodyParams = {
messages: Message[];
};
export const runtime = 'edge';
export async function POST(
request: NextRequest,
{ params }: ChatbotChatParams
) {
const { id: chatbotId } = params;
const { messages }: ChatApiBodyParams = await request.json();
const { stream, handlers } = LangChainStream();
const llm = new ChatOpenAI({
streaming: true,
callbacks: CallbackManager.fromHandlers(handlers),
});
const questionLlm = new ChatOpenAI({});
const vectorStore = getSupabaseVectorStore();
const chatHistory = ConversationalRetrievalQAChain.getChatHistoryString(
messages.map((m) => {
if (m.role == 'user') {
return new HumanChatMessage(m.content);
}
return new AIChatMessage(m.content);
})
);
const chain = ConversationalRetrievalQAChain.fromLLM(
llm,
vectorStore.asRetriever(1, {
essential: { chatbotId },
}),
{
questionGeneratorChainOptions: {
llm: questionLlm,
},
}
);
const question = messages[messages.length - 1].content;
chain
.call({
question,
chat_history: chatHistory,
})
.catch(console.error)
.finally(() => {
handlers.handleChainEnd();
});
return new StreamingTextResponse(stream);
}
Had a similar question earlier (https://github.com/vercel-labs/ai/issues/169). Wasn't very conclusive unfortunately. Still looking into how to use this properly.
Here's my working code that uses ConversationalRetrievalQAChain
From my understanding, you don't need BufferMemory as long as you're passing it in the call method https://js.langchain.com/docs/modules/chains/index_related_chains/conversational_retrieval#externally-managed-memory
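For intuition, externally managed memory just means serializing the prior turns yourself and passing them on each call. Here's a minimal sketch; the Human/Assistant prefixes are illustrative, not necessarily what ConversationalRetrievalQAChain.getChatHistoryString emits:

```typescript
// Sketch of externally managed chat memory: serialize prior turns into a
// string and pass it as `chat_history` on each chain.call. The prefixes
// below are illustrative, not LangChain's exact output.
type ChatTurn = { role: "user" | "assistant"; content: string };

function toChatHistoryString(turns: ChatTurn[]): string {
  return turns
    .map((t) => `${t.role === "user" ? "Human" : "Assistant"}: ${t.content}`)
    .join("\n");
}

const history = toChatHistoryString([
  { role: "user", content: "What is Pinecone?" },
  { role: "assistant", content: "A managed vector database." },
]);
console.log(history);
// Human: What is Pinecone?
// Assistant: A managed vector database.
```

The resulting string (or an array of chat messages) is what you hand to `chain.call({ question, chat_history })` instead of attaching a BufferMemory.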
You can verify it's working by setting "verbose: true" when defining the chain, then looking for the following prompt:
Note: Still not sure of the best way to initialize the PineconeClient, but it might be worth initializing it outside of the run scope so that it can be re-used by other invocations: https://docs.aws.amazon.com/lambda/latest/operatorguide/static-initialization.html
import { PineconeClient } from "@pinecone-database/pinecone";
import { LangChainStream, StreamingTextResponse } from "ai";
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import {
AIChatMessage,
HumanChatMessage,
SystemChatMessage,
} from "langchain/schema";
import { PineconeStore } from "langchain/vectorstores/pinecone";
import z from "zod";
const ChatSchema = z.object({
messages: z.array(
z.object({
role: z.enum(["system", "user", "assistant"]),
content: z.string(),
id: z.string().optional(),
createdAt: z.date().optional(),
})
),
});
export const runtime = "edge";
let pinecone: PineconeClient | null = null;
const initPineconeClient = async () => {
pinecone = new PineconeClient();
await pinecone.init({
apiKey: process.env.PINECONE_API_KEY!,
environment: process.env.PINECONE_ENVIRONMENT!,
});
};
export async function POST(req: Request) {
const body = await req.json();
try {
const { messages } = ChatSchema.parse(body);
if (pinecone == null) {
await initPineconeClient();
}
const pineconeIndex = pinecone!.Index(process.env.PINECONE_INDEX_NAME!);
const vectorStore = await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings(),
{ pineconeIndex }
);
const pastMessages = messages.map((m) => {
if (m.role === "user") {
return new HumanChatMessage(m.content);
}
if (m.role === "system") {
return new SystemChatMessage(m.content);
}
return new AIChatMessage(m.content);
});
const { stream, handlers } = LangChainStream();
const model = new ChatOpenAI({
streaming: true,
});
const questionModel = new ChatOpenAI({});
const chain = ConversationalRetrievalQAChain.fromLLM(
model,
vectorStore.asRetriever(),
{
verbose: true,
questionGeneratorChainOptions: {
llm: questionModel,
},
}
);
const question = messages[messages.length - 1].content;
chain
.call(
{
question,
chat_history: pastMessages,
},
[handlers]
)
.catch((e) => {
console.error(e.message);
});
return new StreamingTextResponse(stream);
} catch (error) {
if (error instanceof z.ZodError) {
return new Response(JSON.stringify(error.issues), { status: 422 });
}
return new Response(null, { status: 500 });
}
}
@aranlucas I tried to implement a chat history like you did (and many other ways) and none of them seemed to work. Is your chat history working properly?
I'm using the BufferMemory like this for the ConversationalRetrievalQAChain, and the history seems to work correctly:
createChatHistory = (messages: any[]) => {
const lcChatMessageHistory = new ChatMessageHistory(
this.mapOpenAiMessagesToChatMessages(messages),
);
return lcChatMessageHistory;
};
getMemory = (messages: any[]) => {
const chatHistory = this.createChatHistory(messages);
const memory = new BufferMemory({
memoryKey: 'chat_history',
inputKey: 'chat_history',
outputKey: 'chat_history',
returnMessages: true,
chatHistory: chatHistory, //pass it here
});
return memory;
};
mapOpenAiMessagesToChatMessages(messages: Message[]): BaseChatMessage[] {
return messages.map((message) => {
switch (message.role) {
case 'user':
return new HumanChatMessage(message.content);
case 'assistant':
return new AIChatMessage(message.content);
case 'system':
return new SystemChatMessage(message.content);
default:
return new AIChatMessage(message.content);
}
});
}
Just make sure you are using the right prompt and not forgetting the {chat_history} tag. Here is the example that I'm using to test it out:
https://js.langchain.com/docs/modules/chains/index_related_chains/conversational_retrieval#prompt-customization
@jaocfilho can you explain what you mean by not working correctly? How are you testing?
Is the chat history not being passed, or are you getting a weird answer?
If it isn't working as you're expecting, you can potentially work on the prompt or change temperature of the question LLM.
I'm trying to make the chain remember the last question I asked it.
On the example given by the vercel/ai docs, using the vanilla ChatOpenAI, it correctly remembers my chat history, so if I ask something like "What was my last question" or "What was my first question", it gives me the correct answer.
However, when I ask the same questions to a chain with ConversationalRetrievalQAChain, it says it is incapable of remembering past messages.
This is a limitation of the ConversationalRetrievalQAChain. It focuses more on the document rather than previous chat history. In other words, the behavior you're seeing is expected, but it can be customized.
By default, the only input to the QA chain is the standalone question generated from the question generation chain. This poses a challenge when asking meta questions about information in previous interactions from the chat history.
Take a look at https://js.langchain.com/docs/modules/chains/index_related_chains/conversational_retrieval#prompt-customization for customizing the prompt to take into account previous messages.
Keep in mind that adding more context to the prompt in this way may distract the LLM from other relevant retrieved information.
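To make the failure mode concrete, here is a sketch of a condense-question step; the template wording is illustrative, not LangChain's exact default. Whatever this step produces is the only input the downstream QA chain sees:

```typescript
// Illustrative condense-question step: history plus follow-up are collapsed
// into one standalone question, and only that question reaches the QA chain.
// A meta-question like "what was my last question?" succeeds or fails here,
// because the QA chain itself never sees the raw history.
const CONDENSE_TEMPLATE = `Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:`;

function buildCondensePrompt(chatHistory: string, question: string): string {
  return CONDENSE_TEMPLATE.replace("{chat_history}", chatHistory).replace(
    "{question}",
    question
  );
}

const condensePrompt = buildCondensePrompt(
  "Human: Hi, my name is Honzo\nAssistant: Hi Honzo!",
  "What was my last question?"
);
console.log(condensePrompt.includes("Standalone question:")); // true
```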
Based on the posts here, I've managed to get the chain to work; at least the verbose output shows the correct response. However, I can't display AI responses, only human messages. Here's my code. Is there something I'm missing?
import { pinecone } from "@/lib/pinecone-client";
import { LangChainStream, Message, StreamingTextResponse } from "ai";
import { ConversationalRetrievalQAChain } from "langchain/chains";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { BufferMemory, ChatMessageHistory } from "langchain/memory";
import {
AIChatMessage,
HumanChatMessage,
SystemChatMessage
} from "langchain/schema";
import { PineconeStore } from "langchain/vectorstores/pinecone";
//export const runtime = "edge";
export async function POST(req: Request) {
const { messages } = await req.json();
console.log("messages ->", JSON.stringify(messages, null, 2));
const { stream, handlers } = LangChainStream();
const llm = new ChatOpenAI({
//modelName: "gpt-3.5-turbo-0301",
streaming: true,
openAIApiKey: process.env.OPENAI_API_KEY,
temperature: 0,
callbacks: [handlers],
//verbose: true,
});
const chatHistory = new ChatMessageHistory(
messages.map((m: Message) => {
if (m.role === "user") {
return new HumanChatMessage(m.content);
}
if (m.role === "system") {
return new SystemChatMessage(m.content);
}
return new AIChatMessage(m.content);
})
);
const index = pinecone.Index("pdf-test");
/* create vectorstore*/
const vectorStore = await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings(),
{
pineconeIndex: index,
textKey: "text",
}
);
const chain = ConversationalRetrievalQAChain.fromLLM(
llm,
vectorStore.asRetriever(),
{
verbose: true,
returnSourceDocuments: false,
callbacks: [handlers],
memory: new BufferMemory({
memoryKey: "chat_history",
humanPrefix:
"You are a good assistant that answers question based on the document info you have. If you don't have any information just say I don't know. Answer question with the same language of the question",
inputKey: "question", // The key for the input to the chain
outputKey: "text",
returnMessages: true, // If using with a chat model
chatHistory: chatHistory,
}),
}
);
const lastMessage: Message = (messages as Message[]).at(-1)!;
await chain.call({ question: lastMessage.content }).catch(console.error);
return new StreamingTextResponse(stream);
}
There's a couple of things wrong with the code you posted. Take a look at the code above to see how to implement it properly.
A couple of things:
1. Pass the handlers to the call method (https://js.langchain.com/docs/production/callbacks/#request-callbacks). Internally ConversationalRetrievalQAChain calls the LLM twice, and if you do it on the "Constructor" then it will close the stream before the chain is completed.
2. Don't await chain.call. This will disable streaming.
3. Use a separate, non-streaming questionLLM for rephrasing the question.
Thank you very much @aranlucas, it worked. Though I didn't get why we need an extra LLM for questions. I thought a single LLM for responses should suffice.
I also noticed a strange thing. If you provide chatHistory to BufferMemory via LangChain's ChatMessageHistory, it always switches to English no matter which language the questions are asked in. But if you use it with ConversationalRetrievalQAChain.getChatHistoryString, it preserves the language. Or better, don't use it at all and just provide memoryKey; that works too.
Do you guys know how to initiate the chat with a welcoming / onboarding message from the AI? I've managed to do it by appending a hidden human message to messages at the beginning, but is there a more practical way to do it?
@brerdem You can use the ~~initialInput~~ initialMessages param if you are using useChat from ai/react
Does anyone know how to display sources metadata on the FE once the stream is complete?
@vpatel85 Hopefully there is a better (more official) way, but the following worked for me (short term):
1. Copy the LangChainStream implementation into your project.
2. Update the handleChainEnd method like this:
handleChainEnd: async (_outputs: any, runId: string) => {
+  const docs = _outputs['sourceDocuments'] as Document[] | undefined;
+
+  if (docs != null) {
+    const meta = JSON.stringify(docs.map((x) => x.metadata));
+    await writer.write(`\n##SOURCE_DOCUMENTS##${meta}`);
+  }
  await handleEnd(runId);
},
3. Add a helper on the frontend to parse the marker out of the message content:
import type { Message } from 'ai';

interface DocumentMetadata {
  source: string;
  page: number;
}

export function parseMessage({ content }: Message) {
  let answer = content;
  let documents: DocumentMetadata[] = [];

  const separator = '##SOURCE_DOCUMENTS##';
  const index = content.indexOf(separator);

  if (index !== -1) {
    const start = index + separator.length;
    const meta = content.substring(start);
    answer = answer.substring(0, index);

    try {
      documents = JSON.parse(meta);
    } catch (error) {
      // do nothing
    }
  }

  return { answer, documents };
}
4. Instead of rendering `message.content` directly, use the parsed values:
```tsx
import type { Message } from 'ai';

export interface MyMessageProps {
  message: Message;
}

export function MyMessage({ message }: MyMessageProps) {
  const { answer, documents } = parseMessage(message);
  return (
    <div>
      <div>{answer}</div>
      <ul>
        {documents.map((x) => (
          <li key={`${x.source}-${x.page}`}>
            {x.source}, page {x.page}
          </li>
        ))}
      </ul>
    </div>
  );
}
```
Thank you @justinlettau that worked perfectly! I really appreciate the help! 💯
I was originally hoping to avoid overriding the whole function and to just override the handler on the LangChainStream() by doing something like below, but I couldn't figure out a good way to pass the stream and the additional content to the FE. I looked into streamToResponse as well, but it wasn't really what I needed.
handlers.handleChainEnd = async (response) => {
if (response.sourceDocuments) {
response.sourceDocuments.map((doc) => {
console.log(doc.metadata)
})
}
}
https://github.com/vercel-labs/ai-chatbot/issues/82
Is also an interesting use case that requires a custom LangChainStream. It might be worth updating LangChainStream to allow "hooks" to fit these different use cases
> @brerdem You can use the ~~initialInput~~ initialMessages param if you are using useChat from ai/react
>
> Does anyone know how to display sources metadata on the FE once the stream is complete?
@vpatel85 thanks, but none of these initialX methods worked because they don't trigger a request. However, the append function from useChat does. I ended up appending an initial system message at the beginning when messages are empty.
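The approach above (seeding an initial message only when the conversation is empty) can be sketched like this; the Message shape mirrors the one in the ai package, and the content string is an invented example:

```typescript
// Sketch: seed the conversation with one message when it is empty, so the
// first request already carries an onboarding turn. The Message shape mirrors
// the ai package's Message; the content text is a made-up placeholder.
type Message = { id: string; role: "system" | "user" | "assistant"; content: string };

function withInitialMessage(messages: Message[]): Message[] {
  if (messages.length > 0) return messages;
  return [
    { id: "initial", role: "system", content: "Greet the user and offer to help." },
  ];
}

console.log(withInitialMessage([]).length); // 1
```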
@justinlettau @vpatel85 This is the approach I took without having to modify LangChainStream. After all the tokens are streamed and before calling handler.handleChainEnd() we can call handler.handleLLMNewToken like this:
In api/chat/route.ts:
...
const chain = ConversationalRetrievalQAChain.fromLLM(
llm,
vectorDB.asRetriever(),
{
returnSourceDocuments: true,
...theRestOfYourConfig
}
);
const stringifySources = (docs: Document[] | undefined) => {
if (docs) {
const stringifiedSources = JSON.stringify(docs.map((x) => x.metadata));
return stringifiedSources;
}
return "";
};
chain
.call({ question, runId })
.then((response) => {
sources = stringifySources(response.sourceDocuments);
return response;
})
.catch(console.error)
.finally(async () => {
console.log("sources", sources);
if (sources) {
await handlers.handleLLMNewToken(`##SOURCE_DOCUMENTS##${sources}`);
}
handlers.handleChainEnd();
});
return new StreamingTextResponse(stream);
I am only showing the chain part, let me know if you need more context. I am taking the same approach as you did returning the sources as a string and parsing them on the frontend.
Can you share the entire file? Where did you get the runId, and where did you declare sources?
I got an error from handlers.handleChainEnd()
Expected 2 arguments, but got 0.ts(2554)
index.d.ts(106, 26): An argument for '_outputs' was not provided.
(property) handleChainEnd: (_outputs: any, runId: string) => Promise<void>
Hi,
I am not using the api
in NextJS directly. Instead, I have built an external API at backend.com/api/chat
. Here is the code:
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { VectorDBQAChain } from 'langchain/chains';
import { HNSWLib } from 'langchain/vectorstores/hnswlib';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { ConfigurationParameters } from 'openai';
const configuration: ConfigurationParameters = {
apiKey: process.env.OPENAI_API_KEY,
}
export async function postTest(req: Request, res: Response) {
let streamedResponse = '';
const embedding = new OpenAIEmbeddings({}, configuration)
const vectorStore = await HNSWLib.load(`${baseDirectoryPath}/documents/index/data/`, embedding);
const streamingModel = new ChatOpenAI({
streaming: true,
callbacks: [{
handleLLMNewToken(token) {
streamedResponse += token;
}
}]
});
const chain = VectorDBQAChain.fromLLM(streamingModel, vectorStore);
const question = "What did the president say about Justice Breyer?";
await chain.call({ query: question });
console.log({ streamedResponse });
res.status(200).send({ streamedResponse });
}
My NextJS front-end, which uses vercel-labs/ai-chatbot
, receives the streamedResponse
but doesn't display it in a streaming manner on the chat UI.
I have tried searching for a solution, but I haven't had any luck. I hope you can assist me. Thank you so much!
Hi @anhhtca, even though you are not using Next.js API routes, you can use the helpers from vercel-labs/ai package to handle the streams so that they are displayed in a streaming manner on the chat UI.
Here is an example of my implementation using vercel-labs/ai package and langchain.
First, a couple of functions to build the chat memory based on the previous messages.
import {
BaseChatMessage,
AIChatMessage,
HumanChatMessage,
} from "langchain/schema";
import { BufferMemory, ChatMessageHistory } from "langchain/memory";
type ChatMessage = {
role: "user" | "assistant";
content: string;
};
const getChatMessages = (history: ChatMessage[]): BaseChatMessage[] => {
return history.map((message) =>
message.role === "user"
? new HumanChatMessage(message.content)
: new AIChatMessage(message.content)
);
};
const extractLastQuestion = (messages: ChatMessage[]) => {
const question =
messages.length > 0 ? messages[messages.length - 1].content : "";
const previousMessages = messages.slice(0, messages.length - 1);
return { question, previousMessages };
};
const getMemory = (messages: ChatMessage[]) => {
const { question, previousMessages } = extractLastQuestion(messages);
const messageHistory = getChatMessages(previousMessages);
const memory = new BufferMemory({
memoryKey: "chat_history",
inputKey: "input",
outputKey: "output",
chatHistory: new ChatMessageHistory(messageHistory),
});
return { memory, question };
};
export { getMemory };
The /chat route. This is the implementation I did using Next.js API routes, but it should work with any node.js backend:
export async function POST(req: NextRequest) {
const body = await req.json();
const { messages } = body;
const { stream, handlers } = LangChainStream();
const { memory, question } = getMemory(messages); // TODO: handle memory through a database like redis
const vectorStore = await initializeVectorStore(); // you can use any Langchain vector store here
const llm = new ChatOpenAI({
streaming: true,
temperature: 0,
callbacks: [handlers],
// verbose: true,
});
const nonStreamingModel = new ChatOpenAI({
// verbose: true
}); // you can use a different LLM here
// Customize the prompt to your needs
const template = `You are a chatbot helping customers with their questions.
{context}
Human: {question}
Assistant:
`;
const prompt = new PromptTemplate({
template,
inputVariables: ["question", "context"],
});
// You can also customize the condense_question_template
const CONDENSE_QUESTION_TEMPLATE = `Given the following conversation and a follow up input, if it is a question rephrase it to be a standalone question.
If it is not a question, just summarize the message.
Chat history:
{chat_history}
Follow up input: {question}
Standalone input:
`;
const chain = ConversationalRetrievalQAChain.fromLLM(
llm,
vectorStore.asRetriever(),
{
memory,
returnSourceDocuments: true,
qaChainOptions: {
type: "stuff",
prompt: prompt,
},
questionGeneratorChainOptions: {
llm: nonStreamingModel,
template: CONDENSE_QUESTION_TEMPLATE
},
}
);
chain.call({ question });
// This is what you need to return from your endpoint to stream the answer to the frontend
return new StreamingTextResponse(stream);
}
Let me know if this works for you.
> @justinlettau @vpatel85 This is the approach I took without having to modify LangChainStream. After all the tokens are streamed and before calling handler.handleChainEnd() we can call handler.handleLLMNewToken [...]
I haven't been able to figure out how to pass both the SOURCE_DOCUMENTS and the stream to the frontend, and I also get the same error as @DanielhCarranza:
Expected 2 arguments, but got 0. ts(2554) index.d.ts(106, 26): An argument for '_outputs' was not provided. (property) handleChainEnd: (_outputs: any, runId: string) => Promise<void>
Super stuck on this; it would be useful if you included more context on how you parse on the frontend @lucasquinteiro and what you do on the backend with runId and the outputs. Thank you!
Hi @EmilioJD @DanielhCarranza, sorry for the late response. You are right; at the moment I posted the comment I was using an older version of the package. Later, when I upgraded it, I got the same errors as you.
This is the workaround I applied:
1. I didn't pass the handleChainEnd handler to ChatOpenAI, so that I can manage when I want the chain to end.
2. I added a handleLLMStart handler just to get the id of the chain so that I can end it later.
3. After the chain.call promise resolves, I send the sources with the handleLLMNewToken handler.
4. I then call handleChainEnd with the id.
5. On the frontend, I parse the message looking for the ##SOURCE_DOCUMENTS## pattern and extracting the stringified JSON.
Here is the code of the POST:
export async function POST(req: NextRequest) {
const body = await req.json();
const { messages } = body;
const {
stream,
handlers: {
handleChainEnd,
handleLLMStart,
handleLLMNewToken,
handleLLMError,
handleChainStart,
handleChainError,
handleToolStart,
handleToolError,
},
} = LangChainStream();
const { memory, question } = getMemory(messages);
const vectorDB = await getPineconeVectorDB();
let id = "";
const handlers = {
handleLLMStart: (llm: any, prompts: string[], runId: string) => {
id = runId;
return handleLLMStart(llm, prompts, runId);
},
handleLLMNewToken,
handleLLMError,
handleChainStart,
handleChainError,
handleToolStart,
handleToolError,
};
const llm = new ChatOpenAI({
streaming: true,
temperature: 0,
callbacks: [handlers],
});
const nonStreamingModel = new ChatOpenAI({});
const chain = ConversationalRetrievalQAChain.fromLLM(
llm,
vectorDB.asRetriever(),
{
memory,
returnSourceDocuments: true,
qaChainOptions: {
type: "stuff",
prompt: getPrompt(),
},
questionGeneratorChainOptions: {
llm: nonStreamingModel,
template: CONDENSE_QUESTION_TEMPLATE,
},
}
);
chain.call({ question }).then(async (response) => {
const sources = JSON.stringify(
response.sourceDocuments.map((document: any) => document.metadata.source)
);
await handleLLMNewToken(`##SOURCE_DOCUMENTS##${sources}`);
await handleChainEnd(null, id);
});
return new StreamingTextResponse(stream);
}
You can find the definition of the getMemory function in my comment above.
Also, getPrompt() basically returns a PromptTemplate with {context} and {question} as inputVariables.
There is probably an easier way to achieve this, but this is what I applied at the time without having to modify the code of the package internally. Hope this helps! Let me know if you have any doubts.
@lucasquinteiro That worked perfectly! Thank you so much. Was stuck on this for a while.
@ianmcfall Thank you for your code, but my code doesn't accept the property called memory.
what's the error? :(
@ianmcfall Not able to reproduce that error on my end. Any chance you can hover over memory and see what it says? Also, are you on the latest version of langchain?
Thank you for your reply. I have just fixed it. But I have one problem: as you know, this chatbot can only answer based on specific data, so it can't answer my custom questions based on chat history. For example:
I: Hi, my name is Honzo
AI: Hi Honzo, how can I assist you today?
I: What's my name?
AI: Your name is Richard (this comes from the specific data I used for training the chatbot)
How can I fix it?
@Typhon0130 Glad you could fix it. I'm still racking my brain to get conversational memory working too; I've been stumped for a few weeks.
Seems to be a very common limitation/challenge that people are facing with ConversationalRetrievalQAChain.
Hey @Typhon0130 and @ianmcfall!
That's a common one - has to do with how the user's question gets dereferenced and rephrased as a "standalone" question for vector store queries.
We have a newer, more experimental approach using a retrieval-focused agent that handles meta-questions better; you can check it out here:
https://js.langchain.com/docs/use_cases/question_answering/conversational_retrieval_agents
And some folks have had some success with customizing the prompt (although there are tradeoffs there):
https://js.langchain.com/docs/modules/chains/popular/chat_vector_db#prompt-customization
We're also working on a template showing off a few of these common use-cases specifically in Next.js - will update here when it's ready!
@vpatel85 @lucasquinteiro Here is a fun way to do some custom logic in a single handler without copying/overwriting the whole function 🤓
import { LangChainStream, StreamingTextResponse } from 'ai';
import { CallbackManager } from 'langchain/callbacks';
import type { Document } from 'langchain/document';
const { stream, handlers } = LangChainStream();
const originalChainEnd = handlers.handleChainEnd;
const overrideChainEnd = async (_outputs: any, runId: string) => {
const docs = _outputs['sourceDocuments'] as Document[] | undefined;
if (docs != null) {
const meta = JSON.stringify(docs.map((x) => x.metadata));
await handlers.handleLLMNewToken(`\n##SOURCE_DOCUMENTS##${meta}`);
}
return originalChainEnd(_outputs, runId);
};
handlers.handleChainEnd = overrideChainEnd;
const callbacks = CallbackManager.fromHandlers(handlers);
// use callbacks like normal ...
Some more polish on the docs to go, but here's a new official template with some example use cases around retrieval, using some of the latest features and techniques, including streaming and retrieval-focused agents!
Thank you.
And I have one question: what is the best choice of chunkSize and chunkOverlap when I split a doc?
The doc is someone's resume.
This is my code:
const text_splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200 });
But the embedding quality is not good :(
How can I fix it?
There's not one answer to that - it'll take experimentation. For a resume, I would tend towards smaller values (100?) since it's very information dense.
> There's not one answer to that - it'll take experimentation. For a resume, I would tend towards smaller values (100?) since it's very information dense.

so you mean chunkSize is 100? 🤔
You can try out different values here:
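To build intuition for how the two parameters interact before experimenting, here is a self-contained character-level sketch. This is not RecursiveCharacterTextSplitter, which additionally prefers paragraph and sentence boundaries; it only shows what chunkSize and chunkOverlap do:

```typescript
// Naive fixed-window chunker: each chunk is at most chunkSize characters and
// consecutive chunks share chunkOverlap characters. Illustrates the two
// parameters only; LangChain's splitter also respects text separators.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// A 1000-character doc with chunkSize 1000 / overlap 200 is a single chunk;
// with chunkSize 300 / overlap 50 it becomes 4 overlapping chunks.
const doc = "x".repeat(1000);
console.log(chunkText(doc, 1000, 200).length); // 1
console.log(chunkText(doc, 300, 50).length); // 4
```

Smaller chunks (as suggested above for information-dense documents like resumes) mean each embedding covers a tighter topic, at the cost of more vectors and less context per retrieved chunk.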
I've been trying to implement a chatbot that uses contexts from files.
I already have all the backend necessary to embed my files, but I'm struggling to make the last part work. Has anyone successfully implemented a chain with useChat and ConversationalRetrievalQAChain?