langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License
12.66k stars 2.17k forks

Is it possible to add sourceDocuments to the agent? #4463

Closed morenosoldev closed 5 months ago

morenosoldev commented 8 months ago

Hey, I was wondering if there is a way to pass source documents when using an agent with a createRetrieverTool, like this?

import { ChatOpenAI } from '@langchain/openai'
import { Chroma } from '@langchain/community/vectorstores/chroma'
import { AgentExecutor } from 'langchain/agents'
import { createRetrieverTool } from 'langchain/tools/retriever'
import orderCancellationTool from './tools_functions/OrderCancellation'
import refundTool from './tools_functions/RefundOrder'
import shippingTool from './tools_functions/Shipping'
import missingItemsTool from './tools_functions/MissingItems'
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts'
import { convertToOpenAIFunction } from '@langchain/core/utils/function_calling'
import { RunnableSequence } from '@langchain/core/runnables'
import { formatToOpenAIFunctionMessages } from 'langchain/agents/format_scratchpad'
import { OpenAIFunctionsAgentOutputParser } from 'langchain/agents/openai/output_parser'

async function createChatbotExecutor(
  company: string,
  model: ChatOpenAI,
  chatbotId: string,
  conversationId: string,
  prompt: string,
  vectorStore: Chroma
) {
  try {
    const retriever = vectorStore.asRetriever()

    const qaTool = createRetrieverTool(retriever, {
      name: `search-state-${company}-qa`,
      // English: "Data about ${company} QA - useful when questions are asked
      // about ${company} products, services or similar."
      description: `Data omkring ${company} QA - brugbart når der bliver stillet spørgsmål relateret til ${company} produkter, services eller lignende.`
    })

    const tools = [
      qaTool,
      orderCancellationTool(chatbotId, conversationId),
      missingItemsTool(chatbotId, conversationId),
      refundTool,
      shippingTool
    ]

    const MEMORY_KEY = 'chat_history'
    const chatPrompt = ChatPromptTemplate.fromMessages([
      ['system', prompt],
      new MessagesPlaceholder(MEMORY_KEY),
      ['user', '{input}'],
      new MessagesPlaceholder('agent_scratchpad')
    ])

    const modelWithFunctions = model.bind({
      functions: tools.map(tool => convertToOpenAIFunction(tool))
    })

    const runnableAgent = RunnableSequence.from([
      {
        input: i => i.input,
        agent_scratchpad: i => formatToOpenAIFunctionMessages(i.steps),
        chat_history: i => i.chat_history
      },
      chatPrompt,
      modelWithFunctions,
      new OpenAIFunctionsAgentOutputParser()
    ])

    const executor = AgentExecutor.fromAgentAndTools({
      agent: runnableAgent,
      tools
    }).withConfig({ runName: 'Agent' })

    return executor
  } catch (error) {
    console.error('Error occurred:', error)
    throw error
  }
}

export default createChatbotExecutor

dosubot[bot] commented 8 months ago

🤖

Hey @andreasjoensson! Great to see you back here. Hope you're doing awesome! 😊

Yes, it is possible to pass source documents when using an agent with a createRetrieverTool. You can use the getRelevantDocuments method from the BaseRetrieverInterface to retrieve the relevant documents based on the provided query. Here is the relevant code:

const func = async (
  { query }: { query: string },
  runManager?: CallbackManagerForToolRun
) => {
  const docs = await retriever.getRelevantDocuments(
    query,
    runManager?.getChild("retriever")
  );
  return formatDocumentsAsString(docs);
};

In this code, getRelevantDocuments is called with the query and a child of the runManager (if it exists). The documents returned by this method are then formatted as a string using formatDocumentsAsString(docs).
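
Because the function above returns only the string produced by formatDocumentsAsString(docs), the documents' metadata does not appear in the tool output. A minimal sketch of one possible workaround follows, assuming each document stores its origin under a `metadata.source` key. That key, the `RetrievedDoc` type, and the `formatDocsWithSources` helper are hypothetical names for illustration, not part of the library.

```typescript
// Local stand-in for the library's document shape (pageContent + metadata).
interface RetrievedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: serialize each document together with its source so
// the tool's string output still says where the text came from.
// Assumes the source lives under metadata.source; adjust to your own data.
function formatDocsWithSources(docs: RetrievedDoc[]): string {
  return docs
    .map((doc, index) => {
      const rawSource: unknown = doc.metadata["source"];
      const source = typeof rawSource === "string" ? rawSource : "unknown";
      return `[${index + 1}] source: ${source}\n${doc.pageContent}`;
    })
    .join("\n\n");
}

// Example with made-up data.
const sampleDocs: RetrievedDoc[] = [
  { pageContent: "Refunds take 5 days.", metadata: { source: "faq.md" } },
  { pageContent: "Shipping is free.", metadata: {} },
];
const formattedToolOutput = formatDocsWithSources(sampleDocs);
```

A custom tool function could return a string like this in place of the plain concatenation, so the model can cite the source in its answer. Whether that fits depends on how much extra text you want in the prompt.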

You can find more details about this in the langchain/src/tools/retriever.ts file.

A similar issue was also discussed in the langchainjs repository: "How to retrieve sourceDocuments with LCEL and RunnableSequence chains, is this the correct approach?" The solution suggested there was to implement custom logic in the _getRelevantDocuments method to ensure that the documents returned are indeed relevant to the query.

I hope this helps! If you have any other questions, feel free to ask.

#### About Dosu This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.

morenosoldev commented 8 months ago

@dosu-bot

Okay, I have done that, but I am still not seeing the source documents in the output. What would you modify here to have the source documents show up?

import { Response } from 'express'
import { AIMessage, HumanMessage } from '@langchain/core/messages'

async function processChatbotInput(
  executor: any,
  input: string,
  res: Response,
  previousMessages: (HumanMessage | AIMessage)[]
): Promise<string> {
  try {
    const eventStream = await executor.streamEvents(
      { input, chat_history: previousMessages },
      { version: 'v1' }
    )
    for await (const event of eventStream) {
      const eventType = event.event
      if (eventType === 'on_chain_start') {
        if (event.name === 'Agent') {
          console.log('\n-----')
          console.log(
            `Starting agent: ${event.name} with input: ${JSON.stringify(
              event.data.input
            )}`
          )
        }
      } else if (eventType === 'on_chain_end') {
        if (event.name === 'Agent') {
          console.log('\n-----')
          console.log(`Finished agent: ${event.name}\n`)
          console.log(`Agent output was: ${event.data.output}`)
          console.log('event.data', event.data)
          console.log('\n-----')
          return event.data.output
        }
      } else if (eventType === 'on_llm_stream') {
        const content = event.data?.chunk?.message?.content
        if (content !== undefined && content !== '') {
          console.log(`| ${content}`)
          res.write(content)
        }
      } else if (eventType === 'on_tool_start') {
        console.log('\n-----')
        console.log(
          `Starting tool: ${event.name} with inputs: ${event.data.input}`
        )
      } else if (eventType === 'on_tool_end') {
        console.log('\n-----')
        console.log(`Finished tool: ${event.name}\n`)
        console.log(`Tool output was: ${event.data.output}`)
        console.log('\n-----')
      }
    }

    return ''
  } catch (error) {
    console.error('Error occurred:', error)
    throw error
  }
}

export default processChatbotInput
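
The loop above handles chain, LLM and tool events only. One option, sketched here under assumptions, is to also watch for retriever events and pull the documents out of them. The sketch assumes that the event stream includes an `on_retriever_end` event whose `data.output` is either an array of documents or an object with a `documents` array, so please check this against the version you run. `SourceDoc`, `StreamEventLike`, `isSourceDoc` and `extractSourceDocs` are hypothetical names, not library exports.

```typescript
// Local stand-in for the library's document shape.
interface SourceDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Only the fields this sketch reads from a streamed event.
interface StreamEventLike {
  event: string;
  name: string;
  data: { output?: unknown };
}

// Runtime check that a value looks like a document.
function isSourceDoc(value: unknown): value is SourceDoc {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { pageContent?: unknown; metadata?: unknown };
  return (
    typeof candidate.pageContent === "string" &&
    typeof candidate.metadata === "object" &&
    candidate.metadata !== null
  );
}

// Returns the documents carried by a retriever-end event, or [] otherwise.
// Accepts both assumed output shapes: Document[] and { documents: Document[] }.
function extractSourceDocs(streamEvent: StreamEventLike): SourceDoc[] {
  if (streamEvent.event !== "on_retriever_end") return [];
  const output = streamEvent.data.output;
  let candidates: unknown[] = [];
  if (Array.isArray(output)) {
    candidates = output;
  } else if (typeof output === "object" && output !== null) {
    const wrapped = (output as { documents?: unknown }).documents;
    if (Array.isArray(wrapped)) candidates = wrapped;
  }
  return candidates.filter(isSourceDoc);
}

// Example with a made-up event.
const sampleRetrieverEvent: StreamEventLike = {
  event: "on_retriever_end",
  name: "VectorStoreRetriever",
  data: {
    output: {
      documents: [
        { pageContent: "Refunds take 5 days.", metadata: { source: "faq.md" } },
      ],
    },
  },
};
const collectedDocs = extractSourceDocs(sampleRetrieverEvent);
```

Inside the `for await` loop, the results of `extractSourceDocs(event)` could be appended to an array and returned next to the agent output, instead of returning the output string alone.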

dosubot[bot] commented 8 months ago

🤖

Based on the context provided, it seems like you want to display the source documents in the output of your chatbot application. To achieve this, you need to override the _getRelevantDocuments method in your subclass of BaseRetriever so that the returned DocumentInterface objects include the source documents.

Here's an example of how you might write the _getRelevantDocuments method:

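import { BaseRetriever } from "@langchain/core/retrievers";
import type { CallbackManagerForRetrieverRun } from "@langchain/core/callbacks/manager";
import type { DocumentInterface } from "@langchain/core/documents";

class MyRetriever extends BaseRetriever {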
  // ...

  async _getRelevantDocuments(
    query: string,
    _callbacks?: CallbackManagerForRetrieverRun
  ): Promise<DocumentInterface[]> {
    // Your code to retrieve documents goes here. This is just an example.
    const documents = await myDocumentRetrievalFunction(query);

    // Modify the DocumentInterface objects to include the source document.
    const documentsWithSource = documents.map(doc => ({
      ...doc,
      sourceDocument: myFunctionToGetSourceDocument(doc),
    }));

    return documentsWithSource;
  }

  // ...
}

In this example, myDocumentRetrievalFunction is a placeholder for your function that retrieves the documents, and myFunctionToGetSourceDocument is a placeholder for your function that retrieves the source document for a given DocumentInterface object. Please replace these placeholders with your actual functions.
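
The example above adds sourceDocument as a new top-level property. An alternative sketch follows, assuming you would rather keep the extra information inside each document's metadata record. `DocWithMeta`, `attachSource` and the `source` key are hypothetical names, and `lookupSource` stands in for myFunctionToGetSourceDocument.

```typescript
// Local stand-in for the library's document shape.
interface DocWithMeta {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: copy each document and add a `source` entry to its
// metadata, leaving the input documents untouched.
function attachSource(
  docs: DocWithMeta[],
  lookupSource: (doc: DocWithMeta) => string
): DocWithMeta[] {
  return docs.map((doc) => ({
    pageContent: doc.pageContent,
    metadata: { ...doc.metadata, source: lookupSource(doc) },
  }));
}

// Example with made-up data.
const docsBefore: DocWithMeta[] = [
  { pageContent: "Orders ship within 2 days.", metadata: { page: 3 } },
];
const docsAfter = attachSource(docsBefore, () => "shipping-policy.pdf");
```

Keeping the value under metadata means the objects still have only the pageContent and metadata fields, so code that expects that shape does not need to know about an extra property.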

The DocumentInterface object in the langchainjs codebase has two properties:

  1. pageContent: This is a string that represents the content of the page.
  2. metadata: This is a generic type that extends a Record of string keys and any value. It represents additional data related to the document.

Here is the structure:

export interface DocumentInterface<
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  Metadata extends Record<string, any> = Record<string, any>
> {
  pageContent: string;

  metadata: Metadata;
}

This structure is implemented in the Document class, where pageContent is initialized with the pageContent from the fields parameter if it exists, otherwise it uses the existing pageContent. The metadata is initialized with the metadata from the fields parameter if it exists, otherwise it uses an empty object.
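
As a rough illustration of the metadata default described above, here is a local stand-in class. It is a simplified sketch and not the library's Document class: `SketchDocument` and `SketchDocumentFields` are hypothetical names, and pageContent is simply required here.

```typescript
// Constructor input: metadata is optional.
interface SketchDocumentFields<M extends Record<string, unknown>> {
  pageContent: string;
  metadata?: M;
}

// Simplified stand-in that mirrors only the metadata fallback behaviour.
class SketchDocument<
  M extends Record<string, unknown> = Record<string, unknown>
> {
  pageContent: string;
  metadata: M;

  constructor(fields: SketchDocumentFields<M>) {
    this.pageContent = fields.pageContent;
    // Fall back to an empty object when no metadata is supplied.
    this.metadata = fields.metadata ?? ({} as M);
  }
}

// Example with made-up data.
const docWithMeta = new SketchDocument({
  pageContent: "Hello",
  metadata: { source: "greeting.txt" },
});
const docWithoutMeta = new SketchDocument({ pageContent: "Hello" });
```

With this shape, reading `docWithoutMeta.metadata` is always safe, while a key such as `source` is only present if it was supplied when the document was created.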

You can find more information about this in the langchain-core/src/retrievers.ts and langchain-core/src/documents/document.ts files in the langchainjs repository.

I hope this helps! If you have any other questions, feel free to ask.

morenosoldev commented 8 months ago

Hey, I can't seem to return the docs' metadata. Is this not possible in the new versions, or am I missing something? Any help would be appreciated, thanks.

dosubot[bot] commented 5 months ago

Hi, @andreasjoensson,

I'm helping the langchainjs team manage their backlog and am marking this issue as stale. From what I understand, you were inquiring about adding source documents to an agent when using a createRetrieverTool. There was a response providing guidance on modifying the getRelevantDocuments method to include source documents in the returned DocumentInterface objects, and you followed up with a code snippet and a question about returning the docs metadata.

Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, kindly let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation.