Open mw19930312 opened 2 months ago
Hey there @mw19930312! Great to see you diving into the intricacies of the CitationQueryEngine again. 😄 I remember our last brainstorming session here; always a pleasure to assist a fellow knowledge seeker. I'm on it - gonna unravel the citation generation mystery for you in just a jiffy!
The nodes are labeled as source 1, source 2, etc., and then the LLM is prompted to cite which sources it used.
Would you mind pointing me to where the LLM is prompted? I'm just wondering how to pass the citation URLs, if there are any. Also, if I understand correctly, the nodes are formatted as follows through a metadata conversion template, right?
[source 1]
context:
@logan-markewich ^^
In the CitationQueryEngine, the process of prompting the LLM for citations involves a few key steps:
Retrieving and Formatting Nodes: Initially, nodes are retrieved from a vector index and are then formatted as citation sources. Each node is assigned a label (e.g., source 1, source 2) and is formatted together with its context and any relevant metadata, such as URLs. This is achieved through the retrieval and formatting mechanisms within the engine, where metadata from the nodes is used to create a detailed context for the LLM.
Synthesizing the Response: The formatted nodes are passed to the LLM along with a citation prompt template. This template instructs the LLM on how to incorporate the citations into its response, guiding it to reference the sources by their assigned labels. The process ensures that the final response generated by the LLM includes citations to the provided sources, making it clear which parts of the response are supported by which sources.
Metadata and URLs: The inclusion of URLs and other metadata in the citation nodes is controlled by the metadata_mode parameter. This means that when nodes are converted into citation sources, their metadata, including URLs, can be formatted and included based on the settings specified in the metadata_mode. This allows for a flexible approach to how detailed the citations are, including the possibility of directly citing URLs within the response.
This approach ensures that the final response from the LLM is not only informative but also accurately cites the sources it used, providing a clear and traceable path back to the original information.
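As a rough illustration of the steps above, here is a minimal plain-Python sketch of labeling nodes as numbered sources and placing them in a citation prompt. It is not the engine's actual code: Node, format_citation_sources, build_citation_prompt, the include_metadata flag (standing in for metadata_mode), and the prompt wording are all hypothetical names and text chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a retrieved node: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical prompt wording; the engine's real template may differ.
CITATION_TEMPLATE = (
    "Answer the query using only the numbered sources below. "
    "Cite each source you use by its number, e.g. [1].\n\n"
    "{sources}\n\n"
    "Query: {query}\n"
    "Answer: "
)


def format_citation_sources(nodes, include_metadata=True):
    """Label each node as 'Source N:' and optionally prepend its metadata."""
    blocks = []
    for number, node in enumerate(nodes, start=1):
        lines = [f"Source {number}:"]
        if include_metadata:
            # Metadata such as a URL only reaches the LLM if it is written here.
            lines.extend(f"{key}: {value}" for key, value in node.metadata.items())
        lines.append(node.text)
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


def build_citation_prompt(nodes, query, include_metadata=True):
    """Combine the labeled sources and the user query into one prompt string."""
    sources = format_citation_sources(nodes, include_metadata)
    return CITATION_TEMPLATE.format(sources=sources, query=query)


nodes = [
    Node("Paris is the capital of France.", {"url": "https://example.com/france"}),
    Node("Berlin is the capital of Germany.", {"url": "https://example.com/germany"}),
]
prompt = build_citation_prompt(nodes, "What is the capital of France?")
```

Printing prompt shows each source with its url line directly under the label. With include_metadata=False the url lines disappear, which is why the metadata setting matters if you want the LLM to be able to see or cite URLs.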
@mw19930312
Question Validation
Question
I understand that in a CitationQueryEngine, we retrieve a couple of reference nodes from a vector index. Then, these nodes are passed into a synthesizer to generate a final response.