Open Hritesh-Hexaware opened 1 month ago
To open a PDF, jump to a specific page, and highlight the source node, you need to include the page number and highlight text metadata in the source_link. Here is an example of how you can format the link:
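# file_path is a placeholder: set it to the path or URL of your PDF.
for node in response.source_nodes:
    print("-----")
    text_fmt = node.node.get_content().strip().replace("\n", " ")[:1000]
    print(f"Text:\t {text_fmt} ...")
    print(f"Metadata:\t {node.node.metadata}")
    print(f"Score:\t {node.score:.3f}")
    # Both metadata keys must have been set on the node when the PDF was parsed.
    page_number = node.node.metadata["page_number"]
    highlight_text = node.node.metadata["highlight_text"]
    source_link = f"{file_path}#page={page_number}&highlight={highlight_text}"
    print(f"Source Link:\t {source_link}")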
In this example:
file_path is the path to your PDF file.
page_number is the page number you want to jump to.
highlight_text is the text you want to highlight.
The metadata dictionary should contain the page_number and highlight_text keys to generate the appropriate source_link [1].
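Source text usually contains spaces or characters such as & and # that would break the link fragment, so here is a minimal sketch of building the link with percent-encoding. The helper name build_source_link and the sample path are hypothetical. Whether the highlight parameter is honored depends on the PDF viewer that opens the link, so treat that part as an assumption to verify against your viewer.

```python
from urllib.parse import quote


def build_source_link(file_path: str, page_number: int, highlight_text: str) -> str:
    """Build a link of the form file_path#page=N&highlight=TEXT (hypothetical helper)."""
    # Percent-encode everything, including '&' and '#', so the text stays one parameter.
    encoded_text = quote(highlight_text, safe="")
    return f"{file_path}#page={page_number}&highlight={encoded_text}"


# Sample values, for illustration only.
link = build_source_link("docs/report.pdf", 3, "net revenue & growth")
print(link)  # docs/report.pdf#page=3&highlight=net%20revenue%20%26%20growth
```

Inside the loop over response.source_nodes, the same helper could be called with the values read from node.node.metadata.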
response.source_nodes contains the nodes used to make a response. There, you can access the metadata (which might have the page number). To highlight in a PDF, I would use a fuzzy matching library, like fuse.js, to compare the node text against the original PDF text.
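fuse.js is a JavaScript library. As a rough Python-side sketch of the same idea, the standard library's difflib can locate the span of page text closest to a node's text. The function name best_fuzzy_span and the sample strings are hypothetical, and the page text is assumed to have been extracted from the PDF already.

```python
from difflib import SequenceMatcher
from typing import Tuple


def best_fuzzy_span(page_text: str, node_text: str, step: int = 1) -> Tuple[int, int, float]:
    """Return (start, end, score) for the window of page_text most similar to node_text."""
    window = len(node_text)
    if window == 0 or len(page_text) < window:
        # Nothing to slide over: compare against the whole page instead.
        score = SequenceMatcher(None, page_text, node_text, autojunk=False).ratio()
        return 0, len(page_text), score
    best = (0, window, 0.0)
    # Slide a window the size of node_text across the page and keep the best score.
    for start in range(0, len(page_text) - window + 1, step):
        candidate = page_text[start:start + window]
        score = SequenceMatcher(None, candidate, node_text, autojunk=False).ratio()
        if score > best[2]:
            best = (start, start + window, score)
    return best


# Sample strings, for illustration only: the page text has a stray hyphen.
page = "Intro paragraph. Revenue grew by 12 per-cent in 2023. Closing remarks."
node = "Revenue grew by 12 percent in 2023."
start, end, score = best_fuzzy_span(page, node)
print(page[start:end], round(score, 3))
```

The returned span can then be used as the highlight text. This brute-force scan is slow on long pages; raising step trades accuracy for speed.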
Question
I am parsing a PDF file and extracting information from it. I want a source_link that, when clicked, opens the PDF, jumps to a specific page number, and highlights the source node.