pymupdf / RAG

RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF
https://pymupdf.readthedocs.io/en/latest/pymupdf4llm
GNU Affero General Public License v3.0

Markup links are associated with the whole text line instead of the original span #97

Open DiazBejaranoD opened 2 months ago

DiazBejaranoD commented 2 months ago

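When extracting links that are attached to a single span, the whole text line is emitted as the link text for the URL instead of only the span the link actually covers.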

Example input:

image test_doc_pdf.pdf

Example output:

# 1. This is a header

[This is a sample file pointing to GitHub website](https://github.com/)

This is a paragraph

-----
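
For illustration, here is a minimal sketch of the per-span behaviour the report seems to ask for. It is not the PyMuPDF4LLM implementation. The input dictionaries only mimic the shapes returned by PyMuPDF's `page.get_text("dict")` (spans with `text` and `bbox`) and `page.get_links()` (entries with `from` and `uri`). The helper names `overlap_ratio` and `line_to_markdown`, the 0.5 threshold, the coordinates, and the assumption that only "GitHub website" carries the link are all hypothetical, since the issue does not say which span is linked.

```python
# Sketch only: attach a link to the spans its rectangle covers, not to the
# whole line. Helper names, threshold and sample coordinates are hypothetical.

def overlap_ratio(span_bbox, link_rect):
    """Return the fraction of the span's area that lies inside the link rectangle."""
    sx0, sy0, sx1, sy1 = span_bbox
    lx0, ly0, lx1, ly1 = link_rect
    width = min(sx1, lx1) - max(sx0, lx0)
    height = min(sy1, ly1) - max(sy0, ly0)
    if width <= 0 or height <= 0:
        return 0.0
    span_area = (sx1 - sx0) * (sy1 - sy0)
    return (width * height) / span_area if span_area > 0 else 0.0


def line_to_markdown(spans, links, threshold=0.5):
    """Render one text line, wrapping only the linked spans in [text](uri)."""
    groups = []  # consecutive spans with the same target: [uri or None, text]
    for span in spans:
        uri = None
        for link in links:
            if "uri" in link and overlap_ratio(span["bbox"], link["from"]) >= threshold:
                uri = link["uri"]
                break
        if groups and groups[-1][0] == uri:
            groups[-1][1] += span["text"]
        else:
            groups.append([uri, span["text"]])

    parts = []
    for uri, text in groups:
        core = text.strip()
        if uri is None or not core:
            parts.append(text)
            continue
        # Keep leading and trailing whitespace outside the link text.
        lead = text[: len(text) - len(text.lstrip())]
        trail = text[len(text.rstrip()):]
        parts.append(f"{lead}[{core}]({uri}){trail}")
    return "".join(parts)


# Hypothetical data for the line shown in the example output.
spans = [
    {"text": "This is a sample file pointing to ", "bbox": (72, 100, 250, 112)},
    {"text": "GitHub website", "bbox": (250, 100, 330, 112)},
]
links = [{"uri": "https://github.com/", "from": (250, 99, 330, 113)}]

print(line_to_markdown(spans, links))
```

With this sample data, only the second span falls inside the link rectangle, so the surrounding text stays outside the brackets. Matching on area overlap rather than exact containment is one possible design choice; it tolerates link rectangles that are slightly larger or smaller than the span's bounding box.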