tkcoding closed this issue 1 month ago.
I am trying to use pymupdf4llm to extract embedded hyperlinks from text, but most of the embedded links are not carried over into the Markdown output. Is there a way to extract the text together with its embedded hyperlinks?
Example file: example_document.pdf
```python
import pymupdf4llm

# Convert the whole PDF to a single Markdown string
md_text = pymupdf4llm.to_markdown("example_document.pdf")
```
Screenshots of the expected outcome and of the Markdown currently produced:
Library versions: PyMuPDFb-1.24.10, pymupdf-1.24.10, pymupdf4llm-0.0.17
PyMuPDF version: 1.24.10
Operating system: macOS
Python version: 3.10
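One possible workaround, sketched here under stated assumptions rather than as pymupdf4llm's own method, is to read the link annotations directly with PyMuPDF and rebuild the Markdown links yourself. `Page.get_text("words")` returns word boxes and `Page.get_links()` returns each link's `from` rectangle and `uri`. The helpers `words_to_markdown` and `page_markdown_with_links` are hypothetical names, and the rule that a word belongs to a link when the centre of its box lies inside the link rectangle is an assumed heuristic. The output is plain running text (no headings, lists, or line breaks), so it is meant for recovering links, not for replacing `to_markdown`.

```python
"""Sketch: rebuild Markdown links from PyMuPDF words and link rectangles.

Assumption (hypothetical heuristic): a word belongs to a link when the
centre of its bounding box lies inside the link's "from" rectangle.
"""
from typing import Iterable, List, Optional, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def _uri_for_word(word: Sequence, links: Sequence[Tuple[Rect, str]]) -> Optional[str]:
    """Return the URI of the first link whose rectangle contains the word's centre."""
    cx = (word[0] + word[2]) / 2
    cy = (word[1] + word[3]) / 2
    for (x0, y0, x1, y1), uri in links:
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            return uri
    return None


def words_to_markdown(words: Iterable[Sequence], links: Iterable[Tuple[Rect, str]]) -> str:
    """Join words with spaces, wrapping consecutive words that share a URI as [text](uri)."""
    links = list(links)
    parts: List[str] = []
    run: List[str] = []             # words of the link currently being collected
    run_uri: Optional[str] = None   # URI of that link, or None if no link is open

    def flush() -> None:
        nonlocal run, run_uri
        if run:
            parts.append(f"[{' '.join(run)}]({run_uri})")
        run, run_uri = [], None

    for word in words:
        text = word[4]  # get_text("words") tuples: (x0, y0, x1, y1, word, block, line, word_no)
        uri = _uri_for_word(word, links)
        if uri is not None and uri == run_uri:
            run.append(text)        # same link continues
            continue
        flush()                     # close any open link before handling this word
        if uri is None:
            parts.append(text)
        else:
            run, run_uri = [text], uri  # start a new link
    flush()
    return " ".join(parts)


def page_markdown_with_links(path: str, page_number: int = 0) -> str:
    """Read one page with PyMuPDF and return its words with Markdown links."""
    import pymupdf  # imported lazily so the pure helpers above work without PyMuPDF

    with pymupdf.open(path) as doc:
        page = doc[page_number]
        words = page.get_text("words", sort=True)
        links = [
            (tuple(link["from"]), link["uri"])
            for link in page.get_links()
            if link["kind"] == pymupdf.LINK_URI
        ]
    return words_to_markdown(words, links)
```

For example, `page_markdown_with_links("example_document.pdf")` would return the first page's words with each linked phrase written as `[text](uri)`. Matching on the word centre tolerates link rectangles that are slightly larger or smaller than the text they cover.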
Links inside table cells are not supported yet.
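Until table cells are handled, a rough way to see which links are affected is to compare link rectangles against the table boxes found by `Page.find_tables()`. This sketch assumes that any link whose rectangle overlaps a table's `bbox` sits in a table cell; `split_links_by_tables` and `table_links` are hypothetical helpers, and the anchor text is read back with `Page.get_textbox()` so the missing links can be restored in the Markdown by hand.

```python
"""Sketch: list URI links that overlap tables detected by PyMuPDF.

Assumption (hypothetical heuristic): a link overlapping a table's bbox
lives in a table cell.
"""
from typing import Iterable, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Link = Tuple[Rect, str]                   # (rectangle, uri)


def _overlaps(a: Rect, b: Rect) -> bool:
    """True when rectangles a and b share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def split_links_by_tables(
    links: Iterable[Link], table_boxes: Sequence[Rect]
) -> Tuple[List[Link], List[Link]]:
    """Partition links into (overlapping some table box, outside all table boxes)."""
    inside: List[Link] = []
    outside: List[Link] = []
    for rect, uri in links:
        if any(_overlaps(rect, box) for box in table_boxes):
            inside.append((rect, uri))
        else:
            outside.append((rect, uri))
    return inside, outside


def table_links(path: str, page_number: int = 0) -> List[Tuple[str, str]]:
    """Return (anchor text, uri) for every URI link that overlaps a detected table."""
    import pymupdf  # lazy import: the pure helpers above do not need PyMuPDF

    with pymupdf.open(path) as doc:
        page = doc[page_number]
        boxes = [tuple(table.bbox) for table in page.find_tables().tables]
        links = [
            (tuple(link["from"]), link["uri"])
            for link in page.get_links()
            if link["kind"] == pymupdf.LINK_URI
        ]
        inside, _ = split_links_by_tables(links, boxes)
        # Read the visible text under each link rectangle
        return [(page.get_textbox(pymupdf.Rect(rect)).strip(), uri) for rect, uri in inside]
```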