Open nmakhotkin opened 1 month ago
Given this PDF, pymupdf4llm produces no text output until page 8, yet the text can be extracted fine (without formatting etc.) using pymupdf directly.
68478448.pdf
```python
import pymupdf4llm
import pymupdf

# Markdown conversion of the whole document
text = pymupdf4llm.to_markdown(
    '68478448.pdf',
    margins=(0, 15, 0, 15),
    write_images=True,
    show_progress=True,
)

# Plain-text extraction of the first page with pymupdf directly
pymupdf_text = pymupdf.open('68478448.pdf').get_page_text(0)

print('DÉNOMINATION DU MÉDICAMENT' in text)  # False
print('DÉNOMINATION DU MÉDICAMENT' in pymupdf_text)  # True
```
`text` here starts with `rare (≥ 1/10 000 à < 1/1 000)`, which is found only on the 8th page of the file.
UPD: It worked ok with version 0.0.16
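For anyone narrowing this down, here is a rough sketch of a per-page check that reports which pages are absent from the Markdown output. The helper `pages_missing_from_markdown` and its `min_len` parameter are hypothetical names made up for this illustration, not part of pymupdf or pymupdf4llm. It assumes that at least one plain-text line of each page survives the Markdown conversion verbatim apart from whitespace, which may not hold for heavily formatted pages.

```python
import re


def _normalize(s: str) -> str:
    """Collapse whitespace runs so line-wrapping differences do not matter."""
    return re.sub(r"\s+", " ", s).strip()


def pages_missing_from_markdown(page_texts, markdown, min_len=8):
    """Return 1-based numbers of pages with no line found in `markdown`.

    Hypothetical helper: `page_texts` is a list of plain-text strings,
    one per page. Lines shorter than `min_len` characters are ignored
    because they match too easily by accident. Pages with no usable
    lines (e.g. empty pages) are skipped rather than reported.
    """
    md = _normalize(markdown)
    missing = []
    for number, page_text in enumerate(page_texts, start=1):
        probes = [_normalize(line) for line in page_text.splitlines()]
        probes = [p for p in probes if len(p) >= min_len]
        if probes and not any(p in md for p in probes):
            missing.append(number)
    return missing


# Possible usage with the file from this report (requires pymupdf and
# pymupdf4llm to be installed):
#
#   import pymupdf, pymupdf4llm
#   doc = pymupdf.open('68478448.pdf')
#   page_texts = [page.get_text() for page in doc]
#   md = pymupdf4llm.to_markdown('68478448.pdf')
#   print(pages_missing_from_markdown(page_texts, md))
```

Running the same check on 0.0.16 and on the affected version should show whether the missing pages are limited to the first seven.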