ropensci / pdftools

Text Extraction, Rendering and Converting of PDF Documents
https://docs.ropensci.org/pdftools

pdf_data(pdf, font_info = TRUE) crashes R 4.4.1 on Windows #132

Open ontogenerator opened 1 month ago

ontogenerator commented 1 month ago

Dear pdftools developers,

First of all, thank you for the really useful package! PDFs of scientific publications are a nightmare to parse, but your tool makes it much more bearable.

I would like to report the following issue:

When I run pdf_data() with font_info = TRUE on either of the attached PDF files, the R session crashes without printing any error message. The same calls without font_info complete normally.

10.1016+j.jneuroim.2023.578175.pdf s12916-022-02462-6_supplemental2.pdf

# these work
pdf_1 <- pdftools::pdf_data(pdf = "10.1016+j.jneuroim.2023.578175.pdf")
pdf_2 <- pdftools::pdf_data(pdf = "s12916-022-02462-6_supplemental2.pdf")
# but these crash the session
pdf_3 <- pdftools::pdf_data(pdf = "10.1016+j.jneuroim.2023.578175.pdf", font_info = TRUE)
pdf_4 <- pdftools::pdf_data(pdf = "s12916-022-02462-6_supplemental2.pdf", font_info = TRUE)
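
In case it helps with reproduction, here is a minimal sketch for running the crashing call in a separate R process, so the interactive session survives and the exit status can be inspected. It uses only base R and assumes that Rscript sits next to the running R installation; run_isolated is a hypothetical helper written for this report, not part of pdftools.

```r
# Hypothetical helper (not part of pdftools): evaluate R code in a child
# Rscript process so that a crash there does not end the current session.
run_isolated <- function(code, args = character()) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # With default stdout/stderr settings, system2() returns the child's
  # exit status; quoting protects spaces and special characters.
  system2(rscript, args = c("-e", shQuote(code), shQuote(args)))
}

# Code executed in the child: the PDF path arrives as a command-line argument.
child_code <- paste(
  "a <- commandArgs(trailingOnly = TRUE);",
  "x <- pdftools::pdf_data(a[1], font_info = TRUE);",
  "message(length(x), ' pages parsed')"
)

pdfs <- c("10.1016+j.jneuroim.2023.578175.pdf",
          "s12916-022-02462-6_supplemental2.pdf")

# Only files present in the working directory are tried.
for (f in pdfs[file.exists(pdfs)]) {
  status <- run_isolated(child_code, args = f)
  message(f, ": exit status ", status)
}
```

A status of 0 would mean the child finished normally; any non-zero status suggests that the child process errored or crashed.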

Thank you in advance for your assistance.