ropensci / pdftools

Text Extraction, Rendering and Converting of PDF Documents
https://docs.ropensci.org/pdftools

pdfconvert crashes Rsession #106

Closed by foton263 2 years ago

foton263 commented 2 years ago

Regardless of which PDF I use, its file size or filename, or the resolution I set, the pdf_convert function fails every time. Here is an example:

pngfile <- pdftools::pdf_convert('https://jeroen.github.io/images/ocrscan.pdf', dpi = 600)

pdf_convert fails.

pngfile <- pdftools::pdf_convert('https://jeroen.github.io/images/ocrscan.pdf', format = 'tiff', dpi = 600)
Converting page 1 to ocrscan_1.tiff...
Error in poppler_convert(loadfile(pdf), format, pages, filenames, dpi, :
  Failed to save fileocrscan_1.tiff

pdftools::poppler_config()
$version
[1] "0.73.0"

$can_render
[1] TRUE

$has_pdf_data
[1] TRUE

$has_local_font_info
[1] FALSE

$supported_image_formats
[1] "png" "jpeg" "jpg" "tiff" "pnm"

UPDATE: When I run the same code from the R console as Admin, everything works fine. The problem seems to be related to RStudio.
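Since the call succeeds when R runs as Admin and the error is "Failed to save file", one thing worth ruling out is that the session cannot write to its working directory. This is only an assumption, not a confirmed diagnosis. The sketch below checks whether a directory is writable and otherwise falls back to `tempdir()`, then passes an explicit output path through the `filenames` argument of `pdf_convert`. The helpers `pick_output_dir` and `convert_first_page` are hypothetical names introduced for illustration; they are not part of pdftools.

```r
# Hypothetical helper (not part of pdftools): return `dir` if R can create
# a file in it, otherwise fall back to the session's temporary directory.
pick_output_dir <- function(dir = getwd()) {
  probe <- tempfile(tmpdir = dir)            # candidate path inside `dir`
  ok <- suppressWarnings(file.create(probe)) # FALSE if the file cannot be created
  if (ok) {
    unlink(probe)                            # clean up the probe file
    dir
  } else {
    tempdir()
  }
}

# Hypothetical wrapper: render page 1 of `pdf` to a PNG inside a directory
# that has been checked for write access. Requires the pdftools package.
convert_first_page <- function(pdf, dpi = 600, out_dir = pick_output_dir()) {
  out_file <- file.path(out_dir, "ocrscan_1.png")
  pdftools::pdf_convert(pdf, format = "png", pages = 1,
                        filenames = out_file, dpi = dpi)
}

# Example call (needs network access and pdftools installed):
# pngfile <- convert_first_page("https://jeroen.github.io/images/ocrscan.pdf")
```

If the conversion works with a path under `tempdir()` but not in the working directory, that points to a permissions problem. If it still fails, the cause lies elsewhere.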

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister
 Normal:  Inversion
 Sample:  Rounding

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] nlme_3.1-144 re2_0.1.1 epubr_0.6.1 tools_3.6.3
[5] backports_1.1.5 utf8_1.2.1 R6_2.5.1 DBI_1.1.0
[9] lazyeval_0.2.2 colorspace_1.4-1 openNLPdata_1.5.3-4 tidyselect_1.1.1
[13] gridExtra_2.3 leaflet_2.0.3 compiler_3.6.3 lgr_0.4.2
[17] xml2_1.3.2 NLP_0.2-0 rsparse_0.4.0 ggdendro_0.1-20
[21] slam_0.1-47 mosaicCore_0.6.0 scales_1.1.1 tm_0.7-7
[25] readr_1.3.1 askpass_1.1 rappdirs_0.3.3 stringr_1.4.0
[29] digest_0.6.27 ggformula_0.9.4 rmarkdown_2.11 RhpcBLASctl_0.20-137
[33] jpeg_0.1-8 pkgconfig_2.0.3 htmltools_0.5.1.1 parallelly_1.25.0
[37] htmlwidgets_1.5.3 rlang_0.4.11 rstudioapi_0.13 shiny_1.3.2
[41] farver_2.0.3 generics_0.1.0 jsonlite_1.7.2 openNLP_0.2-7
[45] crosstalk_1.0.0 dplyr_1.0.6 magrittr_2.0.1 text2vec_0.6
[49] mosaicData_0.18.0 Matrix_1.3-4 Rcpp_1.0.6 munsell_0.5.0
[53] fansi_0.4.2 reticulate_1.20 lifecycle_1.0.1 stringi_1.6.2
[57] yaml_2.2.1 MASS_7.3-51.5 ggstance_0.3.4 grid_3.6.3
[61] parallel_3.6.3 listenv_0.8.0 promises_1.1.0 ggrepel_0.8.2
[65] mlapi_0.1.0 crayon_1.4.2 lattice_0.20-38 splines_3.6.3
[69] hms_1.1.0 knitr_1.30 pillar_1.6.4 codetools_0.2-16
[73] glue_1.4.2 tesseract_5.0.0 evaluate_0.14 pdftools_3.0.1
[77] qpdf_1.1 data.table_1.14.0 float_0.2-4 png_0.1-7
[81] vctrs_0.3.8 tweenr_1.0.1 httpuv_1.5.2.9000 gtable_0.3.0
[85] purrr_0.3.4 polyclip_1.10-0 tidyr_1.0.0 future_1.21.0
[89] ggplot2_3.3.5 xfun_0.29 ggforce_0.3.2 mime_0.9
[93] xtable_1.8-4 broom_0.5.5 later_1.0.0 novels_1.0.3
[97] mosaicCalc_0.5.1 tibble_3.1.1 rJava_0.9-14 globals_0.14.0
[101] mosaic_1.7.0 ellipsis_0.3.2

Napping-Lunar commented 2 years ago

Just tried it myself and it crashes with the same error that I reported in https://github.com/ropensci/pdftools/issues/105.

jeroen commented 2 years ago

This is a bug in Fedora. Moving discussion to: https://github.com/ropensci/pdftools/issues/105