Can you try with image_read_pdf()? This uses the pdftools package to parse the PDF.
On Wed, 29 May 2024 at 12:28, Achim Zeileis <@.***> wrote:
Jeroen, one of our student assistants is trying to use magick to convert scanned exam sheets from PDF to PNG via magick::image_read(). But he runs into an error that gs could not be found/executed. He is using Windows 10 and we got the same problem with R 4.4.0 and 4.3.2 using magick 2.8.3. (The same code runs successfully on Debian and MacOS using different R versions.)
A small reproducible example is included below. Is this a problem with his installation or with the magick binary or something else? Any help or insights would be appreciated. Thanks in advance for your help and all your work on the package in general!
## generate simple pdf
pdf("rplot.pdf")
plot(1:10)
dev.off()
## try to read with image_read
img <- magick::image_read("rplot.pdf")
Error: Rgui.exe: FailedToExecuteCommand `"gs" -sstdout=%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dPrinted=false "-sOutputFile=C:/Users/feder/AppData/Local/Temp/RtmpGYIgLW/magick-bZBQN5b6vQBPXtRsW1e1dUESLlO3NGSb%d" "-fC:/Users/feder/AppData/Local/Temp/RtmpGYIgLW/magick-jjtK02NBzxLD_sdV10iMG5RLp80xBfgz" "-fC:/Users/feder/AppData/Local/Temp/RtmpGYIgLW/magick-Tlyc2jc4WIBFbJLbK5WapE2wungryaJK"' (The system cannot find the file specified.
) @ error/delegate.c/ExternalDelegateCommand/513
## R and magick versions
R.version.string
[1] "R version 4.4.0 (2024-04-24 ucrt)"
packageVersion("magick")
[1] ‘2.8.3’
sessionInfo()
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.utf8
[2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8
time zone: Europe/Luxembourg
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.4.0 magrittr_2.0.3 magick_2.8.3 tools_4.4.0 Rcpp_1.0.12
The image_read_pdf() function via pdftools does work. However, it produces larger PNG files, which is why I would be interested in using image_read(), especially for large-scale exams.
That is probably only due to the higher resolution, not the parser library. You can reduce the resolution to 72 to match the ghostscript default: image_read_pdf("input.pdf", density = 72)
I'm using the same density = 300 in both versions. Using the same PDF file I compared:
"S0000001.pdf" |>
image_read(density = 300) |>
image_convert("png") |>
image_write(path = "v1.png", format = "png")
"S0000001.pdf" |>
image_read_pdf(density = 300) |>
image_convert("png") |>
image_write(path = "v2.png", format = "png")
Both PNG files claim to have 2480 x 3507 pixels but v1 has 60.5K and v2 has 263.5K.
file.size(Sys.glob("v*.png"))
## [1] 60521 263476
This makes a relevant difference when processing scanned exam sheets for a couple of hundred students. Or am I not using the density correctly here?
In any case it would be good if the gs call in image_read() also worked. Is this a missing dependency or an upstream issue or something else?
It means ghostscript is not installed or not found on the PATH.
OK, so there must be some other property that affects the PNG encoding. Does image_info() show the same width, height and colorspace? If the content is black/white you can try passing colorspace = "gray" to your image_convert() call.
Piping the converted image through image_info() rather than image_write() gives identical output:
"S0000001.pdf" |>
image_read_pdf(density = 300) |>
image_convert("png") |>
image_info()
## format width height colorspace matte filesize density
## 1 PNG 2480 3507 sRGB TRUE 0 300x300
Re-reading the two PNG files yields a lower density for the bigger file:
image_read("v1.png") |> image_info()
## format width height colorspace matte filesize density
## 1 PNG 2480 3507 Gray FALSE 60521 300x300
image_read("v2.png") |> image_info()
## format width height colorspace matte filesize density
## 1 PNG 2480 3507 Gray FALSE 263476 118x118
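One possible reading of the density column, offered as an assumption rather than something confirmed in this thread: the PNG format stores physical resolution in pixels per metre, so a re-read file may report its density per centimetre rather than per inch. Under that assumption, 118x118 and 300x300 would describe the same resolution, as a quick unit conversion suggests:

```r
# Assumption: "118x118" is in pixels per centimetre, while "300x300"
# is in pixels per inch (1 inch = 2.54 cm).
dpi <- 300
px_per_cm <- dpi / 2.54
round(px_per_cm)
## [1] 118
```

If this holds, the density values would not explain the difference in file size.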
Setting colorspace = "gray" in the image_convert() call changes the colorspace prior to writing from sRGB to Gray. But the written PNGs do not seem to be different.
Regarding ghostscript: So I need to install that separately on the system, say from https://www.ghostscript.com/releases/gsdnld.html and make sure it is included on the PATH?
Yes, but you can also install it automatically via a package manager, e.g. brew on macOS or choco on Windows.
Apologies for the long delay, I didn't have proper follow-up from our student on Windows. But I just iterated through the process with another student on Mac OS via brew which worked smoothly.
What would be great is a more verbose message in image_read() saying that Ghostscript needs to be installed for reading PDFs or that image_read_pdf() should be used instead. Also, the Ghostscript dependency for PDFs could be explained explicitly on the manual page. Currently, it only refers to it implicitly, saying that ImageMagick delegates are used. Maybe something like:
For reading svg or pdf imagemagick delegates tasks to ... (svg) and ghostscript (pdf), respectively, i.e., these are expected to be installed on the system. As an alternative, it is recommended to use image_read_svg() and image_read_pdf() if the ...
(I'm not sure which delegates are used for svg.)
Thanks, I've added a similar check in the exams package.
Sorry, I have one follow-up question. On my Linux system the check via Sys.which("gs") seems to work as desired. But on (some?) Windows systems, Sys.which("gs") returns an empty string but image_read() still finds the Ghostscript installation, e.g., C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.exe. Thus the new check seems to be too restrictive for this?
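For illustration, a less restrictive lookup could be sketched along the following lines. This is a minimal sketch and not the check used in magick or exams: the helper name find_ghostscript() is hypothetical, and the Windows search location is an assumption based on the default installer path quoted above, so it may well miss other setups.

```r
# Hypothetical helper (not part of magick or exams): look for a Ghostscript
# executable under the names commonly used on Unix-alikes and Windows.
find_ghostscript <- function() {
  # "gs" on Linux/macOS; "gswin64c"/"gswin32c" are the usual Windows console binaries
  candidates <- c("gs", "gswin64c", "gswin32c")
  on_path <- Sys.which(candidates)
  on_path <- unname(on_path[nzchar(on_path)])
  if (length(on_path) > 0) return(on_path[1])

  if (.Platform$OS.type == "windows") {
    # Assumption: default installer location, e.g.
    # C:\Program Files\gs\gs10.03.1\bin\gswin64c.exe
    roots <- Sys.getenv(c("ProgramFiles", "ProgramFiles(x86)"))
    roots <- roots[nzchar(roots)]
    hits <- Sys.glob(file.path(roots, "gs", "gs*", "bin", "gswin*c.exe"))
    if (length(hits) > 0) return(hits[1])
  }

  # Empty string when nothing was found, mirroring Sys.which()
  ""
}

gs <- find_ghostscript()
if (nzchar(gs)) {
  message("Ghostscript found at: ", gs)
} else {
  message("Ghostscript not found")
}
```

Returning an empty string on failure keeps the result interchangeable with a plain Sys.which("gs") call.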
OK I'll check for this as well: https://github.com/ropensci/magick/commit/407dba3a866755e9b669fb27962daf6f091451c3
But maybe this also does not cover all relevant paths?
Maybe it would be better to catch the error from imagemagick first and then add the informative note afterwards if the error has occurred?
There are many reasons why reading an image can error, so a potential error might not be related to ghostscript per se. It is not so easy to make the error more informative. But I'll try to only show this warning in case the function actually errors.
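The idea of adding a hint only after an actual failure could look roughly like the sketch below. It is not the implementation in magick: read_with_gs_hint() and fake_reader() are hypothetical names, and the pattern match assumes that a missing Ghostscript shows up as FailedToExecuteCommand together with "gs", as in the error reported at the top of this thread.

```r
# Hypothetical wrapper (not magick's actual implementation): call a reader
# function and, only if it fails, append a Ghostscript hint when the error
# message looks like a failed "gs" delegate call.
read_with_gs_hint <- function(reader, path, ...) {
  tryCatch(
    reader(path, ...),
    error = function(e) {
      msg <- conditionMessage(e)
      # Assumption: a missing Ghostscript surfaces as this pair of markers
      if (grepl("FailedToExecuteCommand", msg, fixed = TRUE) &&
          grepl("\"gs\"", msg, fixed = TRUE)) {
        msg <- paste0(
          msg,
          "\nHint: reading PDFs with image_read() requires Ghostscript; ",
          "install it or try image_read_pdf() instead."
        )
      }
      # Re-raise the (possibly extended) error; unrelated errors pass through unchanged
      stop(msg, call. = FALSE)
    }
  )
}

# Simulated failure with an abbreviated message, so the sketch runs
# without magick or a PDF at hand
fake_reader <- function(path) {
  stop("FailedToExecuteCommand `\"gs\" -dQUIET' (The system cannot find the file specified.)")
}
res <- tryCatch(read_with_gs_hint(fake_reader, "rplot.pdf"), error = conditionMessage)
cat(res, "\n")
```

In real use the first argument would be the actual reader, e.g. read_with_gs_hint(magick::image_read, "rplot.pdf"). Errors that do not match the pattern are re-raised with their original message.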
Thanks for all of this, very much appreciated!