ropensci / magick

Magic, madness, heaven, sin
https://docs.ropensci.org/magick

image_read fails to execute `gs` on Windows for reading PDF #398

Closed zeileis closed 2 months ago

zeileis commented 4 months ago

Jeroen, one of our student assistants is trying to use magick to convert scanned exam sheets from PDF to PNG via magick::image_read(). But he runs into an error saying that gs could not be found or executed. He is using Windows 10 and we got the same problem with R 4.4.0 and 4.3.2 using magick 2.8.3. (The same code runs successfully on Debian and macOS using different R versions.)

A small reproducible example is included below. Is this a problem with his installation or with the magick binary or something else? Any help or insights would be appreciated. Thanks in advance for your help and all your work on the package in general!

## generate simple pdf
pdf("rplot.pdf")
plot(1:10)
dev.off()

## try to read with image_read
img <- magick::image_read("rplot.pdf")
## Error: Rgui.exe: FailedToExecuteCommand `"gs" -sstdout=%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dPrinted=false  "-sOutputFile=C:/Users/feder/AppData/Local/Temp/RtmpGYIgLW/magick-bZBQN5b6vQBPXtRsW1e1dUESLlO3NGSb%d" "-fC:/Users/feder/AppData/Local/Temp/RtmpGYIgLW/magick-jjtK02NBzxLD_sdV10iMG5RLp80xBfgz" "-fC:/Users/feder/AppData/Local/Temp/RtmpGYIgLW/magick-Tlyc2jc4WIBFbJLbK5WapE2wungryaJK"' (The system cannot find the file specified.
## ) @ error/delegate.c/ExternalDelegateCommand/513

## R and magick versions
R.version.string
## [1] "R version 4.4.0 (2024-04-24 ucrt)"
packageVersion("magick")
## [1] ‘2.8.3’

sessionInfo()
## R version 4.4.0 (2024-04-24 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8 
## [2] LC_CTYPE=English_United Kingdom.utf8   
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.utf8    
## 
## time zone: Europe/Luxembourg
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
## [1] compiler_4.4.0 magrittr_2.0.3 magick_2.8.3   tools_4.4.0    Rcpp_1.0.12   

jeroen commented 4 months ago

Can you try with image_read_pdf() ? This uses the pdftools package to parse the pdf.

zeileis commented 4 months ago

The image_read_pdf() function via pdftools does work. However, it produces larger PNG files which is why I would be interested in using image_read() especially for large-scale exams.

jeroen commented 4 months ago

> The image_read_pdf() function via pdftools does work. However, it produces larger PNG files which is why I would be interested in using image_read() especially for large-scale exams.

That is probably only due to the higher resolution, not the parser library. You can reduce the resolution to 72 to match the ghostscript default: image_read_pdf("input.pdf", density = 72)

zeileis commented 4 months ago

I'm using the same density = 300 in both versions. Using the same PDF file I compared:

"S0000001.pdf" |>
  image_read(density = 300) |>
  image_convert("png") |>
  image_write(path = "v1.png", format = "png")
"S0000001.pdf" |>
  image_read_pdf(density = 300) |>
  image_convert("png") |>
  image_write(path = "v2.png", format = "png")

Both PNG files claim to have 2480 x 3507 pixels but v1 has 60.5K and v2 has 263.5K.

file.size(Sys.glob("v*.png"))
## [1]  60521 263476

This makes a relevant difference when processing scanned exam sheets for a couple of hundred students. Or am I not using the density correctly here?

In any case it would be good if the gs in image_read() also worked. Is this a missing dependency or an upstream issue or something else?

jeroen commented 4 months ago

> In any case it would be good if the gs in image_read() also worked. Is this a missing dependency or an upstream issue or something else?

It means ghostscript is not installed or not found on the PATH.

> Both PNG files claim to have 2480 x 3507 pixels but v1 has 60.5K and v2 has 263.5K.

OK, so there must be some other property that affects the PNG encoding. Does image_info() show the same width, height and colorspace? If the content is black/white you can try passing colorspace = 'gray' to your image_convert() call.

zeileis commented 4 months ago

Piping the converted image through image_info() rather than image_write() gives identical output:

"S0000001.pdf" |>
  image_read_pdf(density = 300) |>
  image_convert("png") |>
  image_info()
##   format width height colorspace matte filesize density
## 1    PNG  2480   3507       sRGB  TRUE        0 300x300

Re-reading the two PNG files yields a lower density for the bigger file:

image_read("v1.png") |> image_info()
##   format width height colorspace matte filesize density
## 1    PNG  2480   3507       Gray FALSE    60521 300x300
image_read("v2.png") |> image_info()
##   format width height colorspace matte filesize density
## 1    PNG  2480   3507       Gray FALSE   263476 118x118
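
One possible reading of the 118x118 value, offered as an assumption rather than something confirmed in this thread: the PNG format stores resolution in metric units, and 300 pixels per inch is roughly 118 pixels per centimetre. If that is what happens here, both files would record the same physical resolution and only the reported unit would differ. The arithmetic is:

```r
# Convert 300 pixels per inch to pixels per centimetre (1 inch = 2.54 cm).
dpi <- 300
px_per_cm <- dpi / 2.54
round(px_per_cm)
## [1] 118
```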

Setting colorspace = "gray" in the image_convert() changes the colorspace prior to writing from sRGB to Gray. But the written PNGs do not seem to be different.
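
Since image_info() reports the same format, size and colorspace for both files, the bit depth and colour type stored in the PNG header may be worth comparing directly. The sketch below uses base R only and reads the IHDR chunk as laid out in the PNG specification; the helper name png_header() is hypothetical and not part of magick.

```r
# Read width, height, bit depth and colour type from the IHDR chunk of a PNG.
# Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
# then width (4 bytes), height (4 bytes), bit depth (1), colour type (1).
# Multi-byte integers are big-endian.
png_header <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  bytes <- readBin(con, what = "raw", n = 26)
  stopifnot(length(bytes) == 26,
            identical(rawToChar(bytes[13:16]), "IHDR"))
  be_int <- function(x) sum(as.integer(x) * 256^(3:0))
  list(
    width      = be_int(bytes[17:20]),
    height     = be_int(bytes[21:24]),
    bit_depth  = as.integer(bytes[25]),
    # 0 = grayscale, 2 = RGB, 3 = palette, 4 = gray + alpha, 6 = RGB + alpha
    color_type = as.integer(bytes[26])
  )
}

# Hypothetical usage with the two files compared above:
# str(png_header("v1.png"))
# str(png_header("v2.png"))
```

A difference in bit_depth (for example 1 versus 8) would be one plausible explanation for the gap in file size, but that is a guess until the two headers are actually compared.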

Regarding ghostscript: So I need to install that separately on the system, say from https://www.ghostscript.com/releases/gsdnld.html and make sure it is included on the PATH?

jeroen commented 4 months ago

> Regarding ghostscript: So I need to install that separately on the system, say from https://www.ghostscript.com/releases/gsdnld.html and make sure it is included on the PATH?

Yes, but you can also install it automatically via a package manager, e.g. brew on macOS or choco on Windows.
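
After installing, a quick way to see whether R can find a Ghostscript executable on the PATH is sketched below. The helper name gs_on_path() is hypothetical, and the Windows executable names gswin64c and gswin32c are an assumption about what the Windows installer provides. An empty result only means nothing was found on the PATH; it does not prove that ImageMagick cannot locate Ghostscript by other means.

```r
# Return the full paths of Ghostscript executables visible on the PATH.
# "gs" is the usual name on Linux and macOS; "gswin64c" and "gswin32c"
# are assumed names of the console executables on Windows.
gs_on_path <- function(candidates = c("gs", "gswin64c", "gswin32c")) {
  found <- Sys.which(candidates)
  found[nzchar(found)]
}

hits <- gs_on_path()
if (length(hits) == 0) {
  message("No Ghostscript executable found on the PATH.")
} else {
  print(hits)
}
```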

zeileis commented 2 months ago

Apologies for the long delay, I didn't have proper follow-up from our student on Windows. But I just iterated through the process with another student on macOS via brew, which worked smoothly.

What would be great is a more verbose message in image_read() that Ghostscript would need to be installed for reading PDFs or that image_read_pdf() should be used instead. Also the Ghostscript dependency for PDFs could be explained explicitly on the manual page. Currently, it only refers to it implicitly saying that ImageMagick delegates are used. Maybe something like:

For reading svg or pdf imagemagick delegates tasks to ... (svg) and ghostscript (pdf), respectively, i.e., these are expected to be installed on the system. As an alternative, it is recommended to use image_read_svg() and image_read_pdf() if the ...

(I'm not sure which delegates are used for svg.)

zeileis commented 2 months ago

Thanks, I've added a similar check in the exams package.

zeileis commented 2 months ago

Sorry, I have one follow-up question. On my Linux system the check via Sys.which("gs") seems to work as desired. But on (some?) Windows systems, Sys.which("gs") returns an empty string but image_read() still finds the Ghostscript installation, e.g., C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.exe. Thus the new check seems to be too restrictive for this?
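
A less restrictive check could also look in the default install directories when nothing is on the PATH. The following is a minimal sketch under stated assumptions: the helper find_gs_windows() is hypothetical, the directory layout is inferred only from the example path above, and it is not what the magick commit does.

```r
# Look for Ghostscript console executables below the Windows program
# directories, assuming the layout <root>/gs/gs<version>/bin/gswin*c.exe.
find_gs_windows <- function(
    roots = Sys.getenv(c("ProgramFiles", "ProgramFiles(x86)"))) {
  roots <- roots[nzchar(roots)]  # drop unset variables (e.g. on Linux)
  patterns <- file.path(roots, "gs", "gs*", "bin", "gswin*c.exe")
  as.character(unlist(lapply(patterns, Sys.glob)))
}

# Returns character(0) when nothing matches.
find_gs_windows()
```

Ghostscript installed to a custom location would still be missed, so this only narrows the gap rather than closing it.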

jeroen commented 2 months ago

OK I'll check for this as well: https://github.com/ropensci/magick/commit/407dba3a866755e9b669fb27962daf6f091451c3

zeileis commented 2 months ago

But maybe this also does not cover all relevant paths?

Maybe it would be better to catch the error from imagemagick first and then add the informative note afterwards if the error has occurred?

jeroen commented 2 months ago

There are many reasons why reading an image can error, so a potential error might not be related to ghostscript per se. It is not so easy to make the error more informative. But I'll try to only show this warning in case the function actually errors.
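
The idea of adding the note only when the read actually fails can be sketched in user code as follows. This is an illustration under assumptions, not the implementation in magick: read_with_gs_hint() is a hypothetical wrapper, and it takes the reader function as an argument so that it does not depend on any particular package.

```r
# Call reader(path); if it errors on a PDF and no Ghostscript executable
# is visible on the PATH, re-raise the error with an extra hint appended.
read_with_gs_hint <- function(path, reader) {
  tryCatch(
    reader(path),
    error = function(e) {
      is_pdf <- grepl("\\.pdf$", path, ignore.case = TRUE)
      gs_missing <- !any(nzchar(Sys.which(c("gs", "gswin64c", "gswin32c"))))
      hint <- if (is_pdf && gs_missing) {
        "\nHint: reading PDFs this way needs Ghostscript; consider image_read_pdf() instead."
      } else {
        ""
      }
      stop(paste0(conditionMessage(e), hint), call. = FALSE)
    }
  )
}

# Hypothetical usage, assuming magick is installed:
# img <- read_with_gs_hint("rplot.pdf", magick::image_read)
```

Because the original message is kept and the hint is only appended, errors unrelated to Ghostscript stay readable.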

zeileis commented 2 months ago

Thanks for all of this, very much appreciated!