mgieseki / dvisvgm

A fast DVI, EPS, and PDF to SVG converter
GNU General Public License v3.0

dvisvgm possibly not using mutool on the Mac #232

Closed bgvoisin closed 1 year ago

bgvoisin commented 1 year ago

In preparation for the upcoming release of Ghostscript 10.01.0 (which dvisvgm won't be able to use for PDF conversion, switching to mutool instead), I've been experimenting with dvisvgm and mutool on the Mac. Verdict: it looks like dvisvgm can't use or see mutool, but unfortunately I'm no dvisvgm expert and don't know how to diagnose what's happening.

What I tried: compiling libgs 10.01.0 retrieved from GitHub and mutool 1.21.1 from its release source, and using dvisvgm 3.0.3 from the tl2023 pretest. To be sure, I also checked with mutool (1.20.0 this time) installed via MacPorts; the result is the same.

When both libgs 10.01.0 and mutool 1.21.1 are installed, any attempt to convert PDF to SVG gives, for example (escher.pdf being the example file from Ghostscript, converted to PDF):

% dvisvgm --verbosity=7 --pdf escher.pdf --output=%f-pdf.svg
processing PDF file
  graphic size: 0pt x 0pt (0mm x 0mm)
  output written to escher-pdf.svg
1 of 1 page converted in 0.0569179 seconds

This looks like what's been reported when dvisvgm tries to use the C-based PDF interpreter of libgs. This isn't about a specific PDF file: I tried several PDF files generated with different software, and the output is the same.

Removing libgs completely from my system, so only mutool is present, gives instead

% dvisvgm --verbosity=7 --pdf escher.pdf --output=%f-pdf.svg
ERROR: can't retrieve number of pages from file escher.pdf

So I'm starting to wonder whether mutool is actually seen or used by dvisvgm. mutool is found by which, and mutool convert -o escher-mutool.svg escher.pdf works fine. Setting DVISVGM_PDF_PROC=mutool doesn't change the dvisvgm output.

I'm not sure how to test things more precisely. In a Terminal window just opened, hence with the default environment:

% dvisvgm -V1
dvisvgm 3.0.3 (aarch64-apple-darwin20.6.0)
brotli:   1.0.9
clipper:  6.2.1
freetype: 2.13.0
kpathsea: 6.3.5
potrace:  1.16
xxhash:   0.8.1
zlib:     1.2.13

% dvisvgm -l 
bgcolor    background color special
color      complete support of color specials
dvisvgm    special set for embedding raw SVG snippets
em         line drawing statements of the emTeX special set
html       hyperref specials
papersize  special to set the page size
pdf        PDF hyperlink, font map, and pagesize specials
tpic       TPIC specials

% which mutool

% dvisvgm --verbosity=7 --pdf escher.pdf --output=%f-pdf.svg
ERROR: can't retrieve number of pages from file escher.pdf

% export DVISVGM_PDF_PROC=mutool

% dvisvgm -V1                                               
dvisvgm 3.0.3 (aarch64-apple-darwin20.6.0)
brotli:   1.0.9
clipper:  6.2.1
freetype: 2.13.0
kpathsea: 6.3.5
mutool:   1.21.1
potrace:  1.16
xxhash:   0.8.1
zlib:     1.2.13

% dvisvgm --verbosity=7 --pdf escher.pdf --output=%f-pdf.svg
ERROR: can't retrieve number of pages from file escher.pdf

It looks like, in the default environment, dvisvgm -l reports the ability to process PDF files but dvisvgm -V1 doesn't mention mutool. After setting DVISVGM_PDF_PROC, dvisvgm -V1 acknowledges mutool. This doesn't seem to affect the ability of dvisvgm to convert PDF.

Looking at the code, the message can't retrieve number of pages from file seems to be triggered in PDFToSVG.cpp after a call to psInterpreter (hence PSInterpreter.cpp), whereas the number of pages here, if I got things right, is meant to come from PDFHandler.cpp through mutool show <filename>.pdf trailer/Root/Pages/Count. I've no idea how to interpret this. In any case,

% mutool show escher.pdf trailer/Root/Pages/Count

mgieseki commented 1 year ago

At the moment, dvisvgm still retrieves the number of PDF pages through Ghostscript, so that the mentioned error message is expected if you remove GS from your system. According to the GS documentation, the involved PS operators, like runpdfbegin, are still available in GS > 10.0. The conversion shouldn't fail at this stage.
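For reference, the page count can also be queried from Ghostscript directly on the command line. The following is a minimal sketch, not dvisvgm's actual code: it assumes that gs is on PATH and that the runpdfbegin and pdfpagecount operators behave as documented in the installed version. The function name pdf_page_count and the GS_BIN override are hypothetical, introduced only for illustration.

```shell
# Print the number of pages of a PDF file using Ghostscript's PostScript-level
# PDF operators. GS_BIN is a hypothetical override for the gs executable;
# it is not a variable known to Ghostscript or dvisvgm.
pdf_page_count() {
    # -q          suppress the startup banner
    # -dNODISPLAY do not open an output device
    # -dNOSAFER   allow the PostScript code to read the PDF file
    # Note: a file name containing parentheses or backslashes would need
    # escaping before being placed in the PostScript string below.
    "${GS_BIN:-gs}" -q -dNODISPLAY -dNOSAFER \
        -c "($1) (r) file runpdfbegin pdfpagecount = quit"
}

# Usage (with Ghostscript installed):
#   pdf_page_count escher.pdf
```

If this prints a number for the file in question, the Ghostscript side of the page-count query is probably working, and the problem would then lie elsewhere.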

I can't tell why mutool is not listed in your first call of dvisvgm -V1. If it's present in a directory covered by PATH, dvisvgm should find it regardless of whether DVISVGM_PDF_PROC is set or not, unless some macOS security measures interfere. I can't reproduce the issue on my Linux machine and don't have a Mac available, so it's difficult to debug.
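To see which directories in PATH actually contain a mutool executable, a small shell helper can walk PATH entry by entry. This is a sketch in plain POSIX shell (it should also work in zsh); the function name find_in_path is made up for this example and is not part of dvisvgm.

```shell
# List every executable regular file called "$1" found in the directories
# of PATH, in search order. Uses only parameter expansion to split PATH,
# so it does not depend on shell-specific word-splitting rules.
find_in_path() {
    name=$1
    rest=${PATH}:                 # trailing colon so the last entry is handled too
    while [ -n "$rest" ]; do
        dir=${rest%%:*}           # first remaining PATH entry
        rest=${rest#*:}           # everything after the first colon
        if [ -z "$dir" ]; then
            dir=.                 # an empty PATH entry means the current directory
        fi
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

# Usage:
#   find_in_path mutool
```

More than one line of output would mean that several copies are installed (for instance a self-compiled one and a MacPorts one); the first line is the copy a PATH lookup would normally pick.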

bgvoisin commented 1 year ago

Thanks for your answer. Based on it, I went back to a setup with both gs 10.01.0 and mutool 1.21.1. I also played with the compilation options of mupdf, to finally get mupdf-x11 to compile, using

sudo make HAVE_X11=yes HAVE_GLUT=no X11_CFLAGS="-I/usr/X11/include" X11_LIBS="-L/usr/X11/lib -lX11 -lXext" build=release shared=yes install

but I think that's immaterial here. (Previously mutool was just compiled with HAVE_X11=no and HAVE_GLUT=no.)

I can't make sense of what I get. There does not seem to be full reproducibility.

Here is the exact output I get in a new terminal window, in the order I typed the commands, with no interim commands omitted, just adding blank lines for clarity:

% mutool -v
mutool version 1.21.1

% which mutool

% echo $PATH

% dvisvgm -V1
dvisvgm 3.0.3 (aarch64-apple-darwin20.6.0)
brotli:      1.0.9
clipper:     6.2.1
freetype:    2.13.0
Ghostscript: 10.1.0
kpathsea:    6.3.5
mutool:      1.21.1
potrace:     1.16
xxhash:      0.8.1
zlib:        1.2.13

% cd ~/Desktop/Test/dvisvgm 

% dvisvgm --verbosity=7 --pdf escher.pdf --output=%f-pdf.svg
ERROR: To process PDF files, either Ghostscript < 10.1 or mutool is required.
The installed Ghostscript version 10.1.0 isn't supported.

% export DVISVGM_PDF_PROC=mutool

% dvisvgm --verbosity=7 --pdf escher.pdf --output=%f-pdf.svg
processing PDF file
  graphic size: 0pt x 0pt (0mm x 0mm)
  output written to escher-pdf.svg
1 of 1 page converted in 0.0869172 seconds

% dvisvgm --verbosity=7 --pdf escher.pdf --output=%f-pdf.svg
processing PDF file
  graphic size: 614.295pt x 794.97pt (215.9mm x 279.4mm)
  output written to escher-pdf.svg
1 of 1 page converted in 0.186326 seconds

Running env in a new Terminal window before launching dvisvgm, I can avoid the first error ("To process PDF files [...]") and I don't have to set DVISVGM_PDF_PROC. Then, either I get the two above results (first one giving graphic size 0pt x 0pt, second one correct size), or I get the correct result on first try.

To be sure: there's nothing special in my ENV, I put nothing there, it's the default macOS setup, with zsh the shell and my .zprofile empty.

Finally, I did one last try, removing the .svg file created by the previous run of dvisvgm (ie escher.svg), in the same Terminal window where dvisvgm had performed the PDF to SVG conversion successfully before. Here is exactly what I typed afterwards, four times the same command in succession, doing absolutely nothing else in between, and the answers I got:

% dvisvgm --verbosity=7 --pdf escher.pdf --output=%f-pdf.svg
ERROR: To process PDF files, either Ghostscript < 10.1 or mutool is required.
The installed Ghostscript version 10.1.0 isn't supported.

% dvisvgm --verbosity=7 --pdf escher.pdf --output=%f-pdf.svg
processing PDF file
  graphic size: 0pt x 0pt (0mm x 0mm)
  output written to escher-pdf.svg
1 of 1 page converted in 0.0858209 seconds

% dvisvgm --verbosity=7 --pdf escher.pdf --output=%f-pdf.svg
processing PDF file
  graphic size: 0pt x 0pt (0mm x 0mm)
  output written to escher-pdf.svg
1 of 1 page converted in 0.079215 seconds

% dvisvgm --verbosity=7 --pdf escher.pdf --output=%f-pdf.svg
processing PDF file
  graphic size: 614.295pt x 794.97pt (215.9mm x 279.4mm)
  output written to escher-pdf.svg
1 of 1 page converted in 0.199508 seconds

mgieseki commented 1 year ago

dvisvgm executes mutool by creating a subprocess and the main process should wait until mutool has finished. All output written to stdout by the subprocess, like the mutool version number, is captured and saved to a string. It seems this data is not always present in time when dvisvgm continues after calling mutool. Could the erratic behavior be caused by an unexpected asynchronous execution or delayed writing/flushing of the data?
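One way to quantify this kind of erratic behavior is to run the identical command many times and tally the distinct outcomes. The helper below is only a sketch: the name tally_runs is hypothetical, and the pattern it filters on (lines containing ERROR or graphic size) is an assumption based on the messages quoted in this thread. Timing lines are deliberately ignored because they differ on every run.

```shell
# Run a command N times and count how often each outcome occurs.
# Usage: tally_runs N command [args...]
tally_runs() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Merge stderr into stdout, then keep the first line that tells
        # the outcomes apart. "|| true" keeps the loop going when the
        # command fails or when no line matches.
        line=$("$@" 2>&1 | grep -E 'ERROR|graphic size' | head -n 1) || true
        printf '%s\n' "${line:-no matching line}"
        i=$((i + 1))
    done | sed 's/^ *//' | sort | uniq -c   # strip indentation, then count
}

# Usage:
#   tally_runs 20 dvisvgm --pdf escher.pdf --output=%f-pdf.svg
```

A deterministic program should produce a single line in the tally; several lines with different counts would support the suspicion of a timing-dependent problem.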

bgvoisin commented 1 year ago

Is there anything I could do to help at this stage, such as tests I could run on the Mac? (I probably won't be able to do this for a few days, though.)

Unfortunately this goes way beyond me, as I'm not a programmer. The furthest I got was to see that the message ERROR: To process PDF files [...] comes from PDFToSVG.cpp, based on PDFHandler::available, which is set in PDFHandler.hpp depending on PDFHandler::mutoolVersion, which is determined in PDFHandler.cpp by running mutool -v.

mgieseki commented 1 year ago

I think I've found a possible bug that might be the cause of the erratic behavior when requesting the mutool version number. Could you please apply this patch to your dvisvgm sources, build a new executable, and test it again? Instead of applying the patch, you can also download the entire file Process.cpp and copy it over your local one. In TeX Live it's located in folder texk/dvisvgm/dvisvgm-src/src/.

bgvoisin commented 1 year ago

Thanks for the extra-fast response. This does seem to solve the matter: dvisvgm now works with all the PDF files I tried, single-page or multi-page. I'll need to confirm later, as this was done using TeX Live sources from February 18, containing dvisvgm 3.0.2. I can't rsync from my university office to get the latest sources (rsync is blocked); I'll try again later from home.

bgvoisin commented 1 year ago

The patch works perfectly with dvisvgm 3.0.3 from the latest TeX Live sources.

mgieseki commented 1 year ago

That's a relief. Thanks for the fast confirmation.