jgm / pandoc

Universal markup converter
https://pandoc.org

Allow specifying Inkscape for svg conversion tool (as well as rsvg-convert) #8176

Closed fuhrmanator closed 11 months ago

fuhrmanator commented 2 years ago

Describe your proposed improvement and the problem it solves.

Some SVG files (perhaps those with certain foreignObject elements?) convert better to PDF (or other formats) with Inkscape than with rsvg-convert. It would be good to have a way to specify which SVG converter to use.

Describe alternatives you've considered.

I've considered various filters (gists, python, hs), but none of them seem to be maintained by a community. ConTeXt apparently supports inkscape (if it's installed), but for my project I can't use ConTeXt.

jgm commented 2 years ago

What would be the command for converting on the command line via inkscape?

fuhrmanator commented 2 years ago

I'm not sure I understand the question, but I could suggest for pandoc:

--svg-converter=CONVERTER

CONVERTER is either rsvg-convert or inkscape

Or are you asking about how to convert with inkscape:

inkscape --export-filename=sample.pdf sample.svg

I believe the extension of the export filename determines the destination format (pdf, png, etc.)

There is documentation here.

jgm commented 2 years ago

Is inkscape strictly better than rsvg-convert? Or is it better in some cases and worse in others? If it's strictly better, perhaps we could always use inkscape if available and fall back on rsvg-convert. Then we wouldn't need a new option.

fuhrmanator commented 2 years ago

Is inkscape strictly better than rsvg-convert? Or is it better in some cases and worse in others?

I think it's the latter. I know that Inkscape has been involved in SVG as a standard for a while, e.g., https://inkscape.org/support-us/svg-standards-work/

Inkscape as a creation tool is very powerful and the things you can create with it are converted well to PDF (maintaining vectors -- there are examples I know of, especially with paths, that rsvg-convert rasterizes in the PDF).

But the tool itself is likely much bigger (it's a pretty big install on Windows, about 0.5 GB for example).

I suspect the performance of both will also be different. Given a large batch of SVG files, rsvg-convert might be faster overall.

If it's strictly better, perhaps we could always use inkscape if available and fall back on rsvg-convert.

I like that idea, but I'm biased in wanting to use Inkscape. It's easy enough to remove inkscape from one's path to keep pandoc from using it, I suppose.
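The "use inkscape if available, otherwise fall back" idea can be sketched in a few lines of Python. This is only an illustration, not pandoc's behavior: `pick_svg_converter` and `DEFAULT_ORDER` are hypothetical names, and the lookup relies on nothing more than the standard `shutil.which`.

```python
import shutil
from typing import Callable, Optional, Sequence

# Preference order from the discussion: Inkscape first, rsvg-convert as fallback.
DEFAULT_ORDER = ("inkscape", "rsvg-convert")


def pick_svg_converter(
    order: Sequence[str] = DEFAULT_ORDER,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first converter found on PATH, or None."""
    for name in order:
        path = which(name)
        if path is not None:
            return path
    return None
```

With this approach, taking inkscape off the PATH is enough to make the lookup settle on rsvg-convert, which matches the opt-out described above.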

tarleb commented 2 years ago

For the necessary commands see the respective function in the diagram-generator.lua filter. As can be seen there, the whole process is made more complex by having to check for the installed inkscape version, as the parameter names are not the same in v1 and v2.
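As a rough illustration of that version check (not the filter's actual code), the Python sketch below picks the export flag from the text printed by `inkscape --version`. It assumes the split falls between the 0.9x series (`--export-pdf`, with `--without-gui`) and 1.x (`--export-filename`); `inkscape_pdf_args` is a hypothetical helper name.

```python
import re


def inkscape_pdf_args(version_output: str, src: str, dst: str) -> list[str]:
    """Build Inkscape arguments from the text printed by `inkscape --version`."""
    match = re.search(r"Inkscape\s+(\d+)\.", version_output)
    if match is None:
        raise ValueError("unrecognised `inkscape --version` output")
    if int(match.group(1)) >= 1:
        # 1.x: a single flag; the output format follows the file extension.
        return [f"--export-filename={dst}", src]
    # 0.9x (assumed): format-specific flag, plus --without-gui.
    return ["--without-gui", f"--export-pdf={dst}", src]
```

The caller would run `inkscape --version` once, cache the result, and prepend the executable name to the returned list.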

matclab commented 11 months ago

If you arrived here because of an SVG produced by Mermaid, you can get rid of the foreignObjects by adding %%{init: {"flowchart": { "htmlLabels": false}} }%% before the flowchart or graph line to solve the problem.
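For anyone who generates diagrams from a script, here is a minimal Python sketch that prepends this directive to a diagram's source. `disable_html_labels` is a hypothetical helper, and the check for an existing `htmlLabels` setting is deliberately naive.

```python
INIT_DIRECTIVE = '%%{init: {"flowchart": { "htmlLabels": false}} }%%'


def disable_html_labels(diagram: str) -> str:
    """Prepend the init directive unless the source already sets htmlLabels."""
    if "htmlLabels" in diagram:
        return diagram
    return f"{INIT_DIRECTIVE}\n{diagram}"
```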

alerque commented 11 months ago

Regarding the original suggestion, inkscape continues to be problematic for scripted CLI usage. An unpredictable set of actions and/or input/output formats trigger the GUI to start and close even when running CLI actions, the options regarding bounding boxes (page, bleed, crop boxes, etc.) are not tested from the CLI before releases and regularly break, etc. I would suggest it is not well suited to baking into pandoc. Perhaps allowing a custom converter command to be specified at runtime would serve the purpose for specific scenarios, but it is a relatively unstable target to try to bake in support for (speaking as someone who maintains baked-in support for scripted use in casile).
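One way such a runtime-specified converter could work is a command template with placeholders. The Python sketch below is purely illustrative: the `{input}` and `{output}` placeholder names and the `expand_converter_command` helper are assumptions, not anything pandoc offers.

```python
import shlex


def expand_converter_command(template: str, src: str, dst: str) -> list[str]:
    """Split a user-supplied template, then substitute {input} and {output}.

    Splitting happens before substitution, so paths containing spaces stay
    single arguments and are never re-parsed by a shell.
    """
    return [
        part.replace("{input}", src).replace("{output}", dst)
        for part in shlex.split(template)
    ]
```

The resulting list can be handed to `subprocess.run(argv, check=True)`. Note that `shlex.split` follows POSIX quoting rules, so a template containing Windows-style backslashes would need different handling.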

Foadsf commented 3 weeks ago

To hijack this discussion as well: the @matclab solution did not work for me, as explained here. Alternatively, one can use mermaid-cli to export the Mermaid diagrams as PDF:

mmdc -i path\to\input.md --outputFormat=pdf --pdfFit -o path\to\input_preprocessed.md

and then use Pandoc to convert it to PDF:

pandoc input_preprocessed.md -f markdown-implicit_figures -o path\to\output.pdf

Alternatively, if you prefer using mermaid-filter, then change your code fences from

```mermaid
```

to

```{.mermaid format=pdf}
```

and then

pandoc -F "%APPDATA%\npm\mermaid-filter.cmd" input.md -o output.pdf

Please note that my solution is for Windows. On POSIX-compatible platforms such as macOS and Linux you will need to adapt it.

Foadsf commented 3 weeks ago

@fuhrmanator I think an inherently better and more robust solution is to use a Chromium-based browser, such as Google Chrome or Microsoft Edge, in headless mode to convert SVG to PDF; see this post. For example:

msedge --headless --disable-gpu --print-to-pdf=<output-pdf-path> <input-svg-path>

fuhrmanator commented 3 weeks ago

Thanks @Foadsf ! That is super useful, and I've used headless Chrome in software testing pipelines.

Now, how to make it work in the pandoc pipeline? I'm using tools like Quarto and it defaults to rsvg-convert which fails on many of my SVG files, the impetus for this issue.

Foadsf commented 3 weeks ago

I'm totally a fan of your proposal to let users choose their SVG converter, but rsvg-convert, CairoSVG, Ghostscript, Inkscape, ... all seem to have issues with SVGs generated with Mermaid CLI. So I suppose the starting point should be to search Pandoc's code base, find the exact line(s) where it calls rsvg-convert, and replace that call with a headless browser. Trial and error!

jgm commented 3 weeks ago

I don't want to reopen this issue, because I don't think inkscape is an improvement over rsvg-convert. But headless chrome might be. Feel free to open a new issue, including the command-line invocations that would work for both chrome and edge. (I guess we could always see what is in the path - edge, chrome, chromium, rsvg-convert - and use the best available option.)
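A hedged Python sketch of that "best available option" lookup follows. The flags are the ones quoted in this thread (`--headless`, `--disable-gpu`, `--print-to-pdf`, `--no-pdf-header-footer`) plus rsvg-convert's `-f pdf -o`; the executable names in `BROWSERS` are assumptions that differ by platform, and both function names are hypothetical.

```python
import shutil
from typing import Callable, Optional

# Assumed executable names; real installs vary by platform and package.
BROWSERS = ("msedge", "google-chrome", "chromium")


def headless_argv(browser: str, src: str, dst: str) -> list[str]:
    """Command line for printing an SVG to PDF with a headless browser."""
    return [
        browser,
        "--headless",
        "--disable-gpu",
        "--no-pdf-header-footer",
        f"--print-to-pdf={dst}",
        src,
    ]


def best_available_argv(
    src: str,
    dst: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[list[str]]:
    """Prefer a headless browser; fall back to rsvg-convert; else None."""
    for name in BROWSERS:
        if which(name):
            return headless_argv(name, src, dst)
    if which("rsvg-convert"):
        return ["rsvg-convert", "-f", "pdf", "-o", dst, src]
    return None
```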

fuhrmanator commented 3 weeks ago

I was ready to open a new issue, and tried some conversions to verify the command-line info. But I got stuck because everything comes out in the PDF as US Letter size (I was able to turn off header/footer with --no-pdf-header-footer), which is definitely not good for my use cases. I verified in my SVG that my document size was constrained to the image (there's an <svg width="500px" height="500px" viewBox="0 0 500 500" ...> tag), but the PDF produced by msedge was still 8.5" x 11".

It seems edge/chrome/chromium doesn't have a command-line option to use the SVG's bounding box for the PDF page size. There are some work-arounds (see https://stackoverflow.com/questions/44970113/how-can-i-change-paper-size-in-headless-chrome-print-to-pdf/) if you put the SVG in an HTML document with a CSS setting of the bounding box, but that seems overly complicated compared to how the other tools work. ChatGPT suggested creating a script that uses Puppeteer (nodejs), which also doesn't seem viable for a pandoc pipeline.
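For reference, the HTML-wrapper work-around could look roughly like the Python sketch below. It reads `width` and `height` from the root `<svg>` tag and emits a page with a matching CSS `@page` size. Both helper names are hypothetical, the attribute parsing is deliberately simplistic, and whether a given headless browser honours the `@page` size exactly is an assumption that needs testing.

```python
import re


def svg_page_size(svg_text: str) -> tuple[str, str]:
    """Return (width, height) of the root <svg> tag as CSS lengths."""
    tag = re.search(r"<svg\b[^>]*>", svg_text)
    if tag is None:
        raise ValueError("no <svg> element found")
    sizes = []
    for attr in ("width", "height"):
        found = re.search(rf'\s{attr}="([^"]+)"', tag.group(0))
        if found is None or found.group(1).endswith("%"):
            raise ValueError(f"root <svg> needs an absolute {attr}")
        value = found.group(1)
        # A bare number is in SVG user units, treated here as CSS px.
        sizes.append(value + "px" if re.fullmatch(r"[\d.]+", value) else value)
    return sizes[0], sizes[1]


def wrap_svg_in_html(svg_text: str) -> str:
    """Inline the SVG in an HTML page whose print size matches the image."""
    width, height = svg_page_size(svg_text)
    svg_only = svg_text[svg_text.index("<svg"):]  # drop any XML prolog
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><style>\n'
        f"@page {{ size: {width} {height}; margin: 0; }}\n"
        "html, body { margin: 0; }\n"
        "svg { display: block; }\n"
        "</style></head>\n"
        f"<body>{svg_only}</body></html>\n"
    )
```

The generated HTML file, rather than the SVG itself, would then be passed to the headless browser's `--print-to-pdf` invocation. SVGs sized in percentages are rejected because they carry no absolute page size.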

@Foadsf did I miss something? Otherwise, without variable-sized PDF output, I can't see this as a good option.

Foadsf commented 3 weeks ago

@fuhrmanator I am no expert here, but I would say manually hardcoding the page/frame size is not the best idea. According to this comment, Inkscape has the --export-area-drawing option to take care of that, and mmdc also has the --pdfFit option, which fits my use case better.

TBH, I have not tested the headless browser idea much beyond this post, but I think if we want to resolve these issues in a canonical way, we need to go upstream and open issues for rsvg-convert, CairoSVG, and so on. Another option is to use Puppeteer, which AFAIK is also used by mmdc, and the output PDFs seem fine.

P.S.1. I see that somebody has already opened an issue here upstream.

P.S.2. Boy oh boy, the state of SVG rendering and conversion is really messy. The more I read about it the more I get convinced that SVG was probably a bad choice by the Mermaid team, and other projects such as Inkscape, as the intermediary markup language. Probably they would be better off using something like Postscript .eps. But if SVG is here to stay then maybe one could look into html2image, html2pdf, and/or html2canvas.

P.S.3. Apparently, Mozilla has solved the foreignObject support issue in their Gecko engine, but it seems they have never published it as a standalone, modular and reusable library. More on that here. (Yeah, it seems Mozilla has pivoted towards Direct2D since then.)

Foadsf commented 3 weeks ago

@jgm please check this simple Node.js script. Feel free to include it anywhere you need. More on this here.

jgm commented 3 weeks ago

@Foadsf this worked pretty well in my tests, but it produced a two-page PDF for a one-page SVG.

About the dependencies: I know nothing about 'puppeteer' - is this dependency something that would need to be installed?

Foadsf commented 3 weeks ago

@jgm I know nothing about Puppeteer either, but this is the same library that Mermaid CLI uses to generate the PDFs, and if I'm not mistaken this is basically equivalent to the proposal shared above about using the headless browser. I'm not sure if there is anything similar to Puppeteer for Haskell, but if someone has Mermaid CLI, they already have Node.js and Puppeteer. Feel free to take the code I shared in the Gist with a WTFPL license and alter it in any way you like. I think eventually something like this could replace rsvg-convert.

In parallel, I believe it should be possible to flatten and simplify the SVGs that include foreignObject in a way that librsvg, Cairo, and Inkscape can handle them. I have had limited success with Cairo so far, but not with the other two. I might share something here if I have any success. If this works, then Mermaid visualizations should compile with almost all the other PDF engines Pandoc knows.
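To make the flattening idea concrete, here is one possible and deliberately lossy approach in Python, not the method tried above: each `foreignObject` is replaced with a plain SVG `text` element carrying its text content. It uses only the standard `xml.etree.ElementTree`; `flatten_foreign_objects` is a hypothetical name, and styling, wrapping, and alignment of the original HTML labels are all discarded.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def flatten_foreign_objects(svg_text: str) -> str:
    """Replace each <foreignObject> with a <text> holding its text content."""
    root = ET.fromstring(svg_text)
    fo_tag = f"{{{SVG_NS}}}foreignObject"
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != fo_tag:
                continue
            text = ET.Element(
                f"{{{SVG_NS}}}text",
                {
                    "x": child.get("x", "0"),
                    "y": child.get("y", "0"),
                    "dominant-baseline": "hanging",
                },
            )
            # Collapse whitespace from the nested HTML into single spaces.
            text.text = " ".join("".join(child.itertext()).split())
            text.tail = child.tail
            parent[index] = text  # one-for-one swap keeps indices valid
    return ET.tostring(root, encoding="unicode")
```

Whether the result is acceptable depends on the diagram: multi-line or styled labels will need more than this.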

tarleb commented 3 weeks ago

A past Haskell/Google Summer of Code project touched on this, cdp-hs. It should be possible to use that instead of puppeteer.