melisgl / mgl-pax

Documentation system, browser, generator.
https://melisgl.github.io/mgl-pax-world/
MIT License

Support for PDF output #26

Open paulapatience opened 1 year ago

paulapatience commented 1 year ago

I have managed to generate quite acceptable PDF output from MGL-PAX documentation by feeding its Markdown output to Pandoc with a Lua filter that I wrote. In particular, the filter rewrites the links generated by MGL-PAX into appropriate LaTeX labels and phantomsections, so that clicking on them within the PDF jumps to the right place.

If PDF support is a desired feature for MGL-PAX, I can think of three ways to proceed:

  1. Add the :PDF format to PAX:DOCUMENT and accept that the only practical way to get PDF output is by relying on Pandoc.
  2. Add the :PANDOC-PDF or :PANDOC formats to PAX:DOCUMENT, where the specific output format in the latter case is specified elsewhere.
  3. Add a contrib/ directory where we add the Lua filter that I wrote so that users wishing to generate PDF output with Pandoc can benefit from it.

What are your thoughts?

melisgl commented 1 year ago

Nice! I hadn't considered adding PDF support at all so far. I'm not sure I understand options 1 and 2, so bear with me ...

One of the two options is probably that pax:document with some :pdf-like :format would generate a pdf from markdown by invoking Pandoc behind the scenes, but is this option 1 or 2? And what is the other option then? What is "specific output format"?

Also, the link format can be changed within PAX based on the format if needed.

paulapatience commented 1 year ago

> One of the two options is probably that pax:document with some :pdf-like :format would generate a pdf from markdown by invoking Pandoc behind the scenes, but is this option 1 or 2? And what is the other option then? What is "specific output format"?

Both options 1 and 2 would use Pandoc behind the scenes, but option 2 would be more forward-compatible if PDF output were ever implemented some other way, with a different PDF generator. The only difference between options 1 and 2 is the name of the :FORMAT.

As for specific output formats, Pandoc can generate much more than just PDF, in particular epub, so that would be something to keep in mind if we were to add a :PANDOC format (rather than just :PANDOC-PDF).

> Also, the link format can be changed within PAX based on the format if needed.

Indeed, I have been using v1 links, which are directly usable in LaTeX's \label{} (minus the leading #). The issue is that MGL-PAX puts section anchors before the section title, whereas in LaTeX they need to go right after. It also remains necessary to convert the anchors into appropriate \label{} and \phantomsection{} commands; otherwise Pandoc tries to link to the PDF file itself (if I remember correctly).
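
To make the conversion concrete, here is a minimal sketch in Common Lisp of the mapping described above. It is not the Lua filter itself. The function names are hypothetical, the sample identifier is made up, and using hyperref's \hyperref[...]{...} for the link side is an assumption about one way to make the links clickable.

```lisp
;; Hypothetical helpers; none of these names exist in MGL-PAX.

(defun link-target-to-label (link-target)
  "Strip the leading # from a link target such as \"#x-28FOO-29\"."
  (string-left-trim "#" link-target))

(defun anchor-to-latex (anchor-id)
  "Return LaTeX that makes ANCHOR-ID a hyperlink target at this point."
  (format nil "\\phantomsection\\label{~A}" anchor-id))

(defun link-to-latex (link-target text)
  "Return a LaTeX link to LINK-TARGET. TEXT is assumed to be already
escaped for LaTeX."
  (format nil "\\hyperref[~A]{~A}" (link-target-to-label link-target) text))
```

For example, (anchor-to-latex "x-28FOO-29") produces \phantomsection\label{x-28FOO-29}, which would then be emitted right after the section title.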

melisgl commented 1 year ago

Thanks for the clarification. I think :pandoc-pdf makes the most sense. Does pandoc provide options that need to be exposed through PAX? Do you perhaps have a sample pdf?

As to where anchors go, it's a simple change in print-section-title and maybe in documenting-reference if needed. It must be conditional on (eq *format* :pandoc-pdf), though, because in HTML, putting the anchor after the title leaves the title just above the visible area when its link is clicked. But if you need to run a filter anyway to do the other things, moving the anchor might be better done there.
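
A rough sketch of what that conditional could look like, outside of PAX. Everything here is a stand-in: *format* is redefined locally, print-title-with-anchor is a hypothetical function rather than the real print-section-title, and the anchor and heading markup are assumptions chosen for illustration.

```lisp
;; Stand-in for PAX's format variable; defined here only so that the
;; sketch is self-contained.
(defvar *format* :markdown)

(defun print-title-with-anchor (title anchor-id stream)
  "Print TITLE and its anchor to STREAM. The anchor goes after the
title only when generating for Pandoc's PDF output."
  (flet ((emit-anchor ()
           (format stream "<a id=\"~A\"></a>~%~%" anchor-id))
         (emit-title ()
           (format stream "## ~A~%~%" title)))
    (if (eq *format* :pandoc-pdf)
        (progn (emit-title) (emit-anchor))
        (progn (emit-anchor) (emit-title)))))
```

Binding *format* to :pandoc-pdf around the call flips the order, while every other format keeps the anchor first so that HTML links still land with the title visible.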

paulapatience commented 1 year ago

I think it would be good to expose *PANDOC-OPTIONS* and *PANDOC-METADATA-BLOCK*, where the former would be appended or prepended to the default options, which depend on the format we're using; for PDF they would be -f markdown -t pdf -o -. This also leaves room for eventually adding other Pandoc-based formats.
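
As a sketch of how the defaults and the user's options might combine, assuming the append variant: default-pandoc-options and pandoc-command are hypothetical names, and only the :PANDOC-PDF defaults come from the discussion above.

```lisp
(defvar *pandoc-options* '()
  "Proposed variable: extra command-line arguments passed to Pandoc.")

(defun default-pandoc-options (format)
  "Return the default Pandoc arguments for FORMAT."
  (ecase format
    (:pandoc-pdf '("-f" "markdown" "-t" "pdf" "-o" "-"))))

(defun pandoc-command (format)
  "Return the full argument list: program name, defaults for FORMAT,
then the user's *PANDOC-OPTIONS*."
  (append '("pandoc")
          (default-pandoc-options format)
          *pandoc-options*))
```

With *pandoc-options* bound to '("--toc"), (pandoc-command :pandoc-pdf) returns ("pandoc" "-f" "markdown" "-t" "pdf" "-o" "-" "--toc"), a list that could be handed to UIOP:RUN-PROGRAM as its command. Supporting another Pandoc-based format would then only need a new ECASE clause.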

*PANDOC-OPTIONS* would be a list of arguments passed through to UIOP:RUN-PROGRAM. Pandoc's metadata block is YAML, but I think representing *PANDOC-METADATA-BLOCK* as an alist rather than a string would be better, something like the following:

```lisp
(defvar *pandoc-metadata-block*
  '(("title" "MGL-PAX Manual") ; or ("title" . "MGL-PAX Manual")
    ("author" "Gábor Melis")
    ;; ...
    ))
```

This would make it easier for users to replace certain parts of a block than if it were one string. By default, though, the variable would be NIL.
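
A minimal sketch of turning such an alist into the YAML block that Pandoc reads, assuming the two-element list form (key value) and string values only. The function names are hypothetical, and the quoting handles just double quotes and backslashes, so multi-line values would need more care.

```lisp
(defun yaml-quote (string)
  "Return STRING as a double-quoted YAML scalar."
  (with-output-to-string (out)
    (write-char #\" out)
    (loop for char across string
          do (when (member char '(#\" #\\))
               (write-char #\\ out))   ; escape quotes and backslashes
             (write-char char out))
    (write-char #\" out)))

(defun metadata-block-to-yaml (alist)
  "Render ALIST, a list of (KEY VALUE) entries, as a YAML metadata
block delimited by --- lines. Return the empty string for NIL."
  (if (null alist)
      ""
      (with-output-to-string (out)
        (format out "---~%")
        (loop for (key value) in alist
              do (format out "~A: ~A~%" key (yaml-quote value)))
        (format out "---~%"))))
```

The resulting string would be written ahead of the generated Markdown before it is piped to Pandoc. Because the data stays an alist, a user can replace a single entry, for example with REMOVE and :KEY #'FIRST :TEST #'STRING=, without parsing YAML.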

If conditionalizing the output is as simple as (eq *format* :pandoc-pdf), then it might be simpler to adjust the Markdown from within MGL-PAX. Since you seem to be amenable to this feature, I can cook up a draft PR so you can see what it would look like. I could generate a PDF of the MGL-PAX documentation.

Callers of PAX:DOCUMENT will have to ensure that the :STREAM argument is an octet stream, or maybe the function can check that itself.
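
If the function were to check this itself, the test could look roughly like the sketch below. The names octet-stream-p and check-octet-stream are hypothetical, and bivalent streams on some implementations may report a character element type, so this is only a first approximation.

```lisp
(defun octet-stream-p (stream)
  "Return true if STREAM is an output stream whose element type is a
subtype of (UNSIGNED-BYTE 8)."
  (and (output-stream-p stream)
       (values (subtypep (stream-element-type stream)
                         '(unsigned-byte 8)))))

(defun check-octet-stream (stream)
  "Signal an error unless STREAM can receive octets; else return it."
  (unless (octet-stream-p stream)
    (error "Expected an output stream with element type ~
            (UNSIGNED-BYTE 8), but ~S has element type ~S."
           stream (stream-element-type stream)))
  stream)
```

A file opened with :ELEMENT-TYPE '(UNSIGNED-BYTE 8) passes the check, while a string output stream is rejected early with a readable message instead of failing midway through writing the PDF.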

melisgl commented 1 year ago

This all sounds very reasonable to me. And yes, I think this would be a great feature.