jgm / pandoc

Universal markup converter
https://pandoc.org

Make options accessible within templates #10302

Open ZoomRmc opened 1 month ago

ZoomRmc commented 1 month ago

I ran into a problem: I can't access the --top-level-division setting inside the typst template, which I need in order to implement page breaks when it is set to "chapter". There is a workaround: the template can expose an arbitrary variable that is set with the -V argument, but this potentially breaks the API contract, namely that this option is what determines the respective part of the layout.

The docs for --top-level-division state:

Treat top-level headings as the given division type in LaTeX, ConTeXt, DocBook, and TEI output.

This reads as if the feature is implemented for the listed formats, but it is not clear whether the setting is accessible from the templates of those or any other formats.

  1. The docs should clearly mention what settings can or cannot be accessed from the template system.
  2. It would be beneficial to be able to read as many settings as possible. The template system is intentionally feature-limited, but access to settings still matters for output formats that are interpreted further, such as typst or TeX.

jgm commented 1 month ago

Why should you expect to be able to determine inside the template which command line options were used?

ZoomRmc commented 1 month ago

Because a considerable number of them are supposed to affect the layout of the document. If that's not how it's supposed to work, I'd appreciate any tips.

I understand that the option in question is in the "Options affecting specific writers" section and that typst is not mentioned as a writer affected by it, but the only way I see to add typst support is for this option to be able to influence the output. This can be done either by emitting a hardcoded typst chunk or by affecting the template.

Again, I'm not yet familiar enough with pandoc's internal organisation to know whether this is the correct approach.

Technically, reading an option from the template is not a hard requirement; the alternative is to introduce a variable that gets set by an option.

jgm commented 1 month ago

Maybe it would help me if you explained a bit more why you need to know, inside the template, what the top-level division setting is.

In principle, we could change pandoc so that it populates a template variable, say pandoc-cli-options, with all of the options used on the command line and/or in defaults files.

ZoomRmc commented 1 month ago

In my case, I'd like --top-level-division to at least start behaving as intended with the typst writer. As typst currently has no built-in concept of a document class, I can simulate some of it with this option. In particular, I want to easily force a page break at each top-level heading (as is suitable for a book). For this I need to add the following typst chunk, which I can do in a few different ways. I think that using the most fitting existing option is the best approach, and it would allow me to propose a change to the shipped typst template so that all this "just works".

#show heading.where(level: 1): it => [
  #pagebreak(weak: true)
  #it
]

Currently, this chunk could be used inside this template conditional, where top-level-division is a custom variable reflecting the existing pandoc option:

$if(top-level-division)$
#let top-level-division = "$top-level-division$"
#show heading.where(level: 1): it => [
  #if top-level-division == "chapter" [
    #pagebreak(weak: true)
  ]
  #it
]
$endif$

jgm commented 1 month ago

One low-tech change we could make would be to have the typst writer set a top-level-division variable. And then it might make sense to add something like the above to the default template. Probably you'd want if top-level-division != "section", because then you'd get a page break for "part" too.

ZoomRmc commented 1 month ago

Yeah, that's basically what I want. The issue is that I'm pretty sure someone (perhaps myself) will bump into some similar situation rather soon with some other option.

jgm commented 1 month ago

Here's a diff that will expose an opts variable automatically in every template:

diff --git a/src/Text/Pandoc/App/OutputSettings.hs b/src/Text/Pandoc/App/OutputSettings.hs
index 8fe3c203f..24ea302f8 100644
--- a/src/Text/Pandoc/App/OutputSettings.hs
+++ b/src/Text/Pandoc/App/OutputSettings.hs
@@ -21,7 +21,7 @@ module Text.Pandoc.App.OutputSettings
   ) where
 import qualified Data.Map as M
 import qualified Data.Text as T
-import Text.DocTemplates (toVal, Context(..), Val(..))
+import Text.DocTemplates (ToContext(..), Context(..), Val(..))
 import qualified Control.Exception as E
 import Control.Monad
 import Control.Monad.Except (throwError)
@@ -45,6 +45,7 @@ import Text.Pandoc.Highlighting (lookupHighlightingStyle)
 import Text.Pandoc.Scripting (ScriptingEngine (engineLoadCustom),
                               CustomComponents(..))
 import qualified Text.Pandoc.UTF8 as UTF8
+import Data.Aeson (toJSON)

 readUtf8File :: PandocMonad m => FilePath -> m T.Text
 readUtf8File fp = readFileStrict fp >>= toTextM fp
@@ -197,6 +198,8 @@ optToOutputSettings scriptingEngine opts = do
     >>=
     setVariableM "pandoc-version" pandocVersionText
     >>=
+    setVariableM "opts" (toJSON opts)
+    >>=
     setFilesVariableM "include-before" (optIncludeBeforeBody opts)
     >>=
     setFilesVariableM "include-after" (optIncludeAfterBody opts)
@@ -276,8 +279,8 @@ optToOutputSettings scriptingEngine opts = do
     }

 -- | Set text value in text context unless it is already set.
-setVariableM :: Monad m
-             => T.Text -> T.Text -> Context T.Text -> m (Context T.Text)
+setVariableM :: (Monad m, ToContext T.Text b)
+             => T.Text -> b -> Context T.Text -> m (Context T.Text)
 setVariableM key val (Context ctx) = return $ Context $ M.alter go key ctx
   where go Nothing             = Just $ toVal val
         go (Just x)            = Just x

Not completely sure this is a good idea, but it was easy. Note that in most formats it is not going to be possible to test the value of a variable, because the template language has no equality test (there is no = operator). In typst you can get around it in the way you describe above, but it's a special case.
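
The "set unless it is already set" behaviour of setVariableM in the diff above comes from M.alter. A minimal standalone sketch of that one idea, using a plain Data.Map with String values instead of pandoc's Context and Val types (setIfUnset and the sample keys are hypothetical names chosen for illustration, not part of pandoc):

```haskell
import qualified Data.Map as M

-- Hypothetical helper: insert a value only when the key is absent.
-- This mirrors the `go` function inside setVariableM in the diff,
-- but works on a plain Map rather than a doctemplates Context.
setIfUnset :: Ord k => k -> v -> M.Map k v -> M.Map k v
setIfUnset key val = M.alter go key
  where
    go Nothing  = Just val   -- key missing: set it
    go (Just x) = Just x     -- key present: keep the existing value

main :: IO ()
main = do
  -- Assumed starting context: a value that was already set earlier.
  let ctx = M.fromList [("title", "User title")]
  -- An existing key is left untouched.
  print (M.toList (setIfUnset "title" "Default" ctx))
  -- prints [("title","User title")]
  -- A missing key is added.
  print (M.toList (setIfUnset "opts" "from CLI" ctx))
  -- prints [("opts","from CLI"),("title","User title")]
```

If the real Context behaves the same way, a variable named opts that is already present in the context would presumably not be overwritten by the automatically generated one.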

jgm commented 1 month ago

Here's the output from a template that just iterates over the key/value pairs in opts:

% echo "hi" | pandoc --template my.tpl -t html --top-level-division=chapter
ascii: false
bibliography: 
cite-method: citeproc
columns: 72
css: 
default-image-extension: 
dpi: 96
dump-args: false
email-obfuscation: none
embed-resources: false
eol: native
epub-fonts: 
epub-subdirectory: EPUB
epub-title-page: true
fail-if-warnings: false
figure-caption-position: below
file-scope: false
filters: 
highlight-style: pygments
html-math-method: plain
html-q-tags: false
identifier-prefix: 
ignore-args: false
include-after-body: 
include-before-body: 
include-in-header: 
incremental: false
indented-code-classes: 
ipynb-output: IpynbOutputBest
link-images: false
list-of-figures: false
list-of-tables: false
list-tables: false
listings: false
metadata: true
metadata-files: 
no-check-certificate: false
number-offset: 
number-sections: false
pdf-engine-opts: 
preserve-tabs: false
reference-links: false
reference-location: end-of-document
request-headers: 
resource-path: .
sandbox: false
section-divs: false
self-contained: false
setext-headers: false
shift-heading-level-by: 0
split-level: 1
standalone: true
strip-comments: false
syntax-definitions: 
tab-stop: 4
table-caption-position: above
table-of-contents: false
template: my.tpl
to: html
toc-depth: 3
top-level-division: top-level-chapter
trace: false
track-changes: accept-changes
variables: true
verbosity: WARNING
wrap: wrap-auto

@tarleb I'm curious what you think about the idea of exposing opts in templates.

tarleb commented 1 month ago

It seems slightly weird to me, but not too bad, and I'll admit that it could be quite useful.

I like that the Opt type is part of the public (Haskell) interface, so any change would require a major version bump, making it reasonably stable and predictable.

One concern would be performance, but I assume the effect would be negligible. The other is that this has the potential to cause confusion when the feature is combined with custom writers, because then the actual WriterOptions values might differ from the command line values.

bpj commented 1 month ago

@tarleb wrote:

with custom writers, because then the actual WriterOptions values might differ from the command line values.

How so and why? It seems to me that this is something you need to be aware of when creating custom writers. I've read the documentation for custom writers several times while trying to write one for (Perl) Pod, but maybe I missed it anyway.

tarleb commented 1 month ago

Example:

function Writer (doc, opts)
  -- Force numbered sections, regardless of the command line.
  opts.number_sections = true
  return pandoc.write(doc, 'latex', opts)
end

If someone now accesses opts.number-sections in a template, they only learn whether --number-sections was given on the command line, not the actual value used for rendering.

bpj commented 1 month ago

Oh, I thought opts was read-only and have been using my non-destructive iterable/table merging/copying function. (It does the actual copying of key-value pairs with a protected call, ignoring any errors and thus effectively ignoring non-iterable arguments, which is meant to be a feature!)

Perhaps opts should be read-only but have a method that returns a copy as a regular table?

tarleb commented 1 month ago

The PANDOC_READER_OPTIONS and PANDOC_WRITER_OPTIONS variables are indeed read-only because filters should not be able to modify output options. I think it makes sense to have them mutable in custom Reader/Writer functions.

For clarity: I should have used wopts as a parameter name in the writer function to make it clearer that it's distinct from the opts template variable.