jgm / pandoc

Universal markup converter
https://pandoc.org

Plain TeX writer #1541

Open jgm opened 10 years ago

jgm commented 10 years ago

I want to explore the possibility of adding a plain tex writer---one whose output can be processed by plain tex (or perhaps eplain), without latex or context macros.

What I have in mind is to have pandoc emit macros that are fairly closely customized to pandoc's own structural elements, and include definitions of these macros in the preamble of the default template. Then users could modify these macros to get the appearance they want.

mszep commented 10 years ago

I had thought about exactly this concept a little while ago, but I couldn't come up with a use case for this which isn't already served well by ConTeXt and LaTeX.

Since both of the above systems provide advanced functionality for specifying layout and appearance, I think tweaking a hypothetical TeX template file to get the desired appearance would take the user much longer, if they don't already know raw TeX.

Do you have a specific application in mind where the extra low-level control would be worth it?

jgm commented 9 years ago

One advantage, besides customizability, is that only a very minimal tex install would be needed to produce a PDF.

mszep commented 9 years ago

I've been thinking a bit more about this lately, and getting more interested.

I think my earlier question was missing the point, since any document can be typeset by any of the systems (plain TeX, ConTeXt, or LaTeX).

The reduced dependency is nice when installing, but also when authoring, since it would bypass the clunky LaTeX and ConTeXt systems. Plain TeX might turn out to be a better match for pandoc's document model than the two higher-level macro packages.

mb21 commented 9 years ago

Pandoc now has PDF generation using ConTeXt built in as well. ConTeXt Standalone states "ConTeXt macro files are small (less than 10MB), but the suite comes with various free fonts which considerably increase the size of the distribution to around 200MB." There's also BasicTeX, which comes in at around 100MB (which, however, only contains two fonts and no bidi package, etc.). How does plain TeX handle Unicode and bidirectional text btw?

mikeshulman commented 7 years ago

I have a use case for a pandoc plain tex writer: conversion of LaTeX snippets to Plain TeX snippets for sharing bits of code (in my case, math homework problems) between people who use different dialects of TeX. Is it ever likely to happen?

shreevatsa commented 6 years ago

Curious about this: if I understand correctly, what it would take to write a plain TeX writer is to emulate src/Text/Pandoc/Writers/LaTeX.hs and src/Text/Pandoc/Writers/ConTeXt.hs (currently 1501 lines and 546 lines long respectively).

  1. Is there some guide on how to do this (e.g. a complete description of Pandoc's internal representation, and what are the kinds of data that need to be “translated”)?

  2. Would the writer have to be written in Haskell, or is it just the preferred convention and (like filters) is there a possibility of being able to use some other language (say Python or Lua)?

mb21 commented 6 years ago

@shreevatsa yes, we'd have to add a Writers/PlainTeX.hs in Haskell, since all the writers (like the rest of pandoc) are written in Haskell.

> Is there some guide

Have a look at http://pandoc.org/CONTRIBUTING.html

> Pandoc's internal representation

and https://github.com/jgm/pandoc-types/blob/master/Text/Pandoc/Definition.hs

However, before all this, we'd have to agree on what TeX exactly this plain writer would produce. That's what this issue is for... suggestions welcome :) At least it should be able to handle test/writer.native

shreevatsa commented 6 years ago

Thanks. I think @jgm already outlined a sensible approach in the issue description:

> What I have in mind is to have pandoc emit macros that are fairly closely customized to pandoc's own structural elements, and include definitions of these macros in the preamble of the default template. Then users could modify these macros to get the appearance they want.

So for example the metadata ("date",MetaInlines [Str "July",Space,Str "17,",Space,Str "2006"]) (from writer.native) may turn into \date{July 17, 2006} and then there would be a definition of \date (or how to use the token list / boxes set by \date) in the preamble etc. We'd be re-implementing small bits of LaTeX/ConTeXt/eplain/opmac etc., which work similarly.

jgm commented 6 years ago

+++ Shreevatsa [Jan 18 18 16:38 ]:

> So for example the metadata ("date",MetaInlines [Str "July",Space,Str "17,",Space,Str "2006"]) (from writer.native) may turn into \date{July 17, 2006} and then there would be a definition of \date (or how to use the token list / boxes set by \date) in the preamble etc.

Well, not exactly. It would be more generic than this. You wouldn't want every metadata field to turn into a command of the same name.

Maybe: \metadata[date]{July 17, 2006}
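A minimal sketch of that generic form, written in Python rather than the Haskell a real pandoc writer would use. It assumes the metadata arrives in pandoc's JSON AST encoding (objects with `t` and `c` keys); the function names `inlines_to_text` and `metadata_to_tex` are hypothetical, only `Str` and `Space` inlines are handled, and no TeX escaping is attempted.

```python
def inlines_to_text(inlines):
    """Flatten a list of Str/Space inline nodes into plain text."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] == "Space":
            parts.append(" ")
        else:
            # Assumption: anything richer is out of scope for this sketch.
            raise ValueError("unsupported inline: %s" % node["t"])
    return "".join(parts)


def metadata_to_tex(meta):
    """Emit one generic \\metadata[key]{value} line per MetaInlines field."""
    lines = []
    for key, value in meta.items():
        if value["t"] != "MetaInlines":
            continue  # other metadata kinds are skipped in this sketch
        lines.append("\\metadata[%s]{%s}" % (key, inlines_to_text(value["c"])))
    return lines


# The date field from writer.native, in JSON AST form.
meta = {
    "date": {
        "t": "MetaInlines",
        "c": [
            {"t": "Str", "c": "July"},
            {"t": "Space"},
            {"t": "Str", "c": "17,"},
            {"t": "Space"},
            {"t": "Str", "c": "2006"},
        ],
    }
}
print("\n".join(metadata_to_tex(meta)))
```

Because the field name travels as an argument, a template needs only one macro definition for all metadata fields instead of one per field.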

Witiko commented 6 years ago

As discussed in #4341, the witiko/markdown TeX package is an (unintentional) implementation of this idea (see an article introducing the package in TUGboat vol.39, no.2). The first step is to decide and document the TeX macros that will correspond to the individual elements of the AST; see section 2.2.3 of the witiko/markdown documentation to see the choices made by the package. Most importantly, the macros need to be prefixed (e.g. \pandocMetadata rather than \metadata) if Pandoc wants to co-exist with other TeX packages.

Witiko commented 6 years ago

Below are some of my assorted thoughts on this:

brainchild0 commented 4 years ago

I would not wish to discourage anyone from attempting this work who might be enthusiastic about it, but for those attempting optimally to delegate development resources, I share some thoughts on the subject.

jgm commented 4 years ago

The original intent of this issue was not to reduce dependencies but to enhance customizability. The emitted TeX would match as closely as possible pandoc's own document model, and all formatting would be done by macro definitions.

brainchild0 commented 4 years ago

> The original intent of this issue was not to reduce dependencies but to enhance customizability.

I see. The intention of #5879 and #5880 was to make the LaTeX output customizable. This approach preserves the benefits of LaTeX's classes and packages without loss of customization options.

jgm commented 4 years ago

If we generate tex that matches the pandoc AST, one could always use LaTeX to define the macros and process the result with pdflatex. In a sense it would be generic tex -- you supply the macro definitions, which could be in plain tex or latex.

brainchild0 commented 4 years ago

Would you not lose then the layout features and the macros provided by a document class?

jgm commented 4 years ago

> Would you not lose then the layout features and the macros provided by a document class?

I don't see why. The document class is specified in material that goes in the template; it's not generated by the latex writer currently. You could still use a template of your choice. The template would have to provide macro defs for all the pandoc commands.
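One way to check that requirement could be sketched as follows, in Python. This is an illustration only: it assumes the emitted commands share a `\pandoc` prefix (a naming convention that is still under discussion in this thread), it recognises only `\def`, `\newcommand`, and `\renewcommand` definitions, and the function name `missing_macros` is hypothetical.

```python
import re

# Any command starting with the assumed \pandoc prefix.
USED = re.compile(r"\\(pandoc[A-Za-z]+)")
# Definitions written as \def\pandocX, \newcommand{\pandocX}, or \renewcommand\pandocX.
DEFINED = re.compile(r"\\(?:def|newcommand|renewcommand)\s*\{?\s*\\(pandoc[A-Za-z]+)")


def missing_macros(template, body):
    """Return the commands used in the body that the template never defines."""
    defined = set(DEFINED.findall(template))
    used = set(USED.findall(body))
    return sorted(used - defined)


template = "\n".join([
    r"\def\pandocEmph#1{{\it #1}}",
    r"\newcommand{\pandocStrong}[1]{\textbf{#1}}",
])
body = r"\pandocEmph{a} \pandocStrong{b} \pandocImage{c.png}"
print(missing_macros(template, body))
```

A regex scan like this does not see definitions loaded from packages or made with `\let`, so it is a rough lint rather than a guarantee.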

Witiko commented 3 years ago

@jgm As discussed in my TUG 2021 talk yesterday, there is an effort by @drehak (see drehak/lunamark) underway to produce a writer that would convert Pandoc's AST to the TeX AST input (see the spec) of the witiko/markdown package. We would distribute the writer with witiko/markdown, as discussed in #4341, and then use the writer from the \pandocInput TeX command to typeset any document format that Pandoc can read and keep full control over the formatting. However, there are two caveats to using Lua:

  1. The Lua writer would be located in the TeX directory structure, where it's difficult to find by Pandoc. We can get around this by spawning the Lua writer in the current working directory when needed. This is feasible but convoluted.

  2. For plumbing, it would be useful to have a TeX AST reader as well. However, there is no concept of a Lua reader in Pandoc. If we'd like to use Pandoc's Lua interpreter, then we'd likely have to abuse RawBlock to perform a no-op conversion from the TeX AST to Pandoc's AST and perform the parsing in a Lua filter. This is feasible but convoluted.

This leads me to the conclusion that the best way forward in the long run would be to add a Haskell reader and writer for the TeX AST format of witiko/markdown to Pandoc. Would you merge such a contribution?

jgm commented 3 years ago

Where is the TeX AST format of witiko/markdown documented, exactly? I'd like to take a look.

Witiko commented 3 years ago

There is a specification in the Token Renderers section of the user manual (HTML) and the technical documentation (PDF).

jgm commented 3 years ago

Not crazy about the markdownRendererImage style names. After all, pandoc isn't limited to converting from markdown. These are generic elements that are supported in many formats. I'd be more likely to go for something generic like Image.

Witiko commented 3 years ago

> Not crazy about the markdownRendererImage style names. After all, pandoc isn't limited to converting from markdown. These are generic elements that are supported in many formats.

The \markdownRenderer… prefix determines the provenance (the Markdown TeX package) rather than the language.

> I'd be more likely to go for something generic like Image.

Having shorter macros such as \Image will interfere with commands defined by TeX formats, packages, and users. Therefore, some namespacing will be required to be good neighbors with the preexisting TeX ecosystem.

The namespacing does not need to be the \markdownRenderer… prefix: We can set up arbitrary TeX commands with e.g. the \pandoc… prefix and with ~1:1 correspondence to Pandoc's AST. I can then independently map them M:N to my \markdownRenderer… commands in witiko/markdown.
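The prefixed, roughly 1:1 mapping could look like the following Python sketch. It assumes pandoc's JSON AST encoding for inlines and covers only `Str`, `Space`, `Emph`, and `Strong`; the names `PREFIX`, `escape_tex`, and `write_inlines` are hypothetical, and the escaping handles just five special characters, so it is not a complete TeX escaper.

```python
PREFIX = "pandoc"  # assumed namespace; could equally be "pdc" or another prefix


def escape_tex(text):
    """Backslash-escape a few TeX specials (deliberately incomplete)."""
    return "".join("\\" + ch if ch in "&%$#_" else ch for ch in text)


def write_inline(node):
    """Map one inline AST node to a command named after its constructor."""
    tag = node["t"]
    if tag == "Str":
        return escape_tex(node["c"])
    if tag == "Space":
        return " "
    if tag in ("Emph", "Strong"):
        return "\\%s%s{%s}" % (PREFIX, tag, write_inlines(node["c"]))
    raise NotImplementedError("no command defined for %s in this sketch" % tag)


def write_inlines(nodes):
    """Concatenate the TeX for a list of inline nodes."""
    return "".join(write_inline(node) for node in nodes)


nodes = [
    {"t": "Str", "c": "Plain"},
    {"t": "Space"},
    {"t": "Emph", "c": [{"t": "Str", "c": "TeX"}]},
    {"t": "Space"},
    {"t": "Str", "c": "100%"},
]
print(write_inlines(nodes))
```

Deriving the command name from the constructor name keeps the correspondence mechanical, and changing `PREFIX` is the only step needed to switch namespaces.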

silby commented 3 years ago

> What I have in mind is to have pandoc emit macros that are fairly closely customized to pandoc's own structural elements, and include definitions of these macros in the preamble of the default template. Then users could modify these macros to get the appearance they want.

I have always assumed this would mean emitting a bunch of macros named like \pandocFoo or even \pdcFoo if you want it to be a little shorter. Doesn’t seem like it would be too annoying to deal with the namespace, especially if you assume part of the point of emitting plain TeX macros from Pandoc is that you’re not going to be hand-editing the TeX all that much.

brainchild0 commented 3 years ago

I would be curious to understand how far these proposed efforts could offer a foundation for expanding the LaTeX writer toward greater support for document abstractions (discussed earlier). It would certainly be valuable if any improvements opened up opportunities of that kind.

Witiko commented 2 years ago

@jgm To give you an update, @drehak and I have since written a white paper (in Slovak, here is a machine translation to English) that discusses how the elements of Pandoc's AST can be mapped to the elements of the Markdown package for TeX. We have also produced a proof of concept that uses a Pandoc Lua Writer to convert any document understood by Pandoc to generic TeX, which can then be typeset using the Markdown package for TeX.

We plan to fully implement the Lua writer and the accompanying package for TeX and describe them both in detail in a TUGboat article that would appear in March. We will share the preprint with you when ready. Since the Markdown package supports plain TeX, LaTeX, and ConTeXt, Pandoc could then reduce some of its maintenance costs and receive support for plain TeX by replacing its writers for ConTeXt and LaTeX with a single writer that would produce generic TeX. This would be to our mutual benefit, because we could in turn stop shipping and maintaining our Lua writer for generic TeX.

jgm commented 2 years ago

@Witiko excellent, I look forward to hearing more about this in a couple months!

Witiko commented 2 years ago

@jgm @drehak In Section 2.3 of our TUGboat 43:1 article preprint, we give an example of how our proof of concept can be used to directly typeset and style any document format understood by Pandoc in TeX:

2.3 Integration with Pandoc

Pandoc is a tool for converting between dozens of document formats. In our proof of concept, we integrate Pandoc with the Markdown package, so that we can typeset and style any document format understood by Pandoc directly from TeX.

To give an example, we have prepared a manual page wolf.1 in the roff language:

.TH WOLF "1" "2022-04-01" "wolf 1.0.0" "User Commands"
.SH NAME
wolf \- tool for befriending and scaring grandmas
.SH SYNOPSIS
.B wolf
[\fB-b\fR|\fB--befriend\fR]
[\fB-s\fR|\fB--scare\fR]
<\fIgrandma\fR>

Here is how we would typeset our manual page:

\documentclass{article}
\usepackage{pandoc-to-markdown, emoji}
\markdownSetup{
    renderers = {
        headingOne = {%
            \section*{\emoji{wolf} #1}%
        },
    },
}
\begin{document}
\pandocInput[format=man]{wolf.1}
\end{document}

Output:

🐺 NAME

wolf - tool for befriending and scaring grandmas

🐺 SYNOPSIS

wolf [-b|--befriend] [-s|--scare] <grandma>

Our proof of concept consists of a Lua writer that produces TeX commands corresponding to the abstract syntax tree of Pandoc and a TeX package that maps these commands to the renderers of the Markdown package. A rewrite of our Lua writer in Haskell will be offered as a basis of the upcoming plain TeX writer for Pandoc.