jgm / pandoc

Universal markup converter
https://pandoc.org

Proposal: change return value of TextWriters to Doc #6263

Open tarleb opened 4 years ago

tarleb commented 4 years ago

Currently, the Writer type is defined as

data Writer m = TextWriter (WriterOptions -> Pandoc -> m Text)
              | ByteStringWriter (WriterOptions -> Pandoc -> m BL.ByteString)

Given that Doc Text has become a central type, I propose changing the type to

data Writer m = TextWriter (WriterOptions -> Pandoc -> m (Doc Text))
              | ByteStringWriter (WriterOptions -> Pandoc -> m BL.ByteString)

I believe this would allow template application to be handled in a more central location, improve composability, and slightly reduce the complexity of the WriterOptions type.

jgm commented 4 years ago

This strikes me as a good idea, but as it's a big API change, it would be good to have more explanation of how it would help. After all, if we make this change, anyone who uses pandoc as a library will have to change their code. Can you go into a bit more detail about the advantages you have in mind?

tarleb commented 4 years ago

Maybe I can just offer my thought process: My ultimate goal is to make pandoc's Lua system powerful enough for me to write a static site generator in it – although I'm not clear yet whether all that should best be integrated into pandoc or become a separate pandoclua project/binary.

SSGs usually compile templates once and apply them repeatedly, so the generator should have access to the template functions from Text.DocTemplates. Those functions operate on the Doc Text type, but handling of that type is hidden inside the writers. The proposed change would make writers, layout, and template handling sufficiently composable to support my use case.[^1]
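
To make the "compile once, apply repeatedly" pattern concrete, here is a minimal sketch that uses only doctemplates and doclayout. The Page type, the template string, and the file name "page.html" are made up for illustration; in the scenario described above the Doc Text body would come from a pandoc writer rather than from a string literal.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Map as M
import Data.Text (Text)
import qualified Data.Text.IO as TIO
import Text.DocLayout (Doc, literal, render)
import Text.DocTemplates
  (Context (..), Template, Val (..), compileTemplate, renderTemplate)

-- Hypothetical page type: a title plus an already laid-out body.
data Page = Page { pageTitle :: Text, pageBody :: Doc Text }

-- Build the template context for one page.
pageContext :: Page -> Context Text
pageContext p = Context $ M.fromList
  [ ("title", SimpleVal (literal (pageTitle p)))
  , ("body",  SimpleVal (pageBody p))
  ]

-- Apply an already compiled template; no template parsing happens here.
renderPage :: Template Text -> Page -> Text
renderPage tpl = render Nothing . renderTemplate tpl . pageContext

main :: IO ()
main = do
  -- The template is compiled exactly once ...
  compiled <- compileTemplate "page.html" "<h1>$title$</h1>\n$body$\n"
  case compiled of
    Left err  -> putStrLn ("template error: " <> err)
    -- ... and then reused for every page.
    Right tpl -> mapM_ (TIO.putStrLn . renderPage tpl)
      [ Page "First"  "Hello from page one."
      , Page "Second" "Hello from page two."
      ]
```

The point of the sketch is that renderPage needs a Doc Text for the body, which is exactly what the writers currently keep to themselves.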

An interesting side-effect could be that it should become easier for users to implement their own Markdown preferences via a custom writer: I've met multiple people who felt quite strongly about the "right" bullet character and list indentation.
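
As a rough illustration of what such a preference could look like once list items are available as Doc Text values, the sketch below lays out a bullet list with a configurable marker and indentation using doclayout's hang. ListStyle and bulletList are hypothetical names, not part of pandoc or doclayout, and this is not how the Markdown writer is actually implemented.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.DocLayout (Doc, hang, literal, render, vcat, ($$))

-- Hypothetical user preference: which marker to use and how far to indent.
data ListStyle = ListStyle { bulletMarker :: Text, itemIndent :: Int }

-- Lay out each item behind the marker; continuation lines of an item are
-- indented to the same column as its first line.
bulletList :: ListStyle -> [Doc Text] -> Doc Text
bulletList style = vcat . map item
  where
    -- Never indent less than the marker plus one space.
    width  = max (itemIndent style) (T.length (bulletMarker style) + 1)
    marker = literal (T.justifyLeft width ' ' (bulletMarker style))
    item   = hang width marker

main :: IO ()
main = do
  let items :: [Doc Text]
      items = ["first item", "second item" $$ "continued on its own line"]
  TIO.putStrLn (render Nothing (bulletList (ListStyle "-" 2) items))
  TIO.putStrLn (render Nothing (bulletList (ListStyle "*" 4) items))
```

With Text output only, the same customization would require re-parsing or string surgery on the rendered Markdown.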

PR #6252 and the hslua-module-doclayout library are just by-products of all this.

[^1]: I know that I could easily convert the writer output back into a Doc literal, but that feels wrong.
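
For reference, the workaround from the footnote looks roughly like the sketch below. It also shows one cost of the round trip that is easy to check with doclayout alone: once a Doc has been rendered to Text, the line breaks are baked in, so wrapping the Text in literal again does not allow reflowing at a different width. The names original, writerOutput, and roundTripped are illustrative, and this is only one possible reading of why the workaround is unattractive.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import qualified Data.Text.IO as TIO
import Text.DocLayout (Doc, hsep, literal, render)

-- Stand-in for a writer's internal document: words joined by breakable spaces.
original :: Doc Text
original = hsep (map literal ["aaa", "bbb", "ccc", "ddd"])

-- What a Text-returning writer hands back after rendering at a narrow width.
writerOutput :: Text
writerOutput = render (Just 10) original

-- The workaround: wrap the rendered Text in a Doc again.
roundTripped :: Doc Text
roundTripped = literal writerOutput

main :: IO ()
main = do
  -- The original Doc can still be reflowed to a wider column ...
  TIO.putStrLn (render (Just 80) original)
  -- ... but the round-tripped one keeps the breaks chosen at width 10.
  TIO.putStrLn (render (Just 80) roundTripped)
```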

tarleb commented 4 years ago

After giving it some more thought, I see now that changing all writers would be overkill. A better alternative could be to export a renderFormat :: WriterOptions -> Pandoc -> m (Doc Text, Context Text) together with writeFormat from each writer module, where writeFormat = toTextWriter renderFormat. No major version change would be required.

An API-changing but less invasive approach would be to add a DocWriter constructor to Writer:

data Writer m
  = TextWriter (WriterOptions -> Pandoc -> m Text)
  | DocWriter (WriterOptions -> Pandoc -> m (Doc Text, Context Text))
  | ByteStringWriter (WriterOptions -> Pandoc -> m BL.ByteString)

Combined with a toTextWriter function, this should keep the required changes in dependent programs to a minimum.

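-- | Turn a Doc-returning render function into a classic Text writer:
-- apply the template (if any), then lay out the result.
toTextWriter :: PandocMonad m
             => (WriterOptions -> Pandoc -> m (Doc Text, Context Text))
             ->  WriterOptions -> Pandoc -> m Text
toTextWriter f opts doc =
  let -- Wrap at writerColumns only when automatic wrapping is requested.
      colwidth = if writerWrapText opts == WrapAuto
                 then Just (writerColumns opts)
                 else Nothing

      -- Without a template the body is the whole output; otherwise the body
      -- is added to the context as "body" and the template is rendered.
      content (d, ctx) = case writerTemplate opts of
                           Nothing  -> d
                           Just tpl -> let ctx' = defField "body" d
                                                $ addVariablesToContext opts ctx
                                       in renderTemplate tpl ctx'
  in render colwidth . content <$> f opts doc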
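
To show how the renderFormat / writeFormat pair could fit together, here is a self-contained toy version that avoids the pandoc dependency. Opts and Document are hypothetical stand-ins for WriterOptions and Pandoc, toTextWriter' is a simplified analogue of the function above (no WrapAuto check, no addVariablesToContext), and renderToy / writeToy play the roles of renderFormat / writeFormat. It is a sketch under those assumptions, not the proposed implementation.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Map as M
import Data.Text (Text)
import qualified Data.Text.IO as TIO
import Text.DocLayout (Doc, literal, render, vcat)
import Text.DocTemplates
  (Context (..), Template, Val (..), compileTemplate, renderTemplate)

-- Hypothetical stand-ins for pandoc's WriterOptions and Pandoc types.
data Opts = Opts
  { optColumns  :: Maybe Int
  , optTemplate :: Maybe (Template Text)
  }
data Document = Document { docTitle :: Text, docParagraphs :: [Text] }

-- Simplified analogue of toTextWriter: apply the template, then lay out.
toTextWriter' :: Functor m
              => (Opts -> Document -> m (Doc Text, Context Text))
              -> Opts -> Document -> m Text
toTextWriter' f opts doc = render (optColumns opts) . content <$> f opts doc
  where
    content (body, Context vars) = case optTemplate opts of
      Nothing  -> body
      -- M.union is left-biased, so an existing "body" field is kept.
      Just tpl -> renderTemplate tpl
                    (Context (M.union vars (M.singleton "body" (SimpleVal body))))

-- Plays the role of renderFormat: returns layout and metadata, no Text yet.
renderToy :: Applicative m => Opts -> Document -> m (Doc Text, Context Text)
renderToy _ d = pure
  ( vcat (map literal (docParagraphs d))
  , Context (M.singleton "title" (SimpleVal (literal (docTitle d))))
  )

-- Plays the role of writeFormat: the old Text-returning interface.
writeToy :: Applicative m => Opts -> Document -> m Text
writeToy = toTextWriter' renderToy

main :: IO ()
main = do
  let doc = Document "Demo" ["First paragraph.", "Second paragraph."]
  compiled <- compileTemplate "toy.tpl" "# $title$\n\n$body$\n"
  case compiled of
    Left err  -> putStrLn ("template error: " <> err)
    Right tpl -> do
      TIO.putStrLn =<< writeToy (Opts Nothing Nothing) doc
      TIO.putStrLn =<< writeToy (Opts (Just 72) (Just tpl)) doc
```

Existing callers keep using writeToy unchanged, while code that wants to compose layouts or apply its own templates can call renderToy directly.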