jgm / pandoc

Universal markup converter
https://pandoc.org

[Feature request] submit pandoc commandline options by metadata and/or separate yaml file #2069

Closed. bwl21 closed this issue 9 years ago.

bwl21 commented 9 years ago

I am opening this ticket following this post in the Google group: https://groups.google.com/forum/#!searchin/pandoc-discuss/definition$20list/pandoc-discuss/ZKytjti_h2Q/pcagNlG9SdsJ

Summing up:

  1. pandoc should allow choosing at least the reader extensions in the document metadata
  2. pandoc should allow providing options in a separate metadata file, passed e.g. via
    --metadatafile <file>
  3. pandoc should allow customizing options in a default metadata file as well

In particular, the change to definition lists is a drastic one. The only way to deal with legacy documents is to enable compact_definition_lists every time. Within a given environment there is not even a compatibility mode that supports both syntaxes on input, which would let pandoc act as an implicit converter. One must choose the definition list format explicitly. The requested feature would make legacy documents easier to maintain. For example, I have documents that are revised every other year, and I got the surprise ...

The metadata could look like this, for example:


---
mdreader:
 -   +fenced_code_blocks
 -   +compact_definition_lists
mdwriter:
 -   -backtick_code_blocks
 -   +fenced_code_blocks
 -   +compact_definition_lists
pandocoptions:
 -   --atx-headers
...
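To make the proposal concrete, here is a minimal Python sketch of what a tool honoring these keys might do: fold the mdreader and mdwriter switch lists into pandoc's format+extension syntax for -f and -t, and append pandocoptions unchanged. The key names are the ones proposed above, not something pandoc reads; the metadata is assumed to be already parsed into a dict, and the function names are hypothetical.

```python
def format_spec(base, switches):
    """Append +ext/-ext switches to a base format, e.g. 'markdown+fenced_code_blocks'."""
    for switch in switches:
        if len(switch) < 2 or switch[0] not in "+-":
            raise ValueError(f"extension switch must start with + or -: {switch!r}")
    return base + "".join(switches)


def metadata_to_argv(meta, input_file, output_file):
    """Build a pandoc argument list from the keys proposed in this issue.

    mdreader, mdwriter and pandocoptions are the proposal's keys;
    pandoc itself does not interpret them.
    """
    argv = ["pandoc", input_file, "-o", output_file]
    argv += ["-f", format_spec("markdown", meta.get("mdreader", []))]
    argv += ["-t", format_spec("markdown", meta.get("mdwriter", []))]
    argv += list(meta.get("pandocoptions", []))  # passed through unchanged
    return argv


meta = {
    "mdreader": ["+fenced_code_blocks", "+compact_definition_lists"],
    "mdwriter": ["-backtick_code_blocks", "+fenced_code_blocks",
                 "+compact_definition_lists"],
    "pandocoptions": ["--atx-headers"],
}
print(" ".join(metadata_to_argv(meta, "in.md", "out.md")))
```

Only the key names come from the proposal; the markdown+ext-ext way of toggling extensions on -f and -t is pandoc's existing command-line syntax.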
jgm commented 9 years ago

+++ Bernhard Weichel [Apr 11 15 08:54 ]:

  1. pandoc should allow choosing at least the reader extensions in the document metadata

That's a bit difficult, since we need to know the reader extensions in order to know whether (and also how) to parse the YAML metadata in the first place.

bwl21 commented 9 years ago

Isn't yaml_metadata_block enabled by default, so that pandoc already parses the YAML metadata block? Then it should be able to select the extensions accordingly.

So we could have a hierarchy of settings, in decreasing order of significance, and the settings could be merged. In that scheme it would not be possible to turn off yaml_metadata_block in the document's own metadata block.
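A minimal sketch of such a merged hierarchy, assuming three hypothetical layers (a default file, a metadata file, and the document's own block) applied from least to most significant. Each layer is a list of +ext/-ext switches and later layers override earlier ones, except that the document block is not allowed to change yaml_metadata_block. The layer names and the choice to silently ignore the protected switch are assumptions for illustration.

```python
# Extensions the document's own metadata block may not change (hypothetical rule).
PROTECTED_IN_DOCUMENT = {"yaml_metadata_block"}


def apply_switches(state, switches, protected=frozenset()):
    """Apply '+ext'/'-ext' switches to a dict of extension name -> enabled."""
    for switch in switches:
        sign, name = switch[:1], switch[1:]
        if sign not in ("+", "-") or not name:
            raise ValueError(f"bad switch: {switch!r}")
        if name in protected:
            continue  # silently ignored at this layer
        state[name] = sign == "+"
    return state


def merge_layers(defaults, metadata_file, document_block):
    """Merge layers from least to most significant; later layers win."""
    state = {"yaml_metadata_block": True}
    apply_switches(state, defaults)
    apply_switches(state, metadata_file)
    # The document block can only be read if YAML metadata is still enabled.
    if state["yaml_metadata_block"]:
        apply_switches(state, document_block, protected=PROTECTED_IN_DOCUMENT)
    return state


merged = merge_layers(
    defaults=["+fenced_code_blocks"],
    metadata_file=["+compact_definition_lists"],
    document_block=["-fenced_code_blocks", "-yaml_metadata_block"],
)
print(merged)
```

Here the document block can still turn fenced_code_blocks off, but its attempt to disable yaml_metadata_block has no effect, which is the restriction described above.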

jgm commented 9 years ago

+++ Bernhard Weichel [Apr 11 15 23:02 ]:

Isn't yaml_metadata_block enabled by default, so that pandoc already parses the YAML metadata block? Then it should be able to select the extensions accordingly.

I suppose they could be changed mid-parse. But then we'd face something like the Liar Paradox if you specified in the YAML metadata that yaml_metadata_block was disabled! (Not a serious objection -- the way it could work is that, going forward in the document, YAML metadata would no longer be recognized.)
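The forward-only rule suggested here can be illustrated with a toy model in which the document has already been split into YAML and text chunks. This is a sketch under that assumption, not how pandoc's parser is structured; the chunk representation and the "extensions" key are hypothetical. The block that disables the extension is itself still read; only later blocks are affected, which is how the paradox dissolves.

```python
def process(chunks, extensions):
    """Walk ('yaml', dict) and ('text', str) chunks in document order.

    A YAML chunk is honoured only while yaml_metadata_block is enabled, and
    its 'extensions' list changes the active set from that point forward.
    Once the extension is off, later YAML chunks are left unparsed.
    """
    enabled = set(extensions)
    output = []
    for kind, payload in chunks:
        if kind == "yaml" and "yaml_metadata_block" in enabled:
            for switch in payload.get("extensions", []):
                if switch.startswith("+"):
                    enabled.add(switch[1:])
                elif switch.startswith("-"):
                    enabled.discard(switch[1:])
            output.append(("metadata", payload))
        elif kind == "yaml":
            output.append(("unparsed", payload))  # no longer recognized as metadata
        else:
            output.append((kind, payload))
    return output, enabled


doc = [
    ("yaml", {"extensions": ["-yaml_metadata_block"]}),  # the "liar" block
    ("text", "Body paragraph."),
    ("yaml", {"title": "too late"}),
]
out, final = process(doc, {"yaml_metadata_block"})
print(out)
print(final)
```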

bwl21 commented 9 years ago

Indeed, this could be a problem. Maybe we could resolve the paradox with an extra rule, as you propose. We have options like:

As I think about it, the approach raises even more questions:

lierdakil commented 9 years ago

Frankly, I don't think that mixing the document parsing strategy and metadata is a good idea. That said, it might be a nice feature to be able to define the exact syntax used in the document itself. I suppose we could devise a simple syntax extension that specifies the exact flavor used for a given Markdown document, something akin to a vim modeline.

E.g., have the first or last non-empty line of a document consist of a comment like this:

<!-- pandoc-markdown: +compact_definition_lists -->
or
<!-- pandoc-markdown: markdown_gfm -->

This would be fully compatible and would address the "Liar Paradox" concern. The exact syntax is obviously debatable; that's just something I thought of off the top of my head.
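A minimal sketch of how a tool might look for such a line, checking only the first and last non-empty lines as proposed; a comment placed in the middle of the document is deliberately ignored. The "pandoc-markdown:" prefix is the hypothetical syntax from this comment, which pandoc does not recognize, and the helper names are illustrative.

```python
import re

# Hypothetical modeline syntax from the proposal: <!-- pandoc-markdown: SPEC -->
MODELINE = re.compile(r"^\s*<!--\s*pandoc-markdown:\s*(\S+)\s*-->\s*$")


def find_flavor(text):
    """Return SPEC from a modeline on the first or last non-empty line, else None."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    for line in (lines[0], lines[-1]):
        match = MODELINE.match(line)
        if match:
            return match.group(1)
    return None


def reader_format(spec, base="markdown"):
    """A spec starting with + or - extends the base format; a bare name replaces it."""
    return base + spec if spec[0] in "+-" else spec


doc = "# Title\n\nSome text.\n\n<!-- pandoc-markdown: +compact_definition_lists -->\n"
spec = find_flavor(doc)
print(reader_format(spec))
```

Because the whole line is an HTML comment, a parser that knows nothing about the convention simply drops it, which is the compatibility argument made in this thread.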

I don't think specifying writer options/flavor in a document is very useful unless one often performs Markdown-to-Markdown transformations. That is not a very common use case, I think. One case I can think of where md2md transformation would be immediately helpful is combining contributions from different authors, and even then the output format is not per-document but should rather be common to all documents.

Being able to specify a default Markdown flavor for the reader and writer (possibly separately) in a settings file could be a useful feature, though. It would be good both to be able to specify the settings file location on the command line and to have a default location for it (much like templates are handled now). The exact format needs some discussion, since I think it would be nice to be able to set default reader and writer options in it, not just Markdown flavors.
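A sketch of the lookup order suggested here: an explicit command-line path wins, otherwise a default location is tried. The default path below is hypothetical, modeled on pandoc's traditional user data directory ~/.pandoc where user templates live; pandoc does not read such a settings file, and the function name is illustrative.

```python
from pathlib import Path

# Hypothetical default location, modeled on pandoc's user data directory (~/.pandoc).
DEFAULT_SETTINGS = Path.home() / ".pandoc" / "settings.yaml"


def resolve_settings(cli_path=None, default=DEFAULT_SETTINGS):
    """Pick the settings file: an explicit command-line path wins over the default.

    An explicitly requested file must exist; a missing default is not an error.
    Returns a Path, or None when no settings file applies.
    """
    if cli_path is not None:
        path = Path(cli_path)
        if not path.is_file():
            raise FileNotFoundError(f"settings file not found: {path}")
        return path
    default = Path(default)
    return default if default.is_file() else None
```

Treating a missing explicit file as an error while tolerating a missing default mirrors the usual convention: the user asked for the former, but never asked for the latter.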

Thoughts?

bwl21 commented 9 years ago

Well, we can either mix the parsing strategy into the metadata or introduce an entirely different syntax. I am following the approach of having one syntax style for all of this, which ultimately means treating the parsing strategy as metadata.

For me, Markdown-to-Markdown conversion is a regular use case:

Therefore I do this all the time!

lierdakil commented 9 years ago

@bwl21, in any case, I feel that specifying the document flavor in a flavor-dependent block is just plain bad design. For example, gfm does not support YAML metadata blocks, so you can't declare that a document is gfm and keep it compatible with gfm. The same goes for other flavors. It does not make sense to me.

I can't think of a flavor that does not understand HTML comments, though, so that's a pro for my idea IMO. The con is that it is yet another syntax extension, when we already have more than enough. But at least this one is not disruptive (i.e. it will be silently ignored by parsers that don't support it).

lierdakil commented 9 years ago

Oh, and I assume that in your use case of md2md transformations, the output format is not per-document but at least per-project, so an external config should be more convenient anyway, unless I utterly misunderstand something.

jgm commented 9 years ago

@lierdakil raises some good points here. Maybe it would be worth implementing the modeline-like syntax, but I'm not sure. It adds further complexity. If the line is at the beginning of the document, then it's incompatible with pandoc title blocks. So, it would probably have to be at the end. And then it might interact badly with things like references (at least with the current setup, where pandoc-citeproc looks for an empty references header at the end of the document -- this will probably be changed soon).

I have a simple solution to all these problems: Makefiles! Whenever I'm doing anything moderately complex, I just create a simple Makefile, like:

mydoc.pdf: mydoc.txt
<TAB> pandoc $< -o $@ --toc --smart -s -Vversion="1.1" -f markdown-pipe_tables

Then, typing make regenerates the output. This is, essentially, runnable documentation.

lierdakil commented 9 years ago

Makefiles work nicely as a substitute for a local config file, that much is true. However, the ability to specify the flavor in the document itself adds an extra feature: portability. For the sake of argument, I could send Markdown text to a colleague and not worry about him having trouble converting it. Right now, I'd also have to include part of my Makefile to be sure that everything goes smoothly.

Another point is global config. For example, I try to avoid setext headings and simple tables at all times; it would be handy to be able to disable those globally for the writer. At the moment I use a shell alias, but that's cognitive overhead I could certainly live without.

So this proposal is very reasonable at its core, though the details need some working out.

jgm commented 9 years ago

+++ Nikolay Yakimov [Apr 13 15 09:21 ]:

Makefiles work nicely as a substitute for a local config file, that much is true. However, the ability to specify the flavor in the document itself adds an extra feature: portability. For the sake of argument, I could send Markdown text to a colleague and not worry about him having trouble converting it. Right now, I'd also have to include part of my Makefile to be sure that everything goes smoothly.

Yes, I see the point. Still, there are many things that can go wrong: e.g. your colleague might use the wrong writer options. Attaching a Makefile is more failsafe.

Another point is global config. For example, I try to avoid setext headings and simple tables at all times; it would be handy to be able to disable those globally for the writer. At the moment I use a shell alias, but that's cognitive overhead I could certainly live without.

This part could be cured with a global Makefile that you just include in all the others.

So this proposal is very reasonable at its core, though the details need some working out.

I agree, it's still worth thinking about.

lierdakil commented 9 years ago

2015-04-13 19:36 GMT+03:00 John MacFarlane notifications@github.com:

+++ Nikolay Yakimov [Apr 13 15 09:21 ]:

Another point is global config. For example, I try to avoid setext headings and simple tables at all times; it would be handy to be able to disable those globally for the writer. At the moment I use a shell alias, but that's cognitive overhead I could certainly live without.

This part could be cured with a global Makefile that you just include in all the others.

Not for one-shot conversions though, and bigger projects usually convert from Markdown, at least in my case. You get the point.

bwl21 commented 9 years ago

I plan to solve the issue in my environment (https://github.com/bwl21/wortsammler) so that Wortsammler first reads the metadata of the input and adjusts the pandoc command line accordingly. This corresponds to the Makefile proposal. In addition, I plan to implement the config hierarchy mentioned before. Because of the backwards-incompatible change to definition lists, it is even necessary to choose the proper pandoc version :-), which I also plan to handle in Wortsammler.

My proposal was really meant for the cases where plain pandoc is used: "one-shot conversions", and out-of-the-box editor integrations (e.g. Sublime) that act on a single file.

We all agree that the most important part is the configuration of the reader. In my opinion, metadata is information about data, and the Markdown reader configuration is meta information about the current Markdown file. So I still think that adding it to the metadata block is a valid approach. Since the metadata block can contain an arbitrary structure, there could be one more reserved entry named "pandoc". I tried it: pandoc preserves such entries, so it could interpret them as well.

This is how I plan to represent it in Wortsammler's config file:

:pandoc:
  :system_command: ! 'pandoc_1.13.1 '
  :markdown_input_switches:
  - -backtick_code_blocks
  - +fenced_code_blocks
  - +compact_definition_lists
  :markdown_output_switches:
  - -backtick_code_blocks
  - +fenced_code_blocks
  - -compact_definition_lists
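A minimal Python sketch of what such a config could drive: pick the pinned pandoc executable and turn the two switch lists into -f and -t format strings. The dict mirrors the config above with simplified key names; it is an illustration, not Wortsammler's actual code, and build_command is a hypothetical name.

```python
import shlex

config = {
    "system_command": "pandoc_1.13.1 ",  # note the trailing space in the stored value
    "markdown_input_switches": [
        "-backtick_code_blocks", "+fenced_code_blocks", "+compact_definition_lists"],
    "markdown_output_switches": [
        "-backtick_code_blocks", "+fenced_code_blocks", "-compact_definition_lists"],
}


def build_command(cfg, source, target):
    """Return an argv list for a Markdown-to-Markdown run with a pinned pandoc binary."""
    reader = "markdown" + "".join(cfg["markdown_input_switches"])
    writer = "markdown" + "".join(cfg["markdown_output_switches"])
    executable = cfg["system_command"].strip()  # drop stray whitespace
    return [executable, "-f", reader, "-t", writer, "-o", target, source]


argv = build_command(config, "chapter.md", "chapter.out.md")
print(shlex.join(argv))  # shell-quoted form, for logging only
```

The resulting list can be handed directly to subprocess.run(argv, check=True); passing a list rather than one string avoids shell-quoting problems.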

If for whatever reason another syntax is to be used, then I feel that an XML processing instruction would be the appropriate solution, not XML comments.

<?pandoc markdown-reader="-backtick_code_blocks+fenced_code_blocks+compact_definition_lists" ?>

But YAML metadata seems much more appropriate to me.

jgm commented 9 years ago

Oh, by the way, pandoc ignores fields in YAML that end with an underscore. This is designed to allow you to include raw data that you can process with external tools. So you could do:


pandocopts_:
  system_command: ! 'pandoc_1.13.1'
  markdown_input_switches:

and write a small wrapper script that reads the YAML at the head of the input file, extracts the system options, and runs pandoc using these options on the file.
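A minimal sketch of such a wrapper, under a simplifying assumption: the options live in a field named pandocopts_ whose value is written on one line as a flow sequence of double-quoted strings, e.g. pandocopts_: ["--toc", "-s"]. That form is valid YAML and also valid JSON, so the standard json module can parse it without a YAML dependency. The field name, that restriction, and the helper names are assumptions for illustration; a real wrapper would use a proper YAML parser.

```python
import json
import subprocess


def front_matter(text):
    """Return the lines of a leading YAML block opened by --- and closed by --- or ..."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() in ("---", "..."):
            return lines[1:i]
    return []  # unterminated block: treat as no metadata


def pandoc_opts(text, field="pandocopts_"):
    """Extract a one-line flow list such as: pandocopts_: ["--toc", "-s"]"""
    prefix = field + ":"
    for line in front_matter(text):
        if line.startswith(prefix):
            value = json.loads(line[len(prefix):])
            if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
                raise ValueError(f"{field} must be a list of strings")
            return value
    return []


def main(path):
    """Run pandoc on PATH with the options found in its front matter."""
    with open(path, encoding="utf-8") as f:
        opts = pandoc_opts(f.read())
    # Per the comment above, pandoc ignores fields ending in an underscore,
    # so the file itself stays convertible as-is.
    return subprocess.run(["pandoc", *opts, path], check=False).returncode
```

Calling main("mydoc.md") plays the same role as the Makefile suggested earlier in the thread, except that the options travel inside the document.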

lierdakil commented 9 years ago

@bwl21, all your proposals are only valid if one always uses a parser that supports your proposed extensions. That is not always the case, especially when multiple authors are involved. Most parsers will pass XML processing instructions through verbatim, which is certainly not something I would want.

HTML comments are mostly ignored, which is why I suggest using them. I feel that would be the least disruptive option.

bwl21 commented 9 years ago

I am closing this issue, since I will fix it in my surrounding environment as described above.

mb21 commented 6 years ago

Btw, the --metadata-file option is now implemented. For the rest, see https://github.com/jgm/pandoc/issues/4627