jgm / pandoc

Universal markup converter
https://pandoc.org

the documentation about metadata is a bit fragmented #4584

Open danse opened 6 years ago

danse commented 6 years ago

The documentation about metadata is currently part of the section about Pandoc's Markdown, while it could be a dedicated section with information about readers and writers, in order to avoid duplication. Elsewhere I expressed some doubts about how metadata are supposed to be handled, and @jkr has doubts about how to document metadata handling for the DOCX writer.

Metadata are part of the data model and are supported to various degrees by the different readers and writers. I would therefore suggest documenting them in a section where the intended semantics are explained first, followed by sections about metadata handling in specific readers or writers.

mb21 commented 6 years ago

I tend to agree... part of the story is also in the templates section.

Hopefully I'll get to work on https://github.com/jgm/pandoc/issues/1960 soon. Then we could consider moving the Metadata blocks extension section from the markdown extensions to the general extensions part...

See also the wiki for some further links to potential changes in the future...

Zack-83 commented 6 years ago

It would be very helpful to have a comparison table showing which fields can be carried over between different formats, with particular regard to metadata. That would make it much clearer what Pandoc currently can and cannot do, and what "granularity" it provides.

Let me begin with what I learnt so far:

--------------------------------------------------------------------------------------------------------
HTML                       LaTeX             DOCX (style or position) DOCX (doc properties) Dublin Core
-------------------------- ----------------- ------------------------ --------------------- ------------
<title>                    \title            1st position             title                 dc:title

<meta name="author">       \author           2nd position             author                dc:creator

<meta name="dcterms.date"> \date                                      date (overwritten)    dcterms:date

<meta name="abstract">     \begin{abstract}  3rd position                      

                           \tableofcontents  4th position

<body>                     \begin{document}

<h1>                       \section          Header 1

<h2>                       \subsection       Header 2

<br>                       \newline

<p>

<em>                       \emph

<strong>

<table>                    \begin{longtable} Table

etc.
-------------------------------------------------------------------------------------------------------

Does anybody have a more complete table? Or would anyone like to help write one?

danse commented 6 years ago

@Zack-83 it's a good initiative, but I am afraid that maintaining such a table would be too demanding and that it would end up outdated. Also, a relational, tabular structure does not suit the actual data model used within pandoc, which is based on the native format, with readers and writers converting from and to every supported format.

About this, I admit that I have some doubts about the data model for metadata. As far as I understood from skimming through Definitions.hs in pandoc-types, there is no fixed set of metadata fields: they are simply strings that are managed by convention. Do we want to turn them into a defined set?
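To make the "strings managed by convention" point concrete, here is a minimal Python sketch of how metadata looks in pandoc's JSON AST: an open map from field names to tagged values. The field `my-custom-field` and the helper `to_plain` are hypothetical names used only for illustration, and the helper handles only a simplified subset of value types.

```python
# Metadata as it appears in pandoc's JSON AST: an open map from field
# names to tagged values. Nothing in the structure marks "title" as
# special compared with a user-invented key.
meta = {
    "title": {"t": "MetaInlines", "c": [
        {"t": "Str", "c": "Annual"}, {"t": "Space"}, {"t": "Str", "c": "Report"},
    ]},
    "author": {"t": "MetaList", "c": [
        {"t": "MetaInlines", "c": [{"t": "Str", "c": "Ada"}]},
    ]},
    "my-custom-field": {"t": "MetaBool", "c": True},  # hypothetical user field
}


def inline_text(inline):
    """Render one inline element; only Str and Space are handled here."""
    if inline["t"] == "Str":
        return inline["c"]
    if inline["t"] == "Space":
        return " "
    return ""  # simplification: other inline types are dropped


def to_plain(value):
    """Flatten a tagged metadata value into plain Python data (hypothetical helper)."""
    tag, content = value["t"], value.get("c")
    if tag in ("MetaBool", "MetaString"):
        return content
    if tag == "MetaList":
        return [to_plain(item) for item in content]
    if tag == "MetaMap":
        return {key: to_plain(item) for key, item in content.items()}
    if tag == "MetaInlines":
        return "".join(inline_text(inline) for inline in content)
    raise ValueError(f"unsupported metadata value in this sketch: {tag}")


plain = {name: to_plain(value) for name, value in meta.items()}
print(plain)
```

Every key goes through the same code path, which is the sense in which a field such as `title` exists only by convention.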

mb21 commented 6 years ago

btw, you can see which fields are used, and how, by printing a writer's default template, e.g. with pandoc -D latex
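As a rough illustration of reading a template for the fields it uses, the Python sketch below pulls variable names out of template text. The sample template is a made-up fragment, not pandoc's real default LaTeX template, and `template_variables` is a hypothetical helper; in practice the text would be the saved output of pandoc -D latex.

```python
import re

# Made-up template fragment for illustration; not pandoc's real default template.
SAMPLE_TEMPLATE = r"""
$if(title)$
\title{$title$}
$endif$
$for(author)$
\author{$author$}
$endfor$
\date{$date$}
"""

# Template control words that look like variables but are not.
CONTROL_WORDS = {"endif", "endfor", "else", "sep"}

# Matches $if(name)$, $for(name)$, and plain $name$ references.
VARIABLE = re.compile(
    r"\$(?:if|for)\(([A-Za-z][\w.-]*)\)\$"
    r"|\$([A-Za-z][\w.-]*)\$"
)


def template_variables(template):
    """Return the sorted set of variable names referenced in a template."""
    names = set()
    for match in VARIABLE.finditer(template):
        name = match.group(1) or match.group(2)
        if name not in CONTROL_WORDS:
            names.add(name)
    return sorted(names)


print(template_variables(SAMPLE_TEMPLATE))
```

This is a simplification: it ignores other template syntax and could misread literal dollar signs, so treat the result as a starting list to check by hand.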

jgm commented 6 years ago

Francesco Occhipinti writes:

> About this, i admit that i have some doubts about the data model for metadata. As far as i understood skimming through Definitions.hs in pandoc-types, there is no fixed set of metadata: they are simply strings that are managed by convention. Do we want to turn them into a defined set?

Having a data structure with types for the metadata would be good in lots of ways. It would then be clear which metadata fields writers and readers ought to support. On the other hand, this would reduce flexibility for users to define their own metadata. Perhaps a compromise would be to have some hard-coded "must-support" fields, and a map of optional fields. (Indeed, we used to have hard-coded title, author, and date -- and that's all -- before we added YAML metadata. At that point it seemed silly to keep the hard-coded fields, but maybe that was the wrong decision.)
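One way to picture that compromise is the Python sketch below. It is only an illustration of the idea, not a proposal for the actual pandoc-types definition; the class `DocMeta`, its field names, and `from_mapping` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DocMeta:
    """Hypothetical metadata record: fixed core fields plus an open map."""

    # "Must-support" fields that every reader and writer would know about.
    title: str = ""
    authors: List[str] = field(default_factory=list)
    date: str = ""
    # Everything else stays user-defined and optional.
    extra: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_mapping(cls, raw: Dict[str, Any]) -> "DocMeta":
        """Split a free-form mapping into core fields and extras."""
        rest = dict(raw)  # copy so the caller's mapping is not modified
        author = rest.pop("author", [])
        if isinstance(author, str):
            author = [author]  # accept a single author given as a string
        return cls(
            title=str(rest.pop("title", "")),
            authors=[str(a) for a in author],
            date=str(rest.pop("date", "")),
            extra=rest,
        )


doc = DocMeta.from_mapping(
    {"title": "Report", "author": "Ada", "keywords": ["pandoc", "metadata"]}
)
print(doc)
```

The typed fields give writers a clear contract, while `extra` keeps the flexibility of arbitrary user metadata.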

Perhaps this should be discussed on pandoc-discuss or a new issue. In any case, this issue isn't the right place to discuss this further.