What is the scope of the metadata schemas that we're defining?

dhimmel commented 4 years ago

@tarleb thanks for setting this up. I wanted to get an idea of what exactly we're trying to accomplish.

default-pandoc-schema.json currently codifies some of the metadata described in the Pandoc manual and supported by the official built-in templates. I think it makes sense to continue expanding it to include all of the fields described in the Pandoc manual or implied by its codebase.
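
To make the idea of codifying metadata concrete, here is a minimal sketch of how a schema of this kind could constrain a metadata block. The schema fragment is hypothetical and is not copied from default-pandoc-schema.json; `title` and `keywords` are field names the Pandoc manual documents, but the constraints placed on them here are assumptions. The checker is a hand-rolled stand-in for a real JSON Schema validator, not a full implementation of the specification.

```python
# Hypothetical fragment in the spirit of a JSON Schema; NOT the real file.
SCHEMA = {
    "required": ["title"],
    "properties": {
        "title": {"type": "string"},
        "keywords": {"type": "array"},
    },
}

# Map the JSON Schema type names used above to Python types.
TYPES = {"string": str, "array": list, "object": dict}


def check_metadata(meta, schema=SCHEMA):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    # Every required field must be present.
    for field in schema["required"]:
        if field not in meta:
            problems.append(f"missing required field: {field}")
    # Every present field that the schema knows must have the declared type.
    for field, rule in schema["properties"].items():
        if field in meta and not isinstance(meta[field], TYPES[rule["type"]]):
            problems.append(f"{field}: expected {rule['type']}")
    return problems


if __name__ == "__main__":
    # 'keywords' is given as a single string instead of a list.
    print(check_metadata({"title": "A Paper", "keywords": "metadata"}))
```

Fields that the schema does not mention pass through unchecked in this sketch, which is one possible way to let templates and filters experiment with fields before they are standardized.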

However, I believe we also have larger ambitions: creating a metadata schema that is more flexible and feature-rich than the current implied Pandoc metadata schema. I'm guessing that we'll want to define fields before they are officially implemented in Pandoc? Perhaps we also want to define some fields that are never implemented in Pandoc but are still standardized, so that templates and filters can converge on a common vocabulary. So will we have multiple schemas?

@tarleb what're your thoughts and what do you think is the best way to proceed?

tarleb commented 4 years ago

I think we have mostly the same expectations and you pretty much covered it. As the primary goals, I see

1. arriving at a schema which can be used by authors of scholarly articles, and
2. a schema which contains all information necessary to be consumed directly by manubot, JOSS/whedon, etc.

The first schema is relevant for author convenience, with the second mattering to template writers only. Still, having both fixed would allow for the development of a common tool-chain to transform from the first to the second.
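
A transformation from the author-facing schema to the template-facing one could look roughly like the sketch below. All field names (`author`, `affiliations`, `authors`, `name`) and the shorthand rules are assumptions made for illustration; neither schema has been fixed in this thread.

```python
def normalize(author_meta):
    """Expand author-friendly shorthand into an explicit, template-ready form.

    Assumed input conventions (hypothetical):
      - 'author' is a list whose entries are either a bare name string or a
        dict with 'name' and an optional list of affiliation keys;
      - 'affiliations' maps those keys to full affiliation text.
    """
    affiliations = author_meta.get("affiliations", {})
    authors = []
    for entry in author_meta.get("author", []):
        # Shorthand: a bare string is treated as the author's name.
        if isinstance(entry, str):
            entry = {"name": entry}
        authors.append({
            "name": entry["name"],
            # Resolve affiliation keys so templates need no lookups.
            "affiliations": [affiliations[k] for k in entry.get("affiliations", [])],
        })
    return {"title": author_meta.get("title", ""), "authors": authors}


if __name__ == "__main__":
    source = {
        "title": "An Example Article",
        "affiliations": {"a": "Example University"},
        "author": ["Jane Doe", {"name": "John Roe", "affiliations": ["a"]}],
    }
    print(normalize(source))
```

The point of the split is that authors can write the short form, while template writers only ever see the fully expanded one.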

I would expect most results from this repo to be kept separate from main pandoc, although I could see support being added to the official Docker images.

Other ideas, in decreasing order of perceived importance:

1. schemata for journal information;
2. filter(s) which convert from the common input format to the common template format;
3. templates which work with the common template format, at least for HTML, LaTeX, and JATS;
4. filters to produce the common metadata format from existing metadata intended for consumption by manubot, whedon, etc.

A good first step might be to collect some test data which has all the information currently relevant to popular tools and identify common fields. Alternatively, we could start by stepping through the info encodable in JATS article-meta elements and decide which fields we'd like to support. I prefer the example-driven approach, personally.
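
The example-driven approach could be sketched as follows: gather one metadata sample per tool, then count which top-level fields they share. The tool names and fields in the sample data are placeholders, not the actual metadata used by any existing tool.

```python
from collections import Counter


def field_usage(examples):
    """Count in how many examples each top-level field appears."""
    counts = Counter()
    for meta in examples.values():
        # set() so a field counts once per example.
        counts.update(set(meta))
    return counts


def common_fields(examples):
    """Return the sorted fields that appear in every example."""
    counts = field_usage(examples)
    return sorted(f for f, n in counts.items() if n == len(examples))


if __name__ == "__main__":
    # Placeholder samples; real ones would be collected from each tool.
    examples = {
        "tool_a": {"title": "X", "authors": [], "doi": "placeholder"},
        "tool_b": {"title": "Y", "authors": [], "keywords": []},
    }
    print(common_fields(examples))
    print(field_usage(examples).most_common())
```

Fields present in every sample are natural candidates for the shared schema, while fields used by only one tool can be discussed case by case.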

dhimmel commented 4 years ago

> A good first step might be to collect some test data which has all the information currently relevant to popular tools and identify common fields.

Agreed. I mention some existing metadata examples in https://github.com/manubot/manubot/issues/187#issuecomment-567118953. Do we want to compile these in a GitHub issue, or create .md / .yaml files in this repo with all the existing instances we find?

I am not sure what you mean by test data, because we don't intend to support all these implied schemas... but just to create a schema that accommodates the important information.

> Alternatively, we could start by stepping through the info encodable in JATS article-meta elements and decide which fields we'd like to support.

I feel like JATS meta elements should be one of the examples.

jcolomb commented 3 years ago

ping @crsh and his papaja project. ping also @marton-balazs-kovacs, @alexholcombe for the tenzing app.
