jgm / djot

A light markup language
https://djot.net
MIT License
1.67k stars, 43 forks

Metadata #35

Open jgm opened 2 years ago

jgm commented 2 years ago

Should there be a built-in format for metadata, or should that be considered distinct from the markup syntax?

If so, what?

Do we need structured keys such as YAML provides? Would be nice to avoid the complexity of YAML, but otherwise YAML is nice for this. Maybe some simplified subset of YAML.

jgm commented 1 year ago

The drawback of attribute syntax is that attribute values are just plain strings. Metadata like title and abstract often contain formatting, so it would be nice if they were regular djot syntax.

toastal commented 1 year ago

Abstract, backstory, correction, epilogue, prologue

These are elements that seem like they should be in the contains-formatting category (there are likely a few more). Most of the other elements would be inside something like <meta name="x" content="y">. In these cases, formatting doesn’t make sense. I’m not sure <title> allows including elements.

jgm commented 1 year ago

Titles can contain emphasis (e.g. italicizing a title), superscripts and subscripts, and math, for example.

toastal commented 1 year ago

I finally got around to looking at the <title> spec: its content model is text, not flow content, so it shouldn’t have any other elements inside it. Do you mean headlines?

emilazy commented 1 year ago

HTML is not the only format in the world and Djot should not be restricted to its lowest common denominator. Many real-world document titles contain formatting that has to be mangled to fit into an HTML <title>.

toastal commented 1 year ago

Ah, I see the argument you are making now.

pranabekka commented 11 months ago

The title is the most prominent piece of data about the document (if any). It should not come after other metadata, nor should it have to live inside a metadata block.

In "An INI Critique of TOML" [0], the author differentiates between serialisation formats and configuration formats. Metadata in djot is not for serialising and sending data. This makes JSON, TOML, and similar formats unsuitable.

Also, using an external/existing format adds the cost of new syntax, as well as more parsing code, or a library. Figuring out the right subset of YAML will still require additional syntax to remember, as well as a new parser. External formats also have the downside of not (natively) supporting djot markup. Also, trying to support multiple external formats sounds like way too much overhead.

Overall, I think a metadata block using a definition list is the way to go.

0: https://github.com/madmurphy/libconfini/wiki/An-INI-critique-of-TOML
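To illustrate the definition-list idea, here is a minimal Python sketch, not an actual djot implementation. It assumes a hypothetical convention in which the document opens with a djot definition list (a `: term` line followed by an indented definition) and treats that list as metadata. The function name `leading_definition_list` is made up for this example. Values are kept as raw djot text so that formatting in a title or abstract could still be parsed later.

```python
from typing import Dict, Optional, Tuple


def leading_definition_list(text: str) -> Tuple[Dict[str, str], str]:
    """Split text into (metadata, body).

    Assumption: metadata is a definition list at the very top of the
    document; each term starts with ": " and its definition is indented
    by at least two spaces. Nested blocks are not handled.
    """
    lines = text.splitlines()
    metadata: Dict[str, str] = {}
    term: Optional[str] = None
    consumed = 0  # number of leading lines that belong to the metadata list

    for number, line in enumerate(lines):
        if line.startswith(": "):
            term = line[2:].strip()
            metadata[term] = ""
        elif line.strip() == "":
            pass  # blank lines separate a term from its definition
        elif term is not None and line.startswith("  "):
            # Indented lines continue the current definition (raw djot).
            metadata[term] = (metadata[term] + "\n" + line.strip()).strip()
        else:
            break  # first line that is not part of the list starts the body
        consumed = number + 1

    return metadata, "\n".join(lines[consumed:])


doc = """: title

  My _emphasized_ title

: author

  Jane Doe

# Heading

Text.
"""
meta, body = leading_definition_list(doc)
print(meta)
print(body)
```

A real design would still have to decide how such a list is told apart from an ordinary definition list that happens to open the document.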

nbonfils commented 4 months ago

Has there been any progress on this discussion? I am currently considering djot for a knowledge-base project (like a wiki), where metadata (tags, author, date, etc.) is a must.

I feel that @pkulchenko's suggestion is the nicest one because, by building on attributes, it feels the most native to djot. The attributes could be made available to the whole document either by following them with a blank line or by adding an optional placeholder element, such as the +++ suggested at the beginning of the thread. Like so:

{attr1="bar and\
 baz"
 .clssy
 attr2=more}
{updated=20230801 attr2=less}
+++

# title

{source="personal-experience"}
> More than three people on one
> bicycle is *not* recommended.

I think this could be a starting point for document-level attributes, a.k.a. metadata. As for @jgm's concern about allowing djot syntax in the title attribute, we need to clarify whether any other attributes or metadata require this as well. If yes, then some time needs to be spent on extending the attribute syntax. If no, maybe a special syntax for titles would work, perhaps expanding on my +++ placeholder proposal so that any text after it constitutes the title. Like +++ My _emphasized_ document title.

terefang commented 4 months ago

sorry for the late chime in.

reading the plethora of messages and opinions so far got me thinking.

Format No1 — the proposed default:

# Pandoc User’s Guide

* :Author: John MacFarlane
* :Author: Johnny MacFarlane
* :Date: August 22, 2022

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus malesuada rhoncus lorem in fringilla. 

Format No2 — the extension:

+++ [type]
* :Title: Pandoc User’s Guide - A Manual
* :Author: John MacFarlane
* :Author: Johnny MacFarlane
* :Date: August 22, 2022
+++

# Pandoc User’s Guide

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus malesuada rhoncus lorem in fringilla. 

i am using the +++ fencing here but @jgm may decide otherwise.
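To make Format No2 concrete, here is a rough Python sketch of how a tool might read such a block. Nothing here is decided djot syntax: the `+++` fence, the `* :Key: value` field lines, and the name `split_metadata` are all assumptions taken from or invented for this example. Repeated keys such as Author are collected into lists.

```python
import re
from typing import Dict, List, Tuple

# Matches a field line such as "* :Author: John MacFarlane".
FIELD = re.compile(r"^\*\s+:(?P<key>[^:]+):\s*(?P<value>.*)$")


def split_metadata(text: str) -> Tuple[Dict[str, List[str]], str]:
    """Split a document into (metadata, body).

    Assumption: the block opens with a first line starting with "+++"
    (optionally followed by a type) and closes at the next line that is
    exactly "+++". Values are kept as raw, unparsed text.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("+++"):
        return {}, text  # no metadata block; leave the document untouched

    metadata: Dict[str, List[str]] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "+++":
            # Everything after the closing fence is the document body.
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return metadata, body
        match = FIELD.match(line)
        if match:
            key = match["key"].strip()
            metadata.setdefault(key, []).append(match["value"].strip())
    return {}, text  # unterminated fence: treat everything as body


doc = """+++ [type]
* :Title: Pandoc User's Guide - A Manual
* :Author: John MacFarlane
* :Author: Johnny MacFarlane
* :Date: August 22, 2022
+++

# Pandoc User's Guide

Lorem ipsum dolor sit amet.
"""
meta, body = split_metadata(doc)
print(meta["Author"])
print(body)
```

The sketch ignores the optional `[type]` after the opening fence and skips lines inside the block that are not fields.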

uvtc commented 4 months ago

In case it adds anything useful to this discussion, I added a comment here https://github.com/jgm/djot/discussions/293#discussioncomment-9208124 that references this issue.

yurivish commented 1 month ago

Random naïve thought from a passer-by: what if djot's attribute syntax had a way to specify attribute values as verbatim text or a code block?

I think this would address the concern in https://github.com/jgm/djot/issues/35#issuecomment-1302249910.

Apologies if this idea has been suggested before; I scrounged around but couldn't find any discussion on it.

terefang commented 1 month ago

btw, what is the status of this?

jgm commented 1 month ago

We're still in the brainstorming phase. Nothing has been decided.