jgm / pandoc

Universal markup converter
https://pandoc.org

Suggestion: TOML support in page metadata #2425

Closed 01AutoMonkey closed 9 years ago

01AutoMonkey commented 9 years ago

See:

From Intro to TOML:

TOML stands for Tom’s Own Minimal Language. It is a configuration language vaguely similar to YAML or property lists, but far, far better.

Projects I know to be using TOML:

Haskell TOML implementations (htoml seems to be the best one):

technocrat commented 9 years ago

Maybe put this off a while? From https://github.com/toml-lang/toml

"Latest tagged version: v0.4.0.

Be warned, this spec is still changing a lot. Until it's marked as 1.0, you should assume that it is unstable and act accordingly."

01AutoMonkey commented 9 years ago

Yep, you're right, let's reconsider this once TOML reaches 1.0 and the Haskell implementations mature.

Merovex commented 4 years ago

Not a fan of TOML, but I was looking at its support in Pandoc. As of April 2020, TOML appears to be at 1.0.0-rc1. It may almost be time.

https://github.com/toml-lang/toml

ickc commented 4 years ago

I agree TOML is getting more popular in cases where people would have used YAML in the past. But should this go through pandoc-discuss first?

alerque commented 4 years ago

No. Too much pain for no gain.

YAML front matter and book metadata are very widespread, with hundreds of implementations supporting them. There is no way at all YAML could be removed and replaced with TOML even if TOML had the same capabilities, which it does not. YAML is far more flexible in the kinds of data that can be wrapped and inserted in various ways; TOML is much less capable in this regard.

That is of course also TOML's strong suit: the limited syntax means it's much more predictable and less surprising, and hence easier to parse and support. That makes it a very interesting format to consider for new applications that need small amounts of metadata. But given that extracting YAML support from the Markdown ecosystem is pretty much inconceivable at this point, all you could possibly do is add TOML support in addition to everything else. This has the opposite effect from simplicity and predictability: it fragments the ecosystem and adds yet another thing you need to parse to handle Markdown.

I think Pandoc and other major implementations should steer clear of bolting new things like this on and adding to the complexity when existing solutions are just as (or in this case, more) capable.

mb21 commented 4 years ago

given that extracting YAML support from the Markdown ecosystem is pretty much inconceivable at this point, all you could possibly do is add TOML support in addition to everything else.

I tend to agree... I'm sympathetic to TOML, but there are lots of Markdown parsers that support YAML front matter, and none that support TOML...

jgm commented 4 years ago

YAML fits Markdown better, because it looks more like plain text. You don't need quotes (often) or double brackets.

ickc commented 4 years ago

I thought the request is about adding TOML, not replacing YAML?

Also, it seems to me that TOML’s popularity is due (among other things) to the fact that it addresses some of the shortcomings of YAML, such as being too flexible, e.g. in type casting; in the case of pyyaml, it requires a lot of Python-specific type information to cast into the right object. Can you elaborate why you think adding TOML support in pandoc would be beneficial?

Merovex commented 4 years ago

My only point was that I happened by this issue and saw a statement that when TOML reached 1.0 y'all would re-discuss adding TOML support. I'm not asserting that this is the correct venue, etc. I'm not saying TOML should be supported (I probably wouldn't use it). Just hitting the tickler, since the decision in 2015 was to discuss at TOML 1.0. It's at RC now. So, if I hit a hornet's nest, my apologies.

Devil's Advocate

Okay, maybe it does look like I'm supporting TOML. But, I'm not. Rather, I am pointing out that a little more research may be warranted to get objective data about whether to discount TOML altogether. Playing Devil's advocate.

The advantage of TOML (so they claim) is that it is more legible to non-programmers. There are not multiple layers of indentation. It's more secure, requires explicit typing, etc. The Rust community seems enamored with TOML for these reasons. I doubt any of these reasons are in Pandoc's interests. (In some of the Rust talks I've seen, there's a hint of a rivalry with Haskell, but it's probably not reciprocated.)

A Possible Use Case

Were I to ask for TOML support, it would support carving out variable sets from one another. In my specific use case, I have a YAML for an entire novel, which comprises dozens of Markdown files as scenes. When I try to use YAML in those files, the last YAML wins...which has led to some hilarious collisions. I found this thread because I was contemplating using TOML as a simple way of preventing the collisions (using them in the sub-files). Pandoc could carve out a namespace that would insulate the larger document variables from the sorts of silly things users like me do.

lang: de-CH
title: "Ein schöner Titel"
subtitle: "ein wundervoller Undertitel"
author: "Petra Muster"
date: 30-06-2018

becomes

[pandoc]
lang = "de-CH"
title = "Ein schöner Titel"
subtitle = "ein wundervoller Undertitel"
author = "Petra Muster"
date = "30-06-2018"

However, that would imply a change internally to YAML-based metadata. This inverts my solution. So, it's not a strong argument.

No Markdown Processor Support?

As for the observation that there are no Markdown parsers that support TOML, Freecontent says:

Hugo supports three languages for providing metadata – YAML, TOML and JSON.

According to BuiltWith, Hugo is the 11th most popular, followed by Jekyll at 14th. I would argue Jekyll is under-reported since it is the GitHub Pages processor. However, the point is more that Hugo is a very popular Markdown processor (static site generator) that uses TOML as well as YAML. It appears Gatsby has also supported TOML since 2016. There are various non-static generators that include support for both.

jgm commented 4 years ago

If I were starting from scratch, I'd be tempted to explore TOML. (Just being easier to parse than YAML is potentially a big advantage.) But given that we already support YAML, and that's pretty deeply embedded, we'd realistically be talking about adding not replacing. I don't see any real advantages there. It means adding even more code complexity, more library dependencies, etc.

"last YAML wins" -- not sure what you mean, but this sounds like an issue with the way pandoc has decided to resolve conflicts when you use the same metadata field twice, rather than an issue with the format itself.

Merovex commented 4 years ago

"last YAML wins" -- not sure what you mean, but this sounds like an issue with the way pandoc has decided to resolve conflicts when you use the same metadata field twice, rather than an issue with the format itself.

Yes. Not an issue with the format, and not something to discuss/resolve here (different issue from the TOML discussion). I have 50 files that all use "title" in their YAML as the title for that file, and Pandoc uses that field in the LaTeX template. It's an annoyance, nothing for y'all to fix.

What I do is pre-compile them into one big file, and when there are 50 yaml sections Pandoc seems to merge them so the latter ones overwrite the former ones. This is the same behavior as CSS, so not a surprise.
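
The "later ones overwrite the former ones" behavior described above can be sketched in plain Lua. This only illustrates the merge rule and is not pandoc's actual implementation; `merge_meta` and the sample tables are hypothetical names made up for the example.

```lua
-- Merge a list of metadata tables from left to right. On a key collision
-- the later table overwrites the earlier one ("last one wins").
-- Hypothetical helper for illustration only, not part of pandoc.
local function merge_meta(blocks)
  local merged = {}
  for _, block in ipairs(blocks) do
    for key, value in pairs(block) do
      merged[key] = value
    end
  end
  return merged
end

-- Book-level metadata followed by the metadata of one scene file.
local novel = { title = "My Novel", author = "Petra Muster" }
local scene = { title = "Scene 12" }

local meta = merge_meta({ novel, scene })
-- meta.title is now "Scene 12"; meta.author is still "Petra Muster"
print(meta.title, meta.author)
```

Fields set only once survive the merge; only repeated fields such as "title" collide.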

jonassmedegaard commented 2 years ago

I use Pandoc as a translator between markups - sometimes even translator between different flavors of Markdown.

One concrete use case that I have is to migrate websites over to zola, which happens to use TOML for metadata. Sure, I can try to throw some unique perl spaghetti together, but that really reminds me of... the old dark ages before pandoc!

Please consider adding support for reading and writing TOML metadata (without getting rid of YAML - just as you are not getting rid of one table parser or serializer just because another gets covered).

Merovex commented 2 years ago

I don't really have a dog in the TOML fight per se. However, if looking to add would it help to show an exemplar?

Hugo (static site generator) allows YAML, TOML and JSON for frontmatter. I'm adding the URL below for consideration, but the key distinction is they use three plusses +++ to denote TOML while retaining the dashes --- for YAML.

https://gohugo.io/content-management/front-matter/
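
As a rough sketch of that delimiter convention, the plain Lua function below tells the two front matter styles apart by their opening line. It assumes `\n` line endings and non-empty front matter; `split_front_matter` is a hypothetical name, and nothing here is taken from Hugo's or pandoc's code.

```lua
-- Split a document into (format, front matter, body) based on the opening
-- delimiter line: "+++" marks TOML, "---" marks YAML.
-- Returns nil, nil, input when no complete front matter block is found.
local function split_front_matter(input)
  local formats = { ['+++'] = 'toml', ['---'] = 'yaml' }
  local delim = input:match('^([%+%-][%+%-][%+%-])\n')
  local format = delim and formats[delim]
  if not format then
    return nil, nil, input
  end
  -- escape each delimiter character so the pattern matches it literally
  local d = delim:gsub('.', '%%%0')
  -- `.-` is non-greedy: stop at the first closing delimiter line
  local front, body = input:match('^' .. d .. '\n(.-\n)' .. d .. '\n(.*)')
  if not front then
    return nil, nil, input
  end
  return format, front, body
end

local format, front, body =
  split_front_matter('+++\ntitle = "Hello"\n+++\nBody text\n')
print(format)
```

The extracted front matter string could then be handed to whichever TOML or YAML parser is available.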

tarleb commented 2 years ago

I'm procrastinating and wondered if it would be difficult to support TOML by using a custom reader. Turns out it's fairly easy:

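local TOML = require 'toml'

-- Custom reader: parse optional TOML front matter delimited by `+++`
-- lines, then read the remainder as pandoc Markdown.
function Reader (sources, opts)
  local input = tostring(sources)
  -- `.-` is non-greedy, so a later `+++` line in the body is not
  -- mistaken for the closing delimiter
  local toml, rest = input:match '^%+%+%+\n(.-\n)%+%+%+\n(.*)'
  local doc = pandoc.read(rest or input, 'markdown', opts)
  -- use the parsed TOML table as metadata if front matter was found
  doc.meta = toml and TOML.parse(toml) or doc.meta
  return doc
end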

This will require lua-toml to be installed, e.g., via luarocks, or just by downloading toml.lua directly into the document directory.

jonassmedegaard commented 2 years ago

Neat!

It seems doing the opposite should be possible similarly - i.e. serializing with metadata as TOML.

@tarleb if you feel like taking that challenge I'd appreciate it - otherwise I might try myself, but the last time I attempted to write even a short Lua script I got lost in some rabbit hole...

tarleb commented 2 years ago

Surprisingly, converting to TOML is a good bit harder. There's a draft below, but it doesn't quite work the way it should: looks like there's a bug in the Lua TOML encoder. (Requires the current dev version of pandoc, using a new-style custom writer.)

local TOML = require 'toml'
local type = pandoc.utils.type

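-- Recursively convert pandoc metadata values into plain Lua values
-- (strings, lists, tables) that the TOML encoder can serialize.
local function markdownify (metavalue)
  if type(metavalue) == 'List' then
    return metavalue:map(markdownify)
  elseif type(metavalue) == 'Blocks' then
    return pandoc.write(pandoc.Pandoc(metavalue), 'markdown')
  elseif type(metavalue) == 'Inlines' then
    local text = pandoc.write(
      pandoc.Pandoc(pandoc.Plain(metavalue)),
      'markdown'
    )
    -- trim whitespace; the parentheses drop gsub's second return value
    return (text:gsub('^%s+', ''):gsub('%s+$', ''))
  elseif type(metavalue) == 'table' or type(metavalue) == 'Meta' then
    local result = {}
    for k, v in pairs(metavalue) do
      result[k] = markdownify(v)
    end
    return result
  else
    -- any other value (e.g. a boolean or plain string) is stringified;
    -- a leftover os.exit(1) here made this fallback unreachable
    return tostring(metavalue)
  end
end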

function Writer (doc, opts)
  local toml = string.format(
    '+++\n%s\n+++\n',
    TOML.encode(markdownify(doc.meta))
  )
  return toml
end
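
While the encoder problem mentioned above remains open, one possible stopgap for simple metadata is to serialize it by hand. The following is a minimal sketch in plain Lua, not a full TOML encoder: it assumes a flat table with bare-key-safe keys and single-line string, number, or boolean values, and `to_toml_front_matter` is a hypothetical name that belongs to neither pandoc nor lua-toml.

```lua
-- Serialize a flat table as TOML front matter between `+++` lines.
-- Assumptions: keys are valid TOML bare keys; values are single-line
-- strings, numbers, or booleans. Anything else raises an error.
local function to_toml_front_matter(meta)
  local keys = {}
  for key in pairs(meta) do
    keys[#keys + 1] = key
  end
  table.sort(keys) -- deterministic output order

  local lines = {}
  for _, key in ipairs(keys) do
    local value = meta[key]
    if type(value) == 'string' then
      -- escape backslashes and double quotes for a TOML basic string
      value = '"' .. value:gsub('\\', '\\\\'):gsub('"', '\\"') .. '"'
    elseif type(value) == 'number' or type(value) == 'boolean' then
      value = tostring(value)
    else
      error('unsupported value type for key ' .. key)
    end
    lines[#lines + 1] = key .. ' = ' .. value
  end
  return '+++\n' .. table.concat(lines, '\n') .. '\n+++\n'
end

local front_matter = to_toml_front_matter({
  lang = 'de-CH',
  title = 'Ein schöner Titel',
  draft = true,
})
io.write(front_matter)
```

In a custom writer, a function like this could be applied to the output of `markdownify(doc.meta)` when the metadata is known to be flat; nested tables and lists would still need a real encoder.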