sergiocorreia / panflute

A Pythonic alternative to John MacFarlane's pandocfilters, with extra helper functions
http://scorreia.com/software/panflute/
BSD 3-Clause "New" or "Revised" License

Improve doc.metadata #165

Open sergiocorreia opened 4 years ago

sergiocorreia commented 4 years ago

There are a few areas where doc.metadata could be improved. For instance:

1. Integers and floats get auto converted to strings:

>>> from panflute import *
>>> doc = Doc()
>>> doc.metadata['spam'] = 42
>>> repr(doc.metadata['spam'])
'MetaString(42)'
>>> doc.metadata['spam'] + 0
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    doc.metadata['spam'] + 0
TypeError: unsupported operand type(s) for +: 'MetaString' and 'int'

Here, we would expect doc.metadata['spam'] to be the number 42 instead of the string "42".

I don't recall the exact reason for this, but I suspect it's because that's what Pandoc does (which, in turn, might be because of how YAML works?)

>pandoc --from=markdown --to=native --standalone
---
foo: 42
bar: spam
eggs: true
---

foo
^Z

Yields:

Pandoc

    (Meta {unMeta = fromList
        [("bar",MetaInlines [Str "spam"]),
         ("eggs",MetaBool True),
         ("foo",MetaInlines [Str "42"])
    ]})

    [Para [Str "foo"]]

(note the Str "42" part).
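Until this is settled, one filter-side workaround is to coerce the string form back into a number. This is a minimal sketch: `coerce_scalar` is a hypothetical helper, not part of panflute or pandoc, and it assumes the metadata value has already been reduced to a plain Python string.

```python
def coerce_scalar(text):
    """Return an int or float if `text` parses as one, else the original string.

    Hypothetical helper for illustration; not a panflute or pandoc API.
    """
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


number = coerce_scalar("42")
word = coerce_scalar("spam")
print(number + 0)  # arithmetic works because `number` is an int
print(word)        # non-numeric strings pass through unchanged
```

Note that `float` also accepts strings such as "1e3" or "nan", so a real filter might want a stricter check before converting.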

2. Metadata uses dict-style instead of properties

It might be good to offer both syntaxes:

doc.metadata['settings']['size'] = 10  # currently supported
doc.metadata.settings.size = 10  # alternative

Not sure if this goes too much against PEP 20, though...
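To make the idea concrete, here is a rough sketch of how both syntaxes could coexist on a plain nested dict. `AttrDict` is a hypothetical class invented for this example; it is not how panflute's metadata containers are implemented.

```python
class AttrDict(dict):
    """Hypothetical dict subclass that also exposes its keys as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested plain dicts so chained attribute access keeps working.
        if isinstance(value, dict) and not isinstance(value, AttrDict):
            value = AttrDict(value)
            self[name] = value
        return value

    def __setattr__(self, name, value):
        # Store attributes as ordinary dict items.
        self[name] = value


metadata = AttrDict({"settings": {"size": 5}})
metadata.settings.size = 10          # attribute style
print(metadata["settings"]["size"])  # dict style sees the same value
```

One trade-off of this approach: keys that collide with dict method names (such as "items" or "keys") remain reachable only through the dict syntax.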

3. Simplify internals

Currently, MetaString is the same as Str but inheriting from MetaValue (which is an empty class). Maybe we can just use Str and adjust the oktypes of the Meta containers accordingly.

There might also be other internals that can be simplified.

dhimmel commented 3 years ago

Integers and floats get auto converted to strings

With pandoc 2.11.3.1 and your example code (slightly modified):

pandoc --from=markdown --to=native --standalone <<< "
---
foo: 42
bar: spam
eggs: true
---
"

Produces the following output:

Pandoc (Meta {unMeta = fromList [("bar",MetaInlines [Str "spam"]),("eggs",MetaBool True),("foo",MetaInlines [Str "42"])]})
[]

Note that the string stays a string and the bool stays a bool, but the int gets converted to a string. I am glad that bools are kept as bools, since that is helpful for encoding filter options in metadata.
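For filter authors, the same distinction shows up when the metadata is read from pandoc's JSON AST. The sketch below is illustrative only: the JSON is written by hand to mirror the native output above (it was not produced by running pandoc), and `meta_to_builtin` is a hypothetical helper that handles only a few node types.

```python
import json

# Hand-written metadata in the shape of pandoc's JSON AST (an assumption for
# illustration), mirroring the native output shown above.
META_JSON = """
{
  "bar":  {"t": "MetaInlines", "c": [{"t": "Str", "c": "spam"}]},
  "eggs": {"t": "MetaBool", "c": true},
  "foo":  {"t": "MetaInlines", "c": [{"t": "Str", "c": "42"}]}
}
"""


def meta_to_builtin(node):
    """Convert one metadata node to a Python value (hypothetical helper)."""
    tag, content = node["t"], node.get("c")
    if tag in ("MetaBool", "MetaString"):
        return content
    if tag == "MetaInlines":
        # Simplified: only Str and Space inlines are handled; others are skipped.
        parts = []
        for inline in content:
            if inline["t"] == "Str":
                parts.append(inline["c"])
            elif inline["t"] == "Space":
                parts.append(" ")
        return "".join(parts)
    if tag == "MetaList":
        return [meta_to_builtin(item) for item in content]
    if tag == "MetaMap":
        return {key: meta_to_builtin(value) for key, value in content.items()}
    raise ValueError(f"unsupported metadata node: {tag}")


values = {key: meta_to_builtin(node) for key, node in json.loads(META_JSON).items()}
for key, value in values.items():
    print(key, type(value).__name__, repr(value))
```

Here "eggs" arrives as a real bool, while "foo" arrives as text and would need an explicit conversion before any arithmetic.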

I don't recall the exact reason for this, but I suspect it's because that's what Pandoc does (which, in turn, might be because of how YAML works?)

I couldn't find any pandoc issues on the casting of ints to strings in YAML metadata. It's not part of the YAML spec. Converting the YAML to JSON with an online converter gives:

{
   "foo": 42,
   "bar": "spam",
   "eggs": true
}

Note 42 is an integer.

Here, we would expect doc.metadata['spam'] to be the number 42 instead of the string "42".

Given that this matches how Pandoc handles it, it might actually be the best behavior, but yes, it feels annoying. It would be good to get more information on why Pandoc is converting ints to strings in YAML metadata.

sergiocorreia commented 3 years ago

Would be good to get more information on why Pandoc is converting ints to strings in YAML metadata.

Agreed! Maybe this is related? https://github.com/jgm/pandoc/issues/5479#issuecomment-489145048