Closed hipitihop closed 4 years ago
Those aren’t valid EDN: https://github.com/edn-format/edn#integers
Yes, HEX notation is not defined by the spec - only "plain" integers. There are edn parsers that work with hex, though. Being liberal like that is good, but unless the communicating parties have agreed to that, they cannot claim that they talk edn. One should properly support hex through an extension.
Yeah, this is why I dislike the EDN format from a parser implementation perspective: there’s no unique spec for it.
Anyway, do you want to try and make a pull-request about this? Integers are parsed here: https://github.com/swaroopch/edn_format/blob/0d4c2136663b57bef913b321313f2651ca6649a3/edn_format/edn_lex.py#L217-L222 You can add tests here: https://github.com/swaroopch/edn_format/blob/0d4c2136663b57bef913b321313f2651ca6649a3/tests.py#L93-L95
Yeah, this is why I dislike the EDN format from a parser implementation perspective: there’s no unique spec for it.
There is one unique spec. It is defined here: https://github.com/edn-format/edn
What implementations allow for, doesn't make the spec less of a spec - it makes the implementation more liberal. This is fine technically, but it does confuse reasoning about what is "actually" supported. This is solved very easily by reading the spec (which is short and simple). Technically, this could be solved with "strict" (or "lax") flag on the parser.
If we want to make edn_format more liberal, to accept integers in hex format, then this should be a separate token type (to easily use it in "lax" mode, or ignore it in "strict" mode), ie t_INTEGER_HEX
. This will also make it easier to reason about the regex.
btw, edn_format
already supports t_RATIO
which is not defined in the spec.
By "spec" I meant something strict like a BNF grammar. The EDN document starts with:
Currently this specification is casual, as we gather feedback from implementors. A more rigorous e.g. BNF will follow.
We’re left with either that document or Clojure’s reference implementation. However, Clojure doesn’t respect that spec: it supports things that aren’t spec’d (like ratios https://github.com/edn-format/edn/issues/64 or NaN
/Inf
https://github.com/edn-format/edn/issues/2) and rejects things that are (https://github.com/edn-format/edn/issues/68).
With no update to that document since 2014, parser implementers are left alone.
@swaroopch It’s your call, your’re the owner 😃 What do you want to do here? Should edn_format
support that notation?
The flag solution may be interesting, but it significantly complicates the code because it means dynamically changing the grammar.
I think it makes sense to support t_RATIO
because it allows to represent some data that has no other representation (think 1/3
). It’s a native type in Clojure and as such is dumped in its own representation when one uses pr-str
. On the other hand, hexadecimal notation never occurs in dumped data because it’s just a different representation for an integer:
(pr-str 1/3) ; => "1/3"
(pr-str 0xFF) ; => "255"
@hipitihop @c00kiemon5ter @bfontaine Thanks for the great discussion on this topic 👍🏼
My inputs are:
Hex notation e.g.
0xFFDC73
causes traceback:Sample EDN:
pip details: