ietf-tools / bap

An ABNF parser, focusing on human-friendly error messages.

extraneous text in output from parsing RFC2822 #8

Open: ghost opened this issue 9 years ago

ghost commented 9 years ago

If you use the online IETF tool to parse RFC2822, the output contains a bunch of text that isn't actually part of any ABNF rule. For example, starting on line 21:

No special semantics are attached to these tokens. They are simply single characters.

fenner commented 8 years ago

This is because aex is written on the assumption that ABNF is indented further than the surrounding document text, as was once the rule. Unfortunately, prose cannot in general be distinguished from valid ABNF by syntax alone: "No special semantics are attached to these tokens" could simply be a sequence of rules concatenated together.
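To make the ambiguity concrete, here is a small Python check. The helper name `is_rulename_sequence` is hypothetical and not part of bap or aex. It tests whether each whitespace-separated word matches the RFC 5234 `rulename` production, `ALPHA *(ALPHA / DIGIT / "-")`. Every word of the quoted sentence passes (its final period, which is not valid ABNF, is left out here), so syntactically the words look like a concatenation of rule references.

```python
import re

# RFC 5234, section 4: rulename = ALPHA *(ALPHA / DIGIT / "-")
RULENAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_rulename_sequence(text):
    """Hypothetical helper: True if every whitespace-separated token is a
    syntactically valid ABNF rulename, i.e. the text reads as a
    concatenation of rule references."""
    tokens = text.split()
    return bool(tokens) and all(RULENAME.fullmatch(t) for t in tokens)


prose = "No special semantics are attached to these tokens"
print(is_rulename_sequence(prose))  # True
```

This only checks token shape; it does not know whether rules named `special` or `semantics` exist, which is exactly why a purely syntactic extractor cannot reject such a line.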

aex remembers the indentation level of a rule definition, but not that of the subsequent lines. One obvious heuristic to try would be to remember the indentation level of the subsequent lines, and if the next line outdents significantly, treat it as probably not ABNF (especially if it's separated by a blank line).
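A minimal Python sketch of that heuristic, assuming line-oriented input. `extract_rules` and `outdent_threshold` are hypothetical names, and this is not aex's actual code. The indentation of a rule's first continuation line is remembered; a non-definition line that follows a blank line and sits well to the left of it is treated as prose. The input is an abbreviated excerpt of the `specials` rule and the following prose in RFC 2822, with its column layout approximated.

```python
import re

# Start of an ABNF rule definition (RFC 5234): rulename, then "=" or "=/".
RULE_START = re.compile(r"^(\s*)([A-Za-z][A-Za-z0-9-]*)\s*=/?")


def extract_rules(lines, outdent_threshold=2):
    """Hypothetical sketch of the outdent heuristic; not aex's actual code.

    A non-definition line that follows a blank line and sits more than
    `outdent_threshold` columns left of the rule's first continuation line
    is treated as prose, which ends the current rule.
    """
    rules = []
    current = None      # stripped lines of the rule being collected
    cont_indent = None  # indentation of the rule's first continuation line
    saw_blank = False
    for line in lines:
        if not line.strip():
            saw_blank = True
            continue
        after_blank, saw_blank = saw_blank, False
        indent = len(line) - len(line.lstrip())
        if RULE_START.match(line):
            if current:
                rules.append(" ".join(current))
            current, cont_indent = [line.strip()], None
        elif current is None:
            pass  # prose outside any rule is ignored
        elif (after_blank and cont_indent is not None
              and indent < cont_indent - outdent_threshold):
            rules.append(" ".join(current))  # significant outdent: prose
            current = None
        else:
            if cont_indent is None:
                cont_indent = indent
            current.append(line.strip())
    if current:
        rules.append(" ".join(current))
    return rules


rfc2822_excerpt = [
    '   specials        =       "(" / ")" /     ; Special characters used in',
    '                           "<" / ">" /     ;  other parts of the syntax',
    '                           DQUOTE',
    '',
    '   No special semantics are attached to these tokens.  They are simply',
    '   single characters.',
]
rules = extract_rules(rfc2822_excerpt)
print(len(rules))  # 1: the prose is no longer attached to "specials"
```

Requiring a blank line before the outdent keeps the heuristic conservative: a line that directly follows a definition is still treated as a continuation. The 2-column threshold is an arbitrary illustrative value.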

Note that there's another bug in aex: https://tools.ietf.org/abnf/rfc4627 does not show end-object at the end of the object rule, because that line is outdented to the level of the rule definition and aex assumes continuation lines will be indented further.
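The failure mode can be reproduced with a short Python sketch. `first_rule` and `require_deeper_indent` are hypothetical names that model the assumption described above; this is an illustration, not aex's code. The input mirrors the two-line `object` rule from RFC 4627, where `end-object` starts at the same column as `object` (the exact column count here is illustrative).

```python
import re

# Start of an ABNF rule definition (RFC 5234): rulename, then "=" or "=/".
RULE_START = re.compile(r"^(\s*)([A-Za-z][A-Za-z0-9-]*)\s*=/?")

# "end-object" is outdented to the same column as the rule name.
RFC4627_OBJECT = [
    "      object = begin-object [ member *( value-separator member ) ]",
    "      end-object",
]


def first_rule(lines, require_deeper_indent):
    """Hypothetical helper: join the first rule with its continuation lines.

    With require_deeper_indent=True, continuations must be indented past
    the rule name (the assumption described above). For simplicity the
    sketch stops at a blank line or at the next rule definition.
    """
    rule_indent = len(RULE_START.match(lines[0]).group(1))
    kept = [lines[0].strip()]
    for line in lines[1:]:
        if not line.strip() or RULE_START.match(line):
            break
        indent = len(line) - len(line.lstrip())
        if require_deeper_indent and indent <= rule_indent:
            break  # outdented continuation is silently dropped
        kept.append(line.strip())
    return " ".join(kept)


print(first_rule(RFC4627_OBJECT, require_deeper_indent=True))
# object = begin-object [ member *( value-separator member ) ]
print(first_rule(RFC4627_OBJECT, require_deeper_indent=False))
# object = begin-object [ member *( value-separator member ) ] end-object
```

Since `end-object` does not match the rule-definition pattern, accepting it as a continuation at the rule's own indentation recovers it without misreading it as a new rule.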