problematic recursive block-text (EBNF) definition

GlenDC commented 7 years ago

Currently I'm working on a C# implementation of L20n (FTL), which I do mostly based on the specified EBNF grammar.

TLDR: Could we rewrite the block-text / unquoted-pattern section of this EBNF grammar, to make a block-text not n*2-recursive, where n is the amount of lines defined for a (group?) block-text.

A block-text is defined as NL __ '|' unquoted-pattern and a unquoted-pattern is defined as: unquoted-text | placeable | block-text)+. This is a bit problematic as it is completely recursive in its nature.

Not sure if this is a problem in the way I parse it in C#, or in the way the EBNF definition is setup. But the way I would expect it, is that a block-text such as:

description =
  | a
  | b
  | c

would be simply parsed as 1 block-text, consisting out of 3 unquoted-patterns {a, b, c} (which aren't block-texts on their own. In this case those 3 unquoted-patterns would be 3 unquoted-texts {a, b, c}.

The problem is that because of the current definition, this would instead result in 1 block-text a that has 1 child block-text b, which has 1 child block-text of its own c. This is very recursive, making the parsing go very deep in its call-stacks, and I can imagine that a block text could be big enough to actually get a callstack overflow on runtime.

My C# implementation is written by hand, so in theory I should be able to fix this by refactoring the code, to be not exactly like the EBNF grammar specified, while still getting the same result. However I was thinking to write parsers automatically based on the EBNF grammar for some other languages, and for those there wouldn't be such a solution possible.

GlenDC commented 7 years ago

A possible fix for this issue(?) would be to change the definition of a block-text to the following:

block-text           ::= NL __ '|' (unquoted-text | placeable)+;

GlenDC commented 7 years ago

Looking back at this issue, it might be that recursive definitions are normal for an EBNF grammar. For example an arglist and placeable-list are also defined recursively. While parsers will in most cases parse such cases iterative, as this is more idiomatic for most commonly used programming languages these days.

Is there a good reason this is defined recursively? As there are enough places where definitions are iterative, using the + operator. I don't see why the situations mentioned in this issue aren't defined in a similar fashion.

GlenDC commented 7 years ago

@stasm is this issue in fact valid, or should it be marked as a question and it can be closed with some explanation? If you want I can re-open it in https://github.com/projectfluent/syntax, as that is now the new repository of this project, no?

l20n / spec

problematic recursive block-text (EBNF) definition #13