paul-kline / bnf-playground


Multiline comments don't work #10

Open rljacobson opened 10 months ago

rljacobson commented 10 months ago

Problem

Multiline comments do not appear to work correctly. Example:

/*
This is a multiline comment.

*/
<gpa> ::= "4.0" | <leading> "." <trailing>
<leading> ::= [0-3]
<trailing> ::= [0-9]

Output:

Uhoh, looks like you have an error: Error: Syntax error at line 2 col 0:

1 /*
2 This is a multiline comment.
 ^

Unexpected "\n". Instead, I was expecting to see one of the following:

A character matching /./ based on:
    comment$ebnf$1 →  ● /./ comment$ebnf$1
    comment → comment$string$1 ● comment$ebnf$1 comment$string$2
    rule →  ● comment
    bnf →  ● rule
A "*" based on:
    comment$string$2 →  ● "*" "/"
    comment → comment$string$1 comment$ebnf$1 ● comment$string$2
    rule →  ● comment
    bnf →  ● rule

Solution (maybe)

Presumably somewhere an incorrect assumption is made about whether some regex matches newlines. Here are a couple of possibilities.

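It might be the classic gotcha with the . regex operator, which by default does not match line terminators. The error output above points that way: the comment body is parsed with /./, and the parse fails on the first "\n". The docs for JavaScript regexes say: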

Note that the m multiline flag doesn't change the dot behavior. So to match a pattern across multiple lines, the character class [^] can be used — it will match any character including newlines.

The s "dotAll" flag allows the dot to also match line terminators.
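To make the quoted behavior concrete, here is a minimal sketch in plain JavaScript. The variable names and the test string are mine, chosen to mirror the example in this issue; this is not the code in bnf.ne. It compares a plain dot, the [^] character class, and the s flag on the same comment.

```javascript
// A block comment spanning several lines, as in the report above.
const source = "/*\nThis is a multiline comment.\n\n*/";

// Plain dot: stops at the first line terminator, so the closing */ is never reached.
const plainDot = /^\/\*.*\*\/$/;

// [^] matches any character, including newlines.
const anyChar = /^\/\*[^]*\*\/$/;

// The s (dotAll) flag lets the dot match line terminators as well.
const dotAll = /^\/\*.*\*\/$/s;

console.log(plainDot.test(source)); // false
console.log(anyChar.test(source));  // true
console.log(dotAll.test(source));   // true
```

If the grammar's comment rule really is built on /./, swapping in either of the last two forms would presumably let the comment body cross line breaks, though I have not checked how nearley handles them.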

Alternatively, a regex for multiline comments often has a subexpression similar to [^*] | \\[*] | [*][^/] | [*]$ | … somewhere. If you don't want to think too hard about it, just add another alternative that matches a newline: [\n\r\f] | [^*] | \\[*] | [*][^/] | [*]$ | …
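For illustration, here is one common dot-free form of a block-comment regex in JavaScript. It is not the pattern elided above and not the one in bnf.ne; the names blockComment and grammar are hypothetical. In JavaScript a negated class such as [^*] already matches newlines, so this form needs no special flag.

```javascript
// Matches one C-style block comment without using the dot.
//   [^*]        any character except "*", newlines included
//   \*+[^*\/]   a run of "*" that does not close the comment
//   \*+\/       the closing run of "*" followed by "/"
const blockComment = /\/\*(?:[^*]|\*+[^*\/])*\*+\//;

// The example from this issue, shortened to the first rule.
const grammar = [
  "/*",
  "This is a multiline comment.",
  "",
  "*/",
  '<gpa> ::= "4.0" | <leading> "." <trailing>',
].join("\n");

// Remove the comment and any surrounding whitespace it leaves behind.
const stripped = grammar.replace(blockComment, "").trim();
console.log(stripped); // <gpa> ::= "4.0" | <leading> "." <trailing>
```

The regex has no g flag, so only the first comment is removed; a real preprocessor would likely want the g flag, and would need to avoid matching /* inside string literals.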

mxfactorial commented 9 months ago

Presumably somewhere an incorrect assumption is made about whether some regex matches newlines... It might be a case of the classic gotcha for the behavior of the . regex operator.

https://github.com/paul-kline/bnf-playground/blob/main/ts/bnf.ne#L38C12-L38C25

often a regex for multiline comments has a subexpression similar to [^*] | \\[*] | [*][^/] | [*]$ | … somewhere

https://github.com/kach/nearley/issues/42#issuecomment-63849550