Closed mqudsi closed 5 months ago
Thanks for trying owl, and for the clear description of this problem! I'm planning to add support for custom tokens soon (see #2). To support this use case, that ought to include enabling/disabling/customizing whitespace. Then you could parse newlines into their own kind of token and write (newline | ';') whenever you want a line separator.
The tricky part at that point is ignoring them in certain contexts. One way to do it would be by manually adding newline* wherever they can appear:
expr = newline* identifier (newline* '|' newline* identifier)*
…but that's inconvenient, and it would be easy to accidentally introduce ambiguities.
Let me think about this some more. I think a feature like
expr = ident ('|' ident)* .ignore newline
which automatically adds the newline* wherever it needs to go wouldn't be too hard to add.
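To make the idea concrete, here is a rough sketch in Python of the kind of rewrite such a feature might perform. It is purely textual and hypothetical: `ignore_newlines` is not part of owl, and it assumes a terminal is either a bare name or a single-quoted literal. It also does nothing about the ambiguity concerns mentioned above.

```python
import re

# Hypothetical helper, not part of owl: a terminal is assumed to be either
# a single-quoted literal or a bare name.
TERMINAL = re.compile(r"'[^']*'|[A-Za-z_][A-Za-z0-9_-]*")

def ignore_newlines(rule_body: str) -> str:
    """Put `newline*` in front of every terminal in a rule body."""
    return TERMINAL.sub(lambda m: "newline* " + m.group(0), rule_body)

print(ignore_newlines("identifier ('|' identifier)*"))
# prints: newline* identifier (newline* '|' newline* identifier)*
```

For the rule shown earlier, this reproduces the hand-written version with newline* inserted before each identifier and each '|'.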
I'm not sure it's so inconvenient as to be worth changing the grammar language. I arrived at the following independently, and it works fine while being easy enough to understand (it even allows empty statements!):
.whitespace ' ' '\t'
program = (';'|'\n')* stmt{(';'|'\n')+, 0+} (';'|'\n')*
Closing this as there seems to be a reasonable solution.
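For anyone reading along, the separator behaviour that grammar describes can be approximated outside owl with a few lines of Python. This is only a sketch under assumptions: `split_statements` is a made-up name, statements are treated as opaque text, and nothing here handles a line that continues after a trailing `|`.

```python
import re

def split_statements(source: str) -> list[str]:
    """Split on runs of ';' or line breaks, dropping empty statements."""
    # Any run of ';' and '\n' counts as one separator, mirroring (';'|'\n')+.
    parts = re.split(r"[;\n]+", source)
    # Spaces and tabs are plain whitespace, as in `.whitespace ' ' '\t'`.
    stripped = [part.strip(" \t") for part in parts]
    return [part for part in stripped if part]

print(split_statements("\n;echo a;\n\necho b\n"))
# prints: ['echo a', 'echo b']
```

Leading, trailing, and repeated separators all collapse, which is the "empty statements" allowance mentioned above.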
This is a really nice project. I really like its design, and I am considering porting a shell scripting language to a formal owl definition and an owl-based parser.
However, one blocker for us is that in scripting languages with an optional terminating semicolon, a line break may be used instead of an explicit ;. The hard-coded interpretation of \n as a mere token separator is already a blocking issue here, but even if it weren't, I feel as though it would require a separate class of tokens resting somewhere between whitespace and literal tokens to handle, if I'm not mistaken?
In particular, I am looking to parse syntax like
as being equal to
This example can work by replacing all line breaks with semicolons prior to parsing (feasibility/overhead aside), but that doesn't always work, for example:
as some symbols are allowed to wrap on to the next line without hard-escaping the new line with a backslash, but wouldn't be accepted if it were input as
echo hello |; cat
as that's two separate statements.
Any ideas?
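To illustrate the preprocessing problem, here is a small Python sketch. The input `echo hello |` followed by `cat` on the next line is an assumed example inferred from the broken output quoted above, and both function names are made up. The second function assumes a trailing `|` is the only symbol that lets a statement wrap, which a real shell grammar would need to generalise.

```python
import re

def naive(source: str) -> str:
    """Replace every line break with a semicolon, ignoring context."""
    return source.replace("\n", "; ")

def continuation_aware(source: str) -> str:
    """Join lines that end in '|' before converting the remaining breaks."""
    # Assumption: a trailing '|' is the only continuation symbol.
    joined = re.sub(r"\|[ \t]*\n[ \t]*", "| ", source)
    return joined.replace("\n", "; ")

script = "echo hello |\ncat"
print(naive(script))               # prints: echo hello |; cat
print(continuation_aware(script))  # prints: echo hello | cat
```

The naive version produces exactly the two-statement form, which is why a blanket replacement ahead of parsing is not enough.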