Open darmie opened 7 years ago
Hi,
I think I see part of the problem. In the 'Prog' non-terminal, you're only accepting one of either Class, Import or Namespace. Might want to change that to + or *, depending on whether source files are allowed to be empty.
Also, the Ws at the end of Prog could be moved to the end of Import. Class and Namespace end in Block which already consumes whitespace.
Perhaps Prog is redundant, and the contents could be moved inside Dublin, but I don't know what other features you might have planned.
Thanks for your quick response, I am still learning the Waxeye grammar :).
I will try your suggestion. 👍
I have another question, though I am not sure whether it should be opened as a separate issue: how do I handle INDENT and DEDENT for a code block?
Let's say I have a function:
private myFunc()=>
    //this is my block
    print("Hello World")
//call function
myFunc()
If I understand correctly, programming languages that use indentation for code blocks are context-sensitive, so can't be parsed by a purely PEG-based parser.
It could be worth doing some web searching to find out what solutions others use. The two I'm aware of are to preprocess the input with a tokenizer, inserting special INDENT and UNINDENT tokens in place of whitespace, or to extend the grammar language to allow context-sensitive information to be recorded while parsing.
I planned on implementing context-sensitive parsing, but never ended up doing it.
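The first of those two options, preprocessing the input before it reaches the PEG parser, can be sketched roughly as follows. This is only an illustration in Python and not part of Waxeye: the function name `insert_indent_tokens` and the marker strings `INDENT` and `DEDENT` are made up for the example, and it assumes indentation uses spaces only. The grammar would then match the marker lines in place of leading whitespace.

```python
def insert_indent_tokens(source, indent_tok="INDENT", dedent_tok="DEDENT"):
    """Rewrite leading spaces into explicit INDENT/DEDENT marker lines.

    Hypothetical helper: assumes space-only indentation and that the
    marker strings never occur as ordinary lines in the source.
    """
    out = []
    stack = [0]  # indentation widths of the blocks that are currently open
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not change the nesting level
        stripped = line.lstrip(" ")
        width = len(line) - len(stripped)
        if width > stack[-1]:
            # deeper than the enclosing block: open a new block
            stack.append(width)
            out.append(indent_tok)
        while width < stack[-1]:
            # shallower: close blocks until the widths line up again
            stack.pop()
            out.append(dedent_tok)
        if width != stack[-1]:
            raise ValueError(f"inconsistent dedent: {line!r}")
        out.append(stripped)
    # close any blocks still open at end of input
    out.extend(dedent_tok for _ in stack[1:])
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "private myFunc()=>\n"
        "    //this is my block\n"
        '    print("Hello World")\n'
        "//call function\n"
        "myFunc()"
    )
    print(insert_indent_tokens(sample))
```

The stack of widths is the context-sensitive state that a pure PEG cannot carry. Once it is resolved in this pass, the remaining grammar can treat the two markers like ordinary brackets.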
Interesting. I have been looking at ANTLR grammars for languages like Ruby and Python, hoping to get a clue about how it's done. I really wish this were possible with Waxeye.
One possibility (which I am using) is to parse the string in multiple passes. With Waxeye this can be done, for example, by naming your non-terminals with a special prefix or postfix. In the end it can look like:
# instead of
Function <- ?Ident ?(:'(' ?Params :')') Col ?Type RArrow Block
# it becomes
Function <- ?Ident ?(:'(' ?Params :')') Col ?Type RArrow ('\n' Indent (!'\n' Nextpass)*)*
Function_Nextpass <- Block
# and to store the substring for further parsing:
Nextpass <- .
Effectively, the parse result is then an AST containing 'Nextpass' nodes. Those need to be flattened and then parsed again, starting from the non-terminal 'Function_Nextpass', until there are no 'Nextpass' nodes left in the AST.
This is how I implemented parsing of markdown blockquotes; see https://github.com/adabru/adabru-markup/blob/v0.1.1/js/core.js#L11-L46 . The code has no comments, so it may not help you.
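For readers who do not want to dig through the linked file, the flatten-and-reparse loop might look roughly like the Python sketch below. It is not taken from that code. It assumes an AST made of dicts with `type` and `children` keys, where children are either nested nodes or single-character strings, and it takes a hypothetical `parse(start, text)` callback standing in for a call to the generated parser with a chosen start non-terminal.

```python
def deferred_text(node):
    """Join the characters held by Nextpass children, keeping raw newlines
    and ignoring everything else (for example Indent nodes)."""
    parts = []
    for child in node["children"]:
        if isinstance(child, str):
            if child == "\n":
                parts.append(child)
        elif child["type"] == "Nextpass":
            parts.extend(child["children"])
    return "".join(parts).lstrip("\n")


def resolve(node, parse):
    """Reparse deferred text until no Nextpass nodes remain below `node`.

    `parse` is a hypothetical callback: parse(start_nonterminal, text) -> AST.
    """
    if isinstance(node, str):
        return node
    children = node["children"]
    if any(isinstance(c, dict) and c["type"] == "Nextpass" for c in children):
        text = deferred_text(node)
        # keep the ordinary child nodes, drop the placeholders around the text
        kept = [c for c in children
                if isinstance(c, dict) and c["type"] not in ("Nextpass", "Indent")]
        # second pass: start from e.g. "Function_Nextpass"
        children = kept + [parse(node["type"] + "_Nextpass", text)]
    # recurse, so nested blocks found by the second pass are resolved too
    return {"type": node["type"],
            "children": [resolve(c, parse) for c in children]}
```

Each pass strips one level of indentation from the deferred text, so the recursion stops once the innermost block contains no further 'Nextpass' nodes.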
@adabru cool. I will try this and let you know how it goes.
The sample language:
The Grammar
The error:
Parse Error: failed to match 'Ws' at line=2, col=2, pos=34 (expected '["\t","\n"],"\r"," "')
Parser stopped parsing at this AST