patrickfrey / mewa

Compiler-compiler for writing compiler frontends with Lua
https://github.com/patrickfrey/mewa
MIT License

Grammar railroad diagram #5

Closed mingodad closed 2 years ago

mingodad commented 2 years ago

It would be nice if mewa could output an EBNF compatible with https://www.bottlecaps.de/rr/ui to generate railroad diagrams (https://en.wikipedia.org/wiki/Syntax_diagram), as was done in https://github.com/mingodad/lalr-parser-test for bison/byacc/lemon.

Meanwhile, I did a crude search and replace with Lua string patterns, plus a few manual fixes, to get the EBNF shown below for examples/language1/grammar.g.

Copy and paste the EBNF shown below into the "Edit Grammar" tab at https://www.bottlecaps.de/rr/ui, then switch to the "View Diagram" tab.

/*
BOOLEAN : '((true)|(false))';
IDENT   : '[a-zA-Z_]+[a-zA-Z_0-9]*';
DQSTRING: '["]((([^\\"\n]+)|([\\][^"\n]))*)["]' 1;
SQSTRING: "[']((([^\\'\n]+)|([\\][^'\n]))*)[']" 1;
UINTEGER: '[0123456789]+';
FLOAT   : '[0123456789]*[.][0123456789]+';
FLOAT   : '[0123456789]*[.][0123456789]+[Ee][+-]{0,1}[0123456789]+';
ILLEGAL : '[0123456789]+[A-Za-z_]';
ILLEGAL : '[0123456789]*[.][0123456789]+[A-Za-z_]';
ILLEGAL : '[0123456789]*[.][0123456789]+[Ee][+-]{0,1}[0123456789]+[A-Za-z_]';
*/

program ::= extern_definitionlist free_definitionlist main_procedure

extern_definitionlist ::= extern_definition extern_definitionlist
            | /*empty*/

free_definitionlist ::= free_definition free_definitionlist
            | /*empty*/

namespace_definitionlist ::= namespace_definition namespace_definitionlist
            | /*empty*/

instruct_definitionlist ::= instruct_definition instruct_definitionlist
            | /*empty*/

inclass_definitionlist ::= inclass_definition inclass_definitionlist
            | /*empty*/

ininterf_definitionlist ::= ininterf_definition ininterf_definitionlist
            | /*empty*/

extern_definition ::= "extern" DQSTRING "function" IDENT typespec "(" extern_paramlist ")" ";"
            | "extern" DQSTRING "procedure" IDENT "(" extern_paramlist ")" ";"
            | "extern" DQSTRING "function" IDENT typespec "(" extern_paramlist "..." ")" ";"
            | "extern" DQSTRING "procedure" IDENT "(" extern_paramlist "..." ")" ";"

extern_paramdecl ::= typespec IDENT
            | typespec

extern_parameters ::= extern_paramdecl "," extern_parameters
            | extern_paramdecl

extern_paramlist ::= extern_parameters
            | /*empty*/

ininterf_definition ::= "function" IDENT typespec "(" extern_paramlist ")" funcattribute ";"
            | "procedure" IDENT "(" extern_paramlist ")" funcattribute ";"
            | "operator" operatordecl typespec "(" extern_paramlist ")" funcattribute ";"
            | "operator" operatordecl "(" extern_paramlist ")" funcattribute ";"

funcattribute ::= "const" funcattribute
            | "nothrow" funcattribute
            | /*empty*/

instruct_definition ::= typedefinition ";"
            | variabledefinition ";"
            | structdefinition

inclass_definition ::= typedefinition ";"
            | variabledefinition ";"
            | structdefinition
            | classdefinition
            | interfacedefinition
            | functiondefinition
            | operatordefinition
            | constructordefinition

free_definition ::= namespacedefinition
            | typedefinition ";"
            | variabledefinition ";"
            | structdefinition
            | classdefinition
            | interfacedefinition
            | functiondefinition

namespace_definition ::= namespacedefinition
            | typedefinition ";"
            | structdefinition
            | classdefinition
            | interfacedefinition
            | functiondefinition

typename/*/L1*/ ::= IDENT
            | IDENT "::" typename

typehdr/*/L1*/ ::= typename
            | "const" typename
            | "any" "class" "^"
            | "any" "const" "class" "^"
            | "any" "struct" "^"
            | "any" "const" "struct" "^"

typegen/*/L1*/ ::= typehdr
            | typegen "[" generic_instance "]"
            | typegen "^"
            | typegen "const" "^"

typespec/*/L1*/ ::= typegen
            | typegen "&"

typedefinition ::= "typedef" typegen IDENT
            | "typedef" "function" IDENT typespec "(" extern_paramlist ")"
            | "typedef" "procedure" IDENT "(" extern_paramlist ")"
            | "typedef" "function" IDENT typespec "(" extern_paramlist ")" "nothrow"
            | "typedef" "procedure" IDENT "(" extern_paramlist ")" "nothrow"

structdefinition ::= "struct" IDENT "{" instruct_definitionlist "}"
            | "generic" "struct" IDENT "[" generic_header "]"
                "{" instruct_definitionlist "}"

interfacedefinition ::= "interface" IDENT "{" ininterf_definitionlist "}"

inheritlist ::= typegen "," inheritlist
            | typegen

namespacedefinition ::= "namespace" IDENT  "{" namespace_definitionlist "}"

classdefinition ::= "class" IDENT "{" inclass_definitionlist "}"
            | "class" IDENT ":" inheritlist "{" inclass_definitionlist "}"
            | "generic" "class" IDENT "[" generic_header "]"
                "{" inclass_definitionlist "}"
            | "generic" "class" IDENT "[" generic_header "]"
                ":" inheritlist "{" inclass_definitionlist "}"

linkage ::= "private"
            | "public"
            | /*empty*/

functiondefinition ::= linkage "function" IDENT typespec callablebody
            | linkage "procedure" IDENT callablebody
            | "generic" linkage "function" IDENT "[" generic_header "]"
                 typespec callablebody
            | "generic" linkage "procedure" IDENT "[" generic_header "]"
                 callablebody

constructordefinition ::= linkage "constructor" callablebody
            | "destructor" codeblock

operatordefinition ::= linkage "operator" operatordecl typespec callablebody
            | linkage "operator" operatordecl callablebody

operatordecl ::= "->"
            | "="
            | "+"
            | "-"
            | "*"
            | "/"
            | "%"
            | "&&"
            | "||"
            | "&"
            | "|"
            | "<<"
            | ">>"
            | "~"
            | "!"
            | "(" ")"
            | "[" "]"
            | "=="
            | "!="
            | ">="
            | "<="
            | ">"
            | "<"

lambda_paramlist ::= lambda_parameters
            | /*empty*/

lambda_parameters ::= IDENT "," lambda_parameters
            | IDENT

lamda_expression ::= "lambda" "(" lambda_paramlist ")" codeblock

generic_instance_defelem ::= typegen
            | UINTEGER
            | lamda_expression

generic_instance_deflist ::= generic_instance_defelem
            | generic_instance_defelem "," generic_instance_deflist

generic_instance ::= generic_instance_deflist

generic_defaultlist ::= IDENT "=" typegen "," generic_defaultlist
            | IDENT "=" typegen

generic_identlist ::= IDENT "," generic_identlist
            | IDENT "," generic_defaultlist
            | IDENT

generic_header ::= generic_identlist
            | generic_defaultlist

callablebody ::= "(" impl_paramlist ")" funcattribute "{" statementlist "}"

main_procedure ::= "main" codeblock
            | /*empty*/

impl_paramlist ::= impl_parameters
            | /*empty*/

impl_parameters ::= impl_paramdecl "," impl_parameters
            | impl_paramdecl

impl_paramdecl ::= typespec IDENT

codeblock/*/L1*/ ::= "{" statementlist "}"

statementlist/*/L1*/ ::= statement statementlist
            | /*empty*/

elseblock/*/L1*/ ::= "elseif" "(" expression ")" codeblock  elseblock
            | "elseif" "(" expression ")" codeblock
            | "else" codeblock

catchblock ::= "catch" IDENT    codeblock
            | "catch" IDENT "," IDENT codeblock

tryblock ::= "try" codeblock

statement/*/L1*/ ::= structdefinition
            | classdefinition
            | functiondefinition
            | typedefinition ";"
            | "var" variabledefinition ";"
            | expression ";"
            | "return" expression ";"
            | "return" ";"
            | "throw" expression "," expression ";"
            | "throw" expression  ";"
            | tryblock catchblock
            | "delete" expression ";"
            | "if" "(" expression ")" codeblock elseblock
            | "if" "(" expression ")" codeblock
            | "while" "(" expression ")" codeblock
            | "with" "(" expression ")" codeblock
            | "with" "(" expression ")" ";"
            | codeblock

variabledefinition ::= typespec IDENT
            | typespec IDENT "=" expression

expression/*/L1*/ ::= "{" expressionlist "}"
            | "{" "}"
            | "new" typespec ":" expression
            | "cast" typespec ":" expression

expression/*/L2*/ ::= IDENT
            | BOOLEAN
            | UINTEGER
            | FLOAT
            | "null"
            | DQSTRING
            | SQSTRING
            | lamda_expression
            | "(" expression ")"

expression/*/L3*/ ::= expression  "="  expression
            | expression  "+="  expression
            | expression  "-="  expression
            | expression  "*="  expression
            | expression  "/="  expression
            | expression  "^="  expression
            | expression  "&="  expression
            | expression  "%="  expression
            | expression  "&&="  expression
            | expression  "||="  expression
            | expression  "&="  expression
            | expression  "|="  expression
            | expression  "<<="  expression
            | expression  ">>="  expression

expression/*/L4*/ ::= expression  "||"  expression

expression/*/L5*/ ::= expression  "&&"  expression

expression/*/L6*/ ::= expression  "|"  expression

expression/*/L7*/ ::= expression  "^"  expression
            | expression  "&"  expression

expression/*/L8*/ ::= expression  "=="  expression
            | expression  "!="  expression
            | expression  "<="  expression
            | expression  "<"  expression
            | expression  ">="  expression
            | expression  ">"  expression

expression/*/L9*/ ::= expression  "+"  expression
            | expression  "-"  expression
            | "&"  expression
            | "-"  expression
            | "+"  expression
            | "~"  expression
            | "!"  expression

expression/*/L10*/ ::= expression  "*"  expression
            | expression  "/"  expression
            | expression  "%"  expression

expression/*/L11*/ ::= expression  "<<"  expression
            | expression  ">>"  expression

expression/*/L12*/ ::= iexpression
            | expression "." IDENT
            | "*" expression

expression/*/L13*/ ::= expression  "(" expressionlist ")"
            | expression  "(" ")"
            | expression  "[" expressionlist "]"

iexpression/*/L14*/ ::= expression indirection IDENT

indirection/*/L14*/ ::= "->" indirection
            | "->"

expressionlist/*/L0*/ ::= expression "," expressionlist
            | expression

Script that transformed the grammar:

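auto txt = readfile(
    "/home/mingo/dev/lua/mewa/examples/language1/grammar.g"
    );

// Drop the "% KEY value;" directives.
txt = txt.gsub("%%[^\n]+;", "");
// Drop the "(call)" annotations; requiring leading whitespace keeps quoted "(" ... ")" tokens.
txt = txt.gsub("%s+%b()", "");
// Mark empty alternatives.
txt = txt.gsub("ε", "/*empty*/");
// Turn the "/L<n>" priority suffix into a comment.
txt = txt.gsub("/L%d+", "/*%1*/");
// Use "::=" as the production separator.
txt = txt.gsub("%s+= ", " ::= ");
// Drop the ";" that terminates each production.
txt = txt.gsub("(%s+);\n", "%1\n");
print(txt);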
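
For readers who want to try the same clean-up in stock Lua, here is a minimal sketch that applies the substitutions to a small inline sample instead of the real grammar file. The sample text and the function name `to_ebnf` are hypothetical. Two details are assumptions of this sketch rather than part of the script as posted: the annotation pattern requires whitespace before the opening parenthesis, so quoted tokens such as `"("` survive, and the priority pattern uses a capture so that `/L1` becomes `/*L1*/`.

```lua
-- Hypothetical sample in the style of a mewa ".g" file (not the real grammar).
local sample = table.concat({
  '% LANGUAGE language1;',
  'typename/L1\t= IDENT',
  '\t\t| IDENT "::" typename',
  '\t\t;',
  'lambda_paramlist\t= lambda_parameters\t(lambda_paramdeflist)',
  '\t\t| ε\t(lambda_paramdeflist)',
  '\t\t;',
  'lamda_expression\t= "lambda" "(" lambda_paramlist ")" codeblock\t(lambda_expression)',
  '\t\t;',
  '',
}, '\n')

-- Apply the clean-up substitutions in order and return the EBNF text.
local function to_ebnf(txt)
  txt = txt:gsub("%%[^\n]+;", "")      -- drop "% KEY value;" directives
  txt = txt:gsub("%s+%b()", "")        -- drop "(call)" annotations that follow whitespace
  txt = txt:gsub("ε", "/*empty*/")     -- mark empty alternatives
  txt = txt:gsub("/(L%d+)", "/*%1*/")  -- turn the priority suffix into a comment
  txt = txt:gsub("%s+= ", " ::= ")     -- production separator
  txt = txt:gsub("(%s+);\n", "%1\n")   -- drop the ";" that ends each production
  return txt
end

io.write(to_ebnf(sample))
```

The token lines (`IDENT : '...';` and so on) are not handled here; in the EBNF above they were moved into a comment by hand.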

patrickfrey commented 2 years ago

Good point. Producing an EBNF that can be used by other tools for introspection, conflict detection, more interaction, etc. is definitely helpful. The script you provided is very helpful. It summarizes what has to be done. I'll have a look at it.

patrickfrey commented 2 years ago

I hope it is OK to omit the mapping of lexemes. Mapping the lexemes would require considerable effort. I am not convinced by diagrams that mix lexemes with the grammar when representing programming languages, though it makes sense for other types of grammar. What I could do is incorporate some description of the lexemes, extracted from a comment.

mingodad commented 2 years ago

Having the tokens/lexemes makes the railroad diagram easier to read and visualize, but it's true that they require a bit of manual work. Even an incomplete token list has a nice effect.

mingodad commented 2 years ago

Maybe I misunderstood your point. If you mean the commented-out code at the beginning, you are right: I left it there for myself, and it doesn't appear in the railroad diagram.

mingodad commented 2 years ago

Also, the tool that creates the railroad diagram makes several optimizations/simplifications that can help in updating the grammar.
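
To illustrate the kind of simplification meant here, the sketch below rewrites a right-recursive list rule with an empty alternative into a repetition. It assumes, without having checked the tool's internals, that this is one of the rewrites it performs; the function name `simplify_list_rule` is hypothetical, and the pattern only covers the exact two-alternative shape used by the `*_definitionlist` rules above.

```lua
-- Rewrite  "name ::= item name | /*empty*/"  as  "name ::= item*".
-- Any rule that does not have exactly this shape is returned unchanged.
local function simplify_list_rule(rule)
  -- %1 is a back-reference: the second symbol must repeat the rule name.
  local name, item = rule:match(
    "^([%w_]+)%s*::=%s*([%w_]+)%s+%1%s*|%s*/%*empty%*/%s*$")
  if name then
    return name .. " ::= " .. item .. "*"
  end
  return rule
end

local rule = "extern_definitionlist ::= extern_definition extern_definitionlist\n"
          .. "            | /*empty*/"
print(simplify_list_rule(rule))
```

Both forms describe zero or more `extern_definition` items, so the grammar could be written in the shorter form directly.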

mingodad commented 2 years ago

I also have a resyntaxed Lua, https://github.com/mingodad/ljs, that can be used as a direct replacement and allows a more C/C++-like syntax, and another one, https://github.com/mingodad/squilu, that would need a bit more work but has a syntax much closer to C/C++, including 0-based indexes.

mingodad commented 2 years ago

Another language that has nice features is https://github.com/vovkos/jancy

patrickfrey commented 2 years ago

There is a new release out: 0.10.

The mewa program has a new option -l/--generate-language that creates a description of all elements in the .g file as a Lua table and more. You can add comments with JavaDoc-like tags with a '@' prefix that are attached to the Lua table elements. This should provide the support you need to create any form of grammar description.

For example:

// @rule LOALPHA ::= "a"|"b"|"c"|"d"|"e"|"f"|"g"|"h"|"i"|"j"|"k"|"l"|"m"|"n"|"o"|"p"|"q"|"r"|"s"|"t"|"u"|"v"|"w"|"x"|"y"|"z"
// HIALPHA ::= "A"|"B"|"C"|"D"|"E"|"F"|"G"|"H"|"I"|"J"|"K"|"L"|"M"|"N"|"O"|"P"|"Q"|"R"|"S"|"T"|"U"|"V"|"W"|"X"|"Y"|"Z"
// USCORE ::= "_"
// IDENT ::= (LOALPHA | HIALPHA | USCORE) (LOALPHA | HIALPHA | USCORE | DIGIT)*
IDENT : '[a-zA-Z_]+[a-zA-Z_0-9]*';

The element IDENT gets an attribute 'rule' attached.

{
    ...
    RULES = {
        ...
        {op="TOKEN", name="IDENT", pattern="[a-zA-Z_]+[a-zA-Z_0-9]*",
         rule= {
            'LOALPHA ::= "a"|"b"|"c"|"d"|"e"|"f"|"g"|"h"|"i"|"j"|"k"|"l"|"m"|"n"|"o"|"p"|"q"|"r"|"s"|"t"|"u"|"v"|"w"|"x"|"y"|"z"',
            'HIALPHA ::= "A"|"B"|"C"|"D"|"E"|"F"|"G"|"H"|"I"|"J"|"K"|"L"|"M"|"N"|"O"|"P"|"Q"|"R"|"S"|"T"|"U"|"V"|"W"|"X"|"Y"|"Z"',
            'USCORE ::= "_"',
            "IDENT ::= (LOALPHA | HIALPHA | USCORE) (LOALPHA | HIALPHA | USCORE | DIGIT)*"},
         line=17},
        ...
    }

A description will be provided. The names of the elements are aligned with the description in https://github.com/patrickfrey/mewa/blob/master/doc/grammar.md with the additional elements "TOKEN" and "PROD" (production). Each element has an attribute line that points to the line number in the source.

I wrote an example Lua script that recreates the original grammar file from the Lua table. See https://github.com/patrickfrey/mewa/blob/master/examples/printGrammarFromImage.lua
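
As a rough illustration of how such a table could feed a diagram tool, the sketch below collects the `@rule` texts of all TOKEN elements. It is not taken from printGrammarFromImage.lua. The table `langdef` is a hypothetical excerpt that uses only the fields visible in the example above (`op`, `name`, `pattern`, `rule`, `line`), the `UINTEGER` entry and its line number are made up, and the handling of a `rule` stored as a single string is an assumption.

```lua
-- Hypothetical excerpt shaped like the generated language description.
local langdef = {
  RULES = {
    { op = "TOKEN", name = "IDENT", pattern = "[a-zA-Z_]+[a-zA-Z_0-9]*",
      rule = {
        'USCORE ::= "_"',
        "IDENT ::= (LOALPHA | HIALPHA | USCORE) (LOALPHA | HIALPHA | USCORE | DIGIT)*",
      },
      line = 17 },
    -- A token without an attached @rule comment (made-up entry).
    { op = "TOKEN", name = "UINTEGER", pattern = "[0123456789]+", line = 30 },
  },
}

-- Collect the @rule lines of every TOKEN element, in table order.
local function token_rules(def)
  local out = {}
  for _, elem in ipairs(def.RULES) do
    if elem.op == "TOKEN" and elem.rule then
      -- Assumption: the attribute is either one string or a list of strings.
      local lines = type(elem.rule) == "table" and elem.rule or { elem.rule }
      for _, text in ipairs(lines) do
        out[#out + 1] = text
      end
    end
  end
  return out
end

print(table.concat(token_rules(langdef), "\n"))
```

The collected lines could then be appended to the productions to give the diagram tool a description of the lexemes.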

patrickfrey commented 2 years ago

Example output for language1: https://github.com/patrickfrey/mewa/blob/master/tests/language1_grammar.lua.exp

patrickfrey commented 2 years ago

The tests work with a grammar recreated from this output with the example script.

mingodad commented 2 years ago

Using your https://github.com/patrickfrey/mewa/blob/master/examples/printGrammarFromImage.lua as a starting point, I changed it to output a grammar somewhat compatible with yacc/bison for testing purposes. Testing it with byacc/bison, we get this output for language1:

>byacc-nb -v grammar.g.y
byacc-nb: 1 rule never reduced
byacc-nb: 1401 shift/reduce conflicts, 5 reduce/reduce conflicts.
1406 conflicts
103 terminal symbols
58 non-terminal symbols
161 total symbols
236 rules
484 states
>Exit code: 0
>bison-nb -v grammar.g.y
grammar.g.y: warning: 1366 shift/reduce conflicts [-Wconflicts-sr]
grammar.g.y: warning: 40 reduce/reduce conflicts [-Wconflicts-rr]
grammar.g.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
grammar.g.y:314.27-61: warning: rule useless in parser due to conflicts [-Wother]
  314 |             | expression "&=" expression %prec L3 //(>>as...
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Exit code: 0

Resulting grammar (with a few manual fixes):

//% LANGUAGE language1;
//% TYPESYSTEM "language1/typesystem";
//% CMDLINE "cmdlinearg";
//# @rule BOOLEAN ::= "true" | "false"
//BOOLEAN: "((true)|(false))";
%token  BOOLEAN
//# @rule LOALPHA ::= "a"|"b"|"c"|"d"|"e"|"f"|"g"|"h"|"i"|"j"|"k"|"l"|"m"|"n"|"o"|"p"|"q"|"r"|"s"|"t"|"u"|"v"|"w"|"x"|"y"|"z"
//# HIALPHA ::= "A"|"B"|"C"|"D"|"E"|"F"|"G"|"H"|"I"|"J"|"K"|"L"|"M"|"N"|"O"|"P"|"Q"|"R"|"S"|"T"|"U"|"V"|"W"|"X"|"Y"|"Z"
//# USCORE  ::= "_"
//# IDENT   ::= (LOALPHA | HIALPHA | USCORE) (LOALPHA | HIALPHA | USCORE | DIGIT)*
//IDENT: "[a-zA-Z_]+[a-zA-Z_0-9]*";
%token  IDENT
//# @rule DQSTRING ::= [double quoted string]
//# @description Double quoted string with backslash are used for escaping double quotes and back slashes in the string
//DQSTRING: '["]((([^\\"\n]+)|([\\][^"\n]))*)["]' 1;
%token  DQSTRING
//# @rule SQSTRING ::= [single quoted string]
//# @description Single quoted string with backslash are used for escaping single quotes and back slashes in the string
//SQSTRING: "[']((([^\\'\n]+)|([\\][^'\n]))*)[']" 1;
%token  SQSTRING
//# @rule DIGIT ::= ("0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9")
//# UINTEGER ::= DIGIT*
//UINTEGER: "[0123456789]+";
%token  UINTEGER
//# @rule FLOAT ::= DIGIT* "." DIGIT+
//FLOAT: "[0123456789]*[.][0123456789]+";
%token  FLOAT
//# @rule EXPONENT ::= ("E"|"e") (("-"|"+") DIGIT+ | DIGIT+)
//# FLOAT ::= DIGIT* "." DIGIT+ EXPONENT
//FLOAT: "[0123456789]*[.][0123456789]+[Ee][+-]{0,1}[0123456789]+";
//%token    FLOAT
//# @description Numbers must not be followed immediately by an identifier
//ILLEGAL: "[0123456789]+[A-Za-z_]";
%token  ILLEGAL
//ILLEGAL: "[0123456789]*[.][0123456789]+[A-Za-z_]";
//%token    ILLEGAL
//ILLEGAL: "[0123456789]*[.][0123456789]+[Ee][+-]{0,1}[0123456789]+[A-Za-z_]";
//%token    ILLEGAL
//# @rule COMMENT ::= "/*" ... "*/"
//# @description C style comments
//% COMMENT "/*" "*/";
//# @rule COMMENT ::= "//" ... "\\n"
//# @description C++ style end of line comments
//% COMMENT "//";
//# @startsymbol program
%left   L0
%left   L1
%left   L2
%left   L3
%left   L4
%left   L5
%left   L6
%left   L7
%left   L8
%left   L9
%left   L10
%left   L11
%left   L12
%left   L13
%left   L14

%%

program         : extern_definitionlist free_definitionlist main_procedure //(program)
            ;
extern_definitionlist   : extern_definition extern_definitionlist
            |
            ;
free_definitionlist : free_definition free_definitionlist
            |
            ;
namespace_definitionlist: namespace_definition namespace_definitionlist
            |
            ;
instruct_definitionlist : instruct_definition instruct_definitionlist
            |
            ;
inclass_definitionlist  : inclass_definition inclass_definitionlist
            |
            ;
ininterf_definitionlist : ininterf_definition ininterf_definitionlist
            |
            ;
extern_definition   : "extern" DQSTRING "function" IDENT typespec "(" extern_paramlist ")" ";" //(extern_funcdef)
            | "extern" DQSTRING "procedure" IDENT "(" extern_paramlist ")" ";" //(extern_procdef)
            | "extern" DQSTRING "function" IDENT typespec "(" extern_paramlist "..." ")" ";" //(extern_funcdef_vararg)
            | "extern" DQSTRING "procedure" IDENT "(" extern_paramlist "..." ")" ";" //(extern_procdef_vararg)
            ;
extern_paramdecl    : typespec IDENT //(extern_paramdef)
            | typespec //(extern_paramdef)
            ;
extern_parameters   : extern_paramdecl "," extern_parameters //(extern_paramdef_collect)
            | extern_paramdecl //(extern_paramdef_collect)
            ;
extern_paramlist    : extern_parameters //(extern_paramdeflist)
            |  //(extern_paramdeflist)
            ;
ininterf_definition : "function" IDENT typespec "(" extern_paramlist ")" funcattribute ";" //(interface_funcdef)
            | "procedure" IDENT "(" extern_paramlist ")" funcattribute ";" //(interface_procdef)
            | "operator" operatordecl typespec "(" extern_paramlist ")" funcattribute ";" //(interface_operator_funcdef)
            | "operator" operatordecl "(" extern_paramlist ")" funcattribute ";" //(interface_operator_procdef)
            ;
funcattribute       : "const" funcattribute //(funcattribute {const=true})
            | "nothrow" funcattribute //(funcattribute {throws=false})
            |  //(funcattribute {const=false,throws=true})
            ;
instruct_definition : typedefinition ";" //(definition 1)
            | variabledefinition ";" //(definition 2)
            | structdefinition //(definition 1)
            ;
inclass_definition  : typedefinition ";" //(definition 1)
            | variabledefinition ";" //(definition 2)
            | structdefinition //(definition 1)
            | classdefinition //(definition 1)
            | interfacedefinition //(definition 1)
            | functiondefinition //(definition_2pass 4)
            | operatordefinition //(definition_2pass 4)
            | constructordefinition //(definition_2pass 4)
            ;
free_definition     : namespacedefinition
            | typedefinition ";" //(definition 1)
            | variabledefinition ";" //(definition 1)
            | structdefinition //(definition 1)
            | classdefinition //(definition 1)
            | interfacedefinition //(definition 1)
            | functiondefinition //(definition 1)
            ;
namespace_definition    : namespacedefinition //(definition 1)
            | typedefinition ";" //(definition 1)
            | structdefinition //(definition 1)
            | classdefinition //(definition 1)
            | interfacedefinition //(definition 1)
            | functiondefinition //(definition 1)
            ;
typename/*L1*/      : IDENT %prec L1
            | IDENT "::" typename %prec L1
            ;
typehdr/*L1*/       : typename %prec L1 //(typehdr {const=false})
            | "const" typename %prec L1 //(typehdr {const=true})
            | "any" "class" "^" %prec L1 //(typehdr_any "any class^")
            | "any" "const" "class" "^" %prec L1 //(typehdr_any "any const class^")
            | "any" "struct" "^" %prec L1 //(typehdr_any "any struct^")
            | "any" "const" "struct" "^" %prec L1 //(typehdr_any "any const struct^")
            ;
typegen/*L1*/       : typehdr %prec L1
            | typegen "[" generic_instance "]" %prec L1 //(typegen_generic)
            | typegen "^" %prec L1 //(typegen_pointer {const=false})
            | typegen "const" "^" %prec L1 //(typegen_pointer {const=true})
            ;
typespec/*L1*/      : typegen %prec L1 //(typespec)
            | typegen "&" %prec L1 //(typespec_ref)
            ;
typedefinition      : "typedef" typegen IDENT //(typedef)
            | "typedef" "function" IDENT typespec "(" extern_paramlist ")" //(typedef_functype {throws=true})
            | "typedef" "procedure" IDENT "(" extern_paramlist ")" //(typedef_proctype {throws=true})
            | "typedef" "function" IDENT typespec "(" extern_paramlist ")" "nothrow" //(typedef_functype {throws=false})
            | "typedef" "procedure" IDENT "(" extern_paramlist ")" "nothrow" //(typedef_proctype {throws=false})
            ;
structdefinition    : "struct" IDENT "{" instruct_definitionlist "}" //(structdef)
            | "generic" "struct" IDENT "[" generic_header "]" "{" instruct_definitionlist "}" //(generic_structdef)
            ;
interfacedefinition : "interface" IDENT "{" ininterf_definitionlist "}" //(interfacedef)
            ;
inheritlist     : typegen "," inheritlist //(inheritdef 1)
            | typegen //(inheritdef 1)
            ;
namespacedefinition : "namespace" IDENT "{" namespace_definitionlist "}" //(namespacedef)
            ;
classdefinition     : "class" IDENT "{" inclass_definitionlist "}" //(classdef)
            | "class" IDENT ":" inheritlist "{" inclass_definitionlist "}" //(classdef)
            | "generic" "class" IDENT "[" generic_header "]" "{" inclass_definitionlist "}" //(generic_classdef)
            | "generic" "class" IDENT "[" generic_header "]" ":" inheritlist "{" inclass_definitionlist "}" //(generic_classdef)
            ;
linkage         : "private" //(linkage {private=true,linkage="internal",explicit=true})
            | "public" //(linkage {private=false,linkage="external",explicit=true})
            |  //(linkage {private=false,linkage="external",explicit=false})
            ;
functiondefinition  : linkage "function" IDENT typespec callablebody //(funcdef)
            | linkage "procedure" IDENT callablebody //(procdef)
            | "generic" linkage "function" IDENT "[" generic_header "]" typespec callablebody //(generic_funcdef)
            | "generic" linkage "procedure" IDENT "[" generic_header "]" callablebody //(generic_procdef)
            ;
constructordefinition   : linkage "constructor" callablebody //(constructordef)
            | "destructor" codeblock //(destructordef {linkage="external"})
            ;
operatordefinition  : linkage "operator" operatordecl typespec callablebody //(operator_funcdef)
            | linkage "operator" operatordecl callablebody //(operator_procdef)
            ;
operatordecl        : "->" //(operatordecl {name="->",symbol="arrow"})
            | "=" //(operatordecl {name="=",symbol="assign"})
            | "+" //(operatordecl {name="+",symbol="plus"})
            | "-" //(operatordecl {name="-",symbol="minus"})
            | "*" //(operatordecl {name="*",symbol="mul"})
            | "/" //(operatordecl {name="/",symbol="div"})
            | "%" //(operatordecl {name="%",symbol="mod"})
            | "&&" //(operatordecl {name="&&",symbol="and"})
            | "||" //(operatordecl {name="||",symbol="or"})
            | "&" //(operatordecl {name="&",symbol="bitand"})
            | "|" //(operatordecl {name="|",symbol="bitor"})
            | "<<" //(operatordecl {name="<<",symbol="lsh"})
            | ">>" //(operatordecl {name=">>",symbol="rsh"})
            | "~" //(operatordecl {name="~",symbol="lneg"})
            | "!" //(operatordecl {name="!",symbol="not"})
            | "(" ")" //(operatordecl {name="()",symbol="call"})
            | "[" "]" //(operatordecl {name="[]",symbol="get"})
            | "==" //(operatordecl {name="==",symbol="eq"})
            | "!=" //(operatordecl {name="!=",symbol="ne"})
            | ">=" //(operatordecl {name=">=",symbol="ge"})
            | "<=" //(operatordecl {name="<=",symbol="le"})
            | ">" //(operatordecl {name=">",symbol="gt"})
            | "<" //(operatordecl {name="<",symbol="lt"})
            ;
lambda_paramlist    : lambda_parameters //(lambda_paramdeflist)
            |  //(lambda_paramdeflist)
            ;
lambda_parameters   : IDENT "," lambda_parameters
            | IDENT
            ;
lamda_expression    : "lambda" "(" lambda_paramlist ")" codeblock //(lambda_expression)
            ;
generic_instance_defelem: typegen
            | UINTEGER //(generic_instance_dimension)
            | lamda_expression
            ;
generic_instance_deflist: generic_instance_defelem //(generic_instance_deflist)
            | generic_instance_defelem "," generic_instance_deflist //(generic_instance_deflist)
            ;
generic_instance    : generic_instance_deflist //(generic_instance)
            ;
generic_defaultlist : IDENT "=" typegen "," generic_defaultlist //(generic_header_ident_type)
            | IDENT "=" typegen //(generic_header_ident_type)
            ;
generic_identlist   : IDENT "," generic_identlist //(generic_header_ident)
            | IDENT "," generic_defaultlist //(generic_header_ident)
            | IDENT //(generic_header_ident)
            ;
generic_header      : generic_identlist //(generic_header)
            | generic_defaultlist //(generic_header)
            ;
callablebody        : "(" impl_paramlist ")" funcattribute "{" statementlist "}" //({}callablebody)
            ;
main_procedure      : "main" codeblock //(main_procdef)
            |
            ;
impl_paramlist      : impl_parameters //(paramdeflist)
            |  //(paramdeflist)
            ;
impl_parameters     : impl_paramdecl "," impl_parameters
            | impl_paramdecl
            ;
impl_paramdecl      : typespec IDENT //(paramdef)
            ;
codeblock/*L1*/     : "{" statementlist "}" %prec L1 //({}codeblock)
            ;
statementlist/*L1*/ : statement statementlist %prec L1 //(>>)
            |  %prec L1
            ;
elseblock/*L1*/     : "elseif" "(" expression ")" codeblock elseblock %prec L1 //(conditional_elseif)
            | "elseif" "(" expression ")" codeblock %prec L1 //(conditional_elseif)
            | "else" codeblock %prec L1 //(conditional_else)
            ;
catchblock      : "catch" IDENT codeblock //(>>catchblock)
            | "catch" IDENT "," IDENT codeblock //(>>catchblock)
            ;
tryblock        : "try" codeblock //(tryblock)
            ;
statement/*L1*/     : structdefinition %prec L1 //(definition 1)
            | classdefinition %prec L1 //(definition 1)
            | functiondefinition %prec L1 //(definition 1)
            | typedefinition ";" %prec L1 //(definition 1)
            | "var" variabledefinition ";" %prec L1 //(>>definition 1)
            | expression ";" %prec L1 //(free_expression)
            | "return" expression ";" %prec L1 //(>>return_value)
            | "return" ";" %prec L1 //(>>return_void)
            | "throw" expression "," expression ";" %prec L1 //(throw_exception)
            | "throw" expression ";" %prec L1 //(throw_exception)
            | tryblock catchblock %prec L1 //({}trycatch)
            | "delete" expression ";" %prec L1 //(delete)
            | "if" "(" expression ")" codeblock elseblock %prec L1 //(conditional_if)
            | "if" "(" expression ")" codeblock %prec L1 //(conditional_if)
            | "while" "(" expression ")" codeblock %prec L1 //(conditional_while)
            | "with" "(" expression ")" codeblock %prec L1 //(with_do)
            | "with" "(" expression ")" ";" %prec L1 //(with_do)
            | codeblock %prec L1
            ;
variabledefinition  : typespec IDENT //(>>vardef)
            | typespec IDENT "=" expression //(>>vardef)
            ;
expression/*L1*/    : "{" expressionlist "}" %prec L1 //(>>structure)
            | "{" "}" %prec L1 //(>>structure)
            | "new" typespec ":" expression %prec L1 //(>>allocate)
            | "cast" typespec ":" expression %prec L1 //(>>typecast)
            ;
expression/*L2*/    : IDENT %prec L2 //(variable)
            | BOOLEAN %prec L2 //(constant "constexpr bool")
            | UINTEGER %prec L2 //(constant "constexpr uint")
            | FLOAT %prec L2 //(constant "constexpr float")
            | "null" %prec L2 //(null)
            | DQSTRING %prec L2 //(string_constant)
            | SQSTRING %prec L2 //(char_constant)
            | lamda_expression %prec L2
            | "(" expression ")" %prec L2
            ;
expression/*L3*/    : expression "=" expression %prec L3 //(>>binop "=")
            | expression "+=" expression %prec L3 //(>>assign_operator "+")
            | expression "-=" expression %prec L3 //(>>assign_operator "-")
            | expression "*=" expression %prec L3 //(>>assign_operator "*")
            | expression "/=" expression %prec L3 //(>>assign_operator "/")
            | expression "^=" expression %prec L3 //(>>assign_operator "^")
            | expression "&=" expression %prec L3 //(>>assign_operator "&")
            | expression "%=" expression %prec L3 //(>>assign_operator "%")
            | expression "&&=" expression %prec L3 //(>>assign_operator "&&")
            | expression "||=" expression %prec L3 //(>>assign_operator "||")
            | expression "&=" expression %prec L3 //(>>assign_operator "&")
            | expression "|=" expression %prec L3 //(>>assign_operator "|")
            | expression "<<=" expression %prec L3 //(>>assign_operator "<<")
            | expression ">>=" expression %prec L3 //(>>assign_operator ">>")
            ;
expression/*L4*/    : expression "||" expression %prec L4 //(>>binop "||")
            ;
expression/*L5*/    : expression "&&" expression %prec L5 //(>>binop "&&")
            ;
expression/*L6*/    : expression "|" expression %prec L6 //(>>binop "|")
            ;
expression/*L7*/    : expression "^" expression %prec L7 //(>>binop "^")
            | expression "&" expression %prec L7 //(>>binop "&")
            ;
expression/*L8*/    : expression "==" expression %prec L8 //(>>binop "==")
            | expression "!=" expression %prec L8 //(>>binop "!=")
            | expression "<=" expression %prec L8 //(>>binop "<=")
            | expression "<" expression %prec L8 //(>>binop "<")
            | expression ">=" expression %prec L8 //(>>binop ">=")
            | expression ">" expression %prec L8 //(>>binop ">")
            ;
expression/*L9*/    : expression "+" expression %prec L9 //(>>binop "+")
            | expression "-" expression %prec L9 //(>>binop "-")
            | "&" expression %prec L9 //(operator_address "&")
            | "-" expression %prec L9 //(>>unop "-")
            | "+" expression %prec L9 //(>>unop "+")
            | "~" expression %prec L9 //(>>unop "~")
            | "!" expression %prec L9 //(>>unop "!")
            ;
expression/*L10*/   : expression "*" expression %prec L10 //(>>binop "*")
            | expression "/" expression %prec L10 //(>>binop "/")
            | expression "%" expression %prec L10 //(>>binop "%")
            ;
expression/*L11*/   : expression "<<" expression %prec L11 //(>>binop "<<")
            | expression ">>" expression %prec L11 //(>>binop ">>")
            ;
expression/*L12*/   : iexpression %prec L12
            | expression "." IDENT %prec L12 //(member)
            | "*" expression %prec L12 //(>>unop "->")
            ;
expression/*L13*/   : expression "(" expressionlist ")" %prec L13 //(>>operator "()")
            | expression "(" ")" %prec L13 //(>>operator "()")
            | expression "[" expressionlist "]" %prec L13 //(>>operator_array "[]")
            ;
iexpression/*L14*/  : expression indirection IDENT %prec L14 //(rep_operator "->")
            ;
indirection/*L14*/  : "->" indirection %prec L14 //(count)
            | "->" %prec L14 //(count)
            ;
expressionlist/*L0*/    : expression "," expressionlist %prec L0
            | expression %prec L0
            ;

Generator for yacc grammar:

-- This module exports the function "printLanguageDef", adapted here to print a yacc-style grammar
-- (the mewa directives and lexeme definitions are kept as comments) from a Lua table generated
-- with the mewa option "--generate-language" or "-l"
--
-- The original version rebuilds a mewa language description file (equivalent to the original ".g" file).
-- That reverse process is in itself not very useful except for testing and as an example.
--
--
require "io"
require "string"
require "math"

local reserved = {"op","name","pattern","open","close","tabsize","nl","select","priority","left","right","scope","call","line"}
local image = {}

local function contains( tb, val)
    for i=1,#tb do
        if tb[i] == val then
            return true
        end
    end
    return false
end

local function printDecoratorsAsComments( rule)
    for key, val in pairs(rule) do
        if not contains( reserved, key) and #val > 0 then
            print( "//# @" .. key .. " " .. val[ 1])
            for ii=2,#val do
                print( "//#\t" .. val[ ii])
            end
        end
    end
end

local function quoteString( str)
    if string.find( str, "\"") then
        return "\'" .. str .. "\'"
    else
        return "\"" .. str .. "\""
    end
end

local function productionElementListToString( right)
    local rt = nil
    for idx,elem in ipairs(right) do
        -- Keep "value" local so that it does not leak into the global environment
        local value
        if elem.type == "name" then
            value = elem.value
        elseif elem.type == "symbol" then
            value = quoteString(elem.value)
        else
            error( "unknown production element type '" .. elem.type .. "'")
        end
        rt = not rt and value or rt .. " " .. value
    end
    return rt
end

function image.printLanguageDef( def)
    print( "//% LANGUAGE " .. def.LANGUAGE .. ";")
    print( "//% TYPESYSTEM \"" .. def.TYPESYSTEM .. "\";")
    print( "//% CMDLINE \"" .. def.CMDLINE .. "\";")
    local prev_prodname = nil
    local rulestr = nil
    for idx,rule in ipairs( def.RULES ) do
        printDecoratorsAsComments( rule)
        if rule.op == "COMMENT" then
            if rule.close then
                print( "//% COMMENT " .. quoteString( rule.open) .. " " .. quoteString( rule.close) .. ";")
            else
                print( "//% COMMENT " .. quoteString( rule.open) .. ";")
            end
        elseif rule.op == "INDENTL" then
            print( "//% INDENTL " .. quoteString( rule.open) .. " " .. quoteString( rule.close) .. " " .. quoteString( rule.nl) .. " " .. quoteString( rule.tabsize) .. ";")
        elseif rule.op == "BAD" then
            print( "//% BAD " .. quoteString( rule.name) .. ";")
        elseif rule.op == "IGNORE" then
            print( "//% IGNORE " .. quoteString( rule.pattern) .. ";")
        elseif rule.op == "TOKEN" then
            if rule.select then
                print( "//" .. rule.name .. ": " .. quoteString( rule.pattern) .. " " .. rule.select .. ";")
            else
                print(  "//" .. rule.name .. ": " .. quoteString( rule.pattern) .. ";")
            end
            print( "%token", rule.name )
        elseif rule.op == "PROD" then
            if not prev_prodname then
                local priority_list = {}
                for idx2,rule2 in ipairs( def.RULES ) do
                    if rule2.op == "PROD" then
                        if rule2.priority then
                            priority_list[rule2.priority] = true
                        end
                    end
                end
                local priority_list2 = {}
                for k,v in pairs(priority_list) do
                    table.insert(priority_list2, k)
                end
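                -- Sort by numeric level so that "L10" comes after "L2" (yacc precedence follows declaration order)
                table.sort(priority_list2, function(a, b)
                    return (tonumber(a:sub(2)) or 0) < (tonumber(b:sub(2)) or 0)
                end)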
                for idx, k in ipairs(priority_list2) do
                    local lrn = k:sub(1,1)
                    if lrn == 'L' then
                        print("%left", k)
                    elseif lrn == 'R' then
                        print("%right", k)
                    elseif lrn == 'N' then
                        print("%nonassoc", k)
                    end
                end
                print("\n%%\n")
            end
            local left = rule.priority and rule.left .. "/*" .. rule.priority .. "*/" or rule.left
            if prev_prodname == left then
                local indent = string.rep( "\t", 3)
                rulestr = rulestr .. "\n" .. indent .. "| "
            else
                local indent = string.rep( "\t", math.max( 3 - math.floor(string.len( left) / 8), 0))
                if rulestr then
                    print( rulestr .. "\n\t\t\t;")
                end
                rulestr = left .. indent .. ": "
                prev_prodname = left
            end
            if #rule.right > 0 then
                rulestr = rulestr .. productionElementListToString( rule.right)
            end
            if rule.priority then
                rulestr = rulestr .. " %prec " .. rule.priority
            end
            if rule.call or rule.scope then
                rulestr = rulestr .. " //(" .. (rule.scope or "") .. (rule.call or "") .. ")"
            end
        end
    end
    if rulestr then
        print( rulestr .. "\n\t\t\t;")
    end
end

return image
patrickfrey commented 2 years ago

The rule with "&=" appears twice in the grammar. I removed the duplicate from the language1 example and adapted the tests. I opened a new issue #6 for the missing error report from mewa.
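
Until such a check exists, duplicates like this one could be found with a few lines of Lua. The following is only a sketch: it assumes the rule table layout used by the generator script above (`rule.op`, `rule.left`, `rule.right` with `type`/`value` elements), and the function `findDuplicateProductions` and the sample table are hypothetical, not part of mewa.

```lua
-- Sketch: list productions that occur more than once in a rule table.
-- Assumed layout (taken from the generator script in this thread):
-- rule.op == "PROD", rule.left is a string, rule.right is a list of {type=..., value=...}.
local function findDuplicateProductions( rules)
    local seen = {}
    local duplicates = {}
    for _,rule in ipairs( rules) do
        if rule.op == "PROD" then
            local parts = { rule.left, "=" }
            for _,elem in ipairs( rule.right or {}) do
                -- Keep the element type in the key so that a name and a symbol with the same text differ
                parts[ #parts + 1] = elem.type .. ":" .. elem.value
            end
            local key = table.concat( parts, " ")
            if seen[ key] then
                table.insert( duplicates, key)
            else
                seen[ key] = true
            end
        end
    end
    return duplicates
end

-- Hypothetical miniature rule table, not the real language1 output
local function assignRule( op)
    return { op = "PROD", left = "expression", priority = "L3", right = {
        { type = "name", value = "expression" },
        { type = "symbol", value = op },
        { type = "name", value = "expression" } } }
end
local sampleRules = { assignRule( "&="), assignRule( "%="), assignRule( "&=") }
local duplicates = findDuplicateProductions( sampleRules)
for _,key in ipairs( duplicates) do
    print( "duplicate production: " .. key)
end
```

The priority is deliberately left out of the key, so two identical productions with different priority tags would still be flagged.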

patrickfrey commented 2 years ago

Documentation available in https://github.com/patrickfrey/mewa/blob/master/doc/bridges.md

mingodad commented 2 years ago

Thanks! Is it expected to have that many conflicts in the language1 grammar?

Now the output from bison/byacc is:

>byacc-nb -v grammar.g.y
byacc-nb: 1366 shift/reduce conflicts.
1366 conflicts
103 terminal symbols
58 non-terminal symbols
161 total symbols
235 rules
484 states
>Exit code: 0
>bison-nb -v grammar.g.y
grammar.g.y: warning: 1366 shift/reduce conflicts [-Wconflicts-sr]
grammar.g.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
>Exit code: 0

While trying to test it with lemon, I ended up discovering a small bug there (see https://sqlite.org/forum/forumpost/b6edc69548 ):

lemon-nb  -s y.yl
Can't open the template file "/snap/bin/lempar.c".
1366 parsing conflicts.
Parser statistics:
  terminal symbols...................   102
  non-terminal symbols...............    57
  total symbols......................   159
  rules..............................   232
  states.............................   323
  conflicts..........................  1366
  conflicts S/R......................  1366
  conflicts R/R......................     0
  action table entries...............     0
  lookahead table entries............     0
  total table size (bytes)...........     0
>Exit code: 1
patrickfrey commented 2 years ago

S/R conflicts are not reported by mewa on productions with different priorities. If the priority number is the same, a production tagged with L prefers REDUCE over SHIFT, while one tagged with R prefers SHIFT over REDUCE. If the priority numbers differ, the higher priority overwrites the conflicting table entry. Still, the difference to the other tools cannot be dismissed without investigating the issue. It's worth having a look and maybe finding a way to use lemon, yacc or a similar analysis tool for verification. I opened another issue #7 for that. Thanks for the feedback.
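
The rule described above can be sketched in Lua as follows. This is an illustration of my reading of that description, not mewa's actual implementation: the function names are hypothetical, "higher priority overwrites" is taken to mean that the action of the production with the higher number wins, and only the L and R tags are covered.

```lua
-- Sketch of the shift/reduce decision described above; not mewa's actual code.
-- A priority tag is an associativity letter followed by a number, e.g. "L9" or "R3".
local function parsePriority( tag)
    local assoc, level = tag:match( "^([LR])(%d+)$")
    if not assoc then
        error( "unsupported priority tag '" .. tostring( tag) .. "'")
    end
    return assoc, tonumber( level)
end

-- shiftTag:  priority of the production that wants to SHIFT
-- reduceTag: priority of the production that wants to REDUCE
local function resolveShiftReduce( shiftTag, reduceTag)
    local _, shiftLevel = parsePriority( shiftTag)
    local reduceAssoc, reduceLevel = parsePriority( reduceTag)
    if shiftLevel > reduceLevel then
        return "SHIFT"    -- the higher priority wins
    elseif shiftLevel < reduceLevel then
        return "REDUCE"   -- the higher priority wins
    elseif reduceAssoc == "L" then
        return "REDUCE"   -- same level, left associative
    else
        return "SHIFT"    -- same level, right associative
    end
end

print( resolveShiftReduce( "L10", "L9"))  -- a + b . * c
print( resolveShiftReduce( "L9", "L10"))  -- a * b . + c
print( resolveShiftReduce( "L9", "L9"))   -- a + b . + c
print( resolveShiftReduce( "R3", "R3"))   -- a = b . = c
```

In the comments, the dot marks the parser position at which the conflict occurs; the priority tags are the ones the language1 grammar above uses for these operators.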

patrickfrey commented 2 years ago

I added the number of S/R conflicts solved by priority to the debug output. See issue #7. I got the same number: 1366

patrickfrey commented 2 years ago

May I close this issue now?

It is now possible to create any language description from the original language description with a Lua script using the output of mewa with the "--generate-language" option.

mingodad commented 2 years ago

Sure, thank you for all your help!

mingodad commented 2 years ago

I'm leaving here another grammar, without conflicts, that I quickly converted to test mewa. It might also be a good idea to invert the rule separators, that is, to use : for grammar rules and = for lexer rules. This way, adapting an existing yacc grammar would be a bit easier.

// From https://github.com/pedro-stanaka/c-alike-parser.git
% LANGUAGE calike;
% COMMENT "/*" "*/";
% COMMENT "//";

ADD_ASSIGN : '\+=';
ASSIGN : '=';
BITWISE_AND : '&';
BITWISE_NOT : '~';
BITWISE_OR : '|';
BITWISE_XOR : '\^';
CHAR : 'char';
CHARACTER : "[']((([^\\'\n]+)|([\\][^'\n])))[']";
COLON : ':';
COMMA : ',';
DEC : '--';
DEFINE : 'define';
DO : 'do';
ELSE : 'else';
EQUAL : '==';
EXIT : 'exit';
FOR : 'for';
GREATER_EQUAL : '>=';
GREATER_THAN : '>';
IDENTIFIER  : '[a-zA-Z_]+[a-zA-Z_0-9]*';
IF : 'if';
INC : '\+\+';
INT : 'int';
L_CURLY_BRACKET : '\{';
LESS_EQUAL : '<=';
LESS_THAN : '<';
LOGICAL_AND : '&&';
LOGICAL_OR : '||';
L_PAREN : '\(';
L_SHIFT : '>>';
MINUS : '-';
MINUS_ASSIGN : '-=';
MULTIPLY : '\*';
NOT : '!';
NOT_EQUAL : '!=';
NUM_INTEGER : '[0-9]+';
NUM_HEXA : '0[Xx][0-9A-Fa-f]+';
NUM_OCTAL : '0[1-7][0-7]*';
NUMBER_SIGN : '#';
PLUS : '\+';
PRINTF : 'printf';
R_BRACE_BRACKET : '\}';
REMAINDER : '%';
RETURN : 'return';
R_PAREN : '\)';
R_SHIFT : '<<';
SCANF : 'scanf';
SEMICOLON : ';';
STRING :  '["]((([^\\"\n]+)|([\\][^"\n]))*)["]';
TERNARY_CONDITIONAL : '\?';
VOID : 'void';
WHILE : 'while';

# @startsymbol first

first =
     program
    ;
program =
     declaration
    | function
    | declaration program
    | function program
    ;
declaration =
     NUMBER_SIGN DEFINE IDENTIFIER expression
    | variable_declaration
    | prototype_declaration
    ;
function =
     type IDENTIFIER params L_CURLY_BRACKET commands R_BRACE_BRACKET
    | type IDENTIFIER params L_CURLY_BRACKET function_pre commands R_BRACE_BRACKET
    ;
function_pre =
     variable_declaration
    | variable_declaration function_pre
    ;
variable_declaration =
     type variable_declaration_pre
    ;
variable_declaration_pre =
     IDENTIFIER variable_declaration_post
    | IDENTIFIER ASSIGN expression variable_declaration_post
    ;
variable_declaration_post =
     COMMA variable_declaration_pre
    | SEMICOLON
    ;
prototype_declaration =
     type IDENTIFIER params SEMICOLON
    ;
params =
     L_PAREN R_PAREN
    | L_PAREN params_post R_PAREN
    ;
params_post =
     type IDENTIFIER
    | type IDENTIFIER COMMA params_post
    ;
type =
     INT
    | CHAR
    | VOID
    ;
commands =
     command_list
    | command_list commands
    ;
block =
     L_CURLY_BRACKET commands R_BRACE_BRACKET
    ;
command_list =
     DO block WHILE L_PAREN expression R_PAREN SEMICOLON
    | IF L_PAREN expression R_PAREN block ELSE else_pre
    | IF L_PAREN expression R_PAREN block
    | WHILE L_PAREN expression R_PAREN block
    | FOR L_PAREN for_post for_post for_pre block
    | PRINTF L_PAREN STRING printf_pre R_PAREN SEMICOLON
    | SCANF L_PAREN STRING COMMA BITWISE_AND IDENTIFIER R_PAREN SEMICOLON
    | EXIT L_PAREN expression R_PAREN SEMICOLON
    | RETURN SEMICOLON
    | RETURN L_PAREN expression R_PAREN SEMICOLON
    | expression SEMICOLON
    | SEMICOLON
    | block
    ;
for_post =
     expression SEMICOLON
    | SEMICOLON
    ;
for_pre =
     expression R_PAREN
    | R_PAREN
    ;
else_pre =
     block
    | L_CURLY_BRACKET R_BRACE_BRACKET
    ;
printf_pre =
     COMMA expression
    | COMMA expression printf_pre
    ;
expression =
     conditional_expression
    | conditional_expression expression_post
    ;
expression_post =
     ASSIGN expression
    | ADD_ASSIGN expression
    | MINUS_ASSIGN expression
    ;
conditional_expression =
     logical_or_exp
    | TERNARY_CONDITIONAL logical_or_exp COLON logical_or_exp
    ;
logical_or_exp =
     logical_and_exp
    | logical_and_exp logical_or_exp_pre
    ;
logical_or_exp_pre =
     LOGICAL_OR logical_or_exp
    ;
logical_and_exp =
     or_expression
    | or_expression logical_and_exp_pre
    ;
logical_and_exp_pre =
     LOGICAL_AND logical_and_exp
    ;
or_expression =
     xor_expression
    | xor_expression or_expression_pre
    ;
or_expression_pre =
     BITWISE_OR or_expression
    ;
xor_expression =
     and_expression
    | and_expression xor_expression_pre
    ;
xor_expression_pre =
     BITWISE_XOR xor_expression
    ;
and_expression =
     equality_expression
    | equality_expression and_expression_pre
    ;
and_expression_pre =
     BITWISE_AND and_expression
    ;
equality_expression =
     relational_expression
    | relational_expression equality_expression_pre
    ;
equality_expression_pre =
     EQUAL equality_expression
    | NOT_EQUAL equality_expression
    ;
relational_expression =
     shift_expression
    | shift_expression relational_expression_pre
    ;
relational_expression_pre =
     LESS_THAN relational_expression
    | LESS_EQUAL relational_expression
    | GREATER_THAN relational_expression
    | GREATER_EQUAL relational_expression
    ;
shift_expression =
     additive_expression
    | additive_expression shift_expression_pre
    ;
shift_expression_pre =
     R_SHIFT shift_expression
    | L_SHIFT shift_expression
    ;
additive_expression =
     multiplicative_expression
    | multiplicative_expression additive_expression_pre
    ;
additive_expression_pre =
     PLUS additive_expression
    | MINUS additive_expression
    ;
multiplicative_expression =
     unary_expression
    | unary_expression multiplicative_expression_pre
    ;
multiplicative_expression_pre =
     REMAINDER multiplicative_expression
    | MULTIPLY multiplicative_expression
    ;
unary_expression =
     IDENTIFIER INC
    | IDENTIFIER
    | IDENTIFIER DEC
    | number
    | CHARACTER
    | IDENTIFIER L_PAREN unary_expression_pre R_PAREN
    | L_PAREN expression R_PAREN
    | NOT unary_expression
    | BITWISE_NOT unary_expression
    | MINUS unary_expression
    | PLUS unary_expression
    ;
unary_expression_pre =
     expression
    | expression COMMA unary_expression_pre
    ;
number =
     NUM_INTEGER
    | NUM_HEXA
    | NUM_OCTAL
    ;
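
As long as the separators differ, a conversion has to turn the yacc `:` into `=` for rule heads only. A blind replacement of every `:` would also hit URLs in comments and quoted ':' terminals. The following sketch shows a more careful replacement; the function `convertRuleHead` is hypothetical, and it assumes that it is applied only to lines of the rules section where a rule head starts at the beginning of the line.

```lua
-- Sketch: rewrite only the separator of a yacc rule head ("name :") to the mewa form ("name =").
-- Assumption: rule heads start in the first column with an identifier; everything else is left as-is.
local function convertRuleHead( line)
    local name, rest = line:match( "^([%a_][%w_]*)%s*:(.*)$")
    if name then
        return name .. " =" .. rest
    end
    return line
end

-- Hypothetical input lines
local sampleLines = {
    "program :",
    "expression : conditional_expression",
    "    | Identifier ':' Statement",
    "// From: https://example.org",
}
for _,line in ipairs( sampleLines) do
    print( convertRuleHead( line))
end
```

Alternatives and comments pass through untouched because they do not start with an identifier directly followed by the separator.
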
mingodad commented 2 years ago

Here is another grammar (from https://github.com/ernestchu/java-compiler-front-end.git) that is more difficult because it has several precedence settings. byacc/bison report no conflicts for it, but after converting it to mewa I get the following error message, which gives little information about where in the grammar the problem starts: ERR #552 "Unresolved identifier in the grammar definition" at line 780: error. (Note that this grammar is only meant to test the conflict resolution of the mewa parser generator; I have no interest in creating the language.)

Could you try to finish the conversion?

Note that the grammar is annotated with the precedences using byacc -n grammar.y from https://github.com/mingodad/lalr-parser-test. If you come up with a procedure to convert yacc grammars to mewa, I can try to encode it in byacc/bison/lemon, and then you will have several sources of grammars to test/use with mewa.

// From: https://github.com/ernestchu/java-compiler-front-end.git
% LANGUAGE java;
% COMMENT "/*" "*/";
% COMMENT "//";

ABSTRACT : 'abstract';
BOOLEAN : 'boolean';
BREAK : 'break';
BYTE : 'byte';
CASE : 'case';
CATCH : 'catch';
CHAR : 'char';
CLASS : 'class';
CONTINUE : 'continue';
DEFAULT : 'default';
DO : 'do';
DOUBLE : 'double';
ELSE : 'else';
EXTENDS : 'extends';
FINAL : 'final';
FINALLY : 'finally';
FLOAT : 'float';
FOR : 'for';
IF : 'if';
IMPLEMENTS : 'implements';
IMPORT : 'import';
INSTANCEOF : 'instanceof';
INT : 'int';
INTERFACE : 'interface';
LONG : 'long';
NATIVE : 'native';
NEW : 'new';
PACKAGE : 'package';
PRIVATE : 'private';
PROTECTED : 'protected';
PUBLIC : 'public';
RETURN : 'return';
SHORT : 'short';
STATIC : 'static';
SUPER : 'super';
SWITCH : 'switch';
SYNCHRONIZED : 'synchronized';
THIS : 'this';
THROW : 'throw';
THROWS : 'throws';
TRANSIENT : 'transient';
TRY : 'try';
VOID : 'void';
VOLATILE : 'volatile';
WHILE : 'while';
ASS : '=';
MUL_ASS : '\*=';
DIV_ASS : '/=';
MOD_ASS : '%=';
ADD_ASS : '\+=';
SUB_ASS : '-=';
LS_ASS : '<<=';
RS_ASS : '>>=';
URS_ASS : '>>>=';
EMP_ASS : '&=';
XOR_ASS : '\^=';
OR_ASS : '|=';
LS : '<<';
RS : '>>';
URS : '>>>';
EQ : '==';
NE : '!=';
LE : '<=';
GE : '>=';
LT : '<';
GT : '>';
AND : '&&';
OR : '||';
NOT : '!';
INC : '\+\+';
DEC : '--';
BOOL_LIT :  '((true)|(false))';
NULL_LIT : 'null';
CHAR_LIT : "[']((([^\\'\n]+)|([\\][^'\n])))[']";
STR_LIT : '["]((([^\\"\n]+)|([\\][^"\n]))*)["]';
INT_LIT : '[0-9]+';
FLT_LIT : '[0-9]*[.][0-9]+';
ID :  '[a-zA-Z_]+[a-zA-Z_0-9]*';
//CAST :
//PRE
//UMINUS
//POST

////%right /*1*/ ASS MUL_ASS DIV_ASS MOD_ASS ADD_ASS SUB_ASS LS_ASS RS_ASS URS_ASS EMP_ASS XOR_ASS OR_ASS
////%right /*2*/ '?' ':'
////%left /*3*/ OR
////%left /*4*/ AND
////%left /*5*/ '|'
////%left /*6*/ '^'
////%left /*7*/ '&'
////%left /*8*/ EQ NE
////%nonassoc /*9*/ INSTANCEOF LE GE LT GT
////%left /*10*/ LS RS URS
////%left /*11*/ '+' '-'
////%left /*12*/ '*' '%' '/'
////%right /*13*/ NEW CAST
////%right /*14*/ NOT PRE UMINUS '~'
////%nonassoc /*15*/ POST
////%left /*16*/ '[' ']' '.' '(' ')'

# @startsymbol Goal

Goal =
     CompilationUnit
    ;
Literal =
     BooleanLiteral
    | NullLiteral
    | CharacterLiteral
    | StringLiteral
    | IntegerLiteral
    | FloatingPointLiteral
    ;
BooleanLiteral =
     BOOL_LIT
    ;
NullLiteral =
     NULL_LIT
    ;
CharacterLiteral =
     CHAR_LIT
    ;
StringLiteral =
     STR_LIT
    ;
IntegerLiteral =
     INT_LIT
    ;
FloatingPointLiteral =
     FLT_LIT
    ;
Type =
     PrimitiveType
    | ReferenceType
    ;
PrimitiveType =
     NumericType
    | BOOLEAN
    ;
NumericType =
     IntegralType
    | FloatingPointType
    ;
IntegralType =
     BYTE
    | SHORT
    | INT
    | LONG
    | CHAR
    ;
FloatingPointType =
     FLOAT
    | DOUBLE
    ;
ReferenceType =
     ClassOrInterfaceType
    | ArrayType
    ;
ClassOrInterfaceType =
     Name
    ;
ClassType =
     ClassOrInterfaceType
    ;
InterfaceType =
     ClassOrInterfaceType
    ;
ArrayType =
     PrimitiveType '[' /*16L*/ ']' /*16L*/
    | Name '[' /*16L*/ ']' /*16L*/
    | ArrayType '[' /*16L*/ ']' /*16L*/
    ;
Name =
     SimpleName
    | QualifiedName
    ;
SimpleName =
     Identifier
    ;
QualifiedName =
     Name '.' /*16L*/ Identifier
    ;
CompilationUnit =
     PackageDeclarationOpt ImportDeclarationsOpt TypeDeclarationsOpt
    ;
ImportDeclarations =
     ImportDeclaration
    | ImportDeclarations ImportDeclaration
    ;
TypeDeclarations =
     TypeDeclaration
    | TypeDeclarations TypeDeclaration
    ;
PackageDeclaration =
     PACKAGE Name ';'
    ;
ImportDeclaration =
     SingleTypeImportDeclaration
    | TypeImportOnDemandDeclaration
    ;
SingleTypeImportDeclaration =
     IMPORT Name ';'
    ;
TypeImportOnDemandDeclaration =
     IMPORT Name '.' /*16L*/ '*' /*12L*/ ';'
    ;
TypeDeclaration =
     ClassDeclaration
    | InterfaceDeclaration
    ;
Modifiers =
     Modifier
    | Modifiers Modifier
    ;
Modifier =
     PUBLIC
    | PROTECTED
    | PRIVATE
    | STATIC
    | ABSTRACT
    | FINAL
    | NATIVE
    | SYNCHRONIZED
    | TRANSIENT
    | VOLATILE
    ;
ClassDeclaration =
     ModifiersOpt CLASS Identifier SuperOpt InterfacesOpt ClassBody
    ;
Super =
     EXTENDS ClassType
    ;
Interfaces =
     IMPLEMENTS InterfaceTypeList
    ;
InterfaceTypeList =
     InterfaceType
    | InterfaceTypeList ',' InterfaceType
    ;
ClassBody =
     '{' ClassBodyDeclarationsOpt '}'
    ;
ClassBodyDeclarations =
     ClassBodyDeclaration
    | ClassBodyDeclarations ClassBodyDeclaration
    ;
ClassBodyDeclaration =
     ClassMemberDeclaration
    | StaticInitializer
    | ConstructorDeclaration
    | TypeDeclaration
    ;
ClassMemberDeclaration =
     FieldDeclaration
    | MethodDeclaration
    ;
FieldDeclaration =
     ModifiersOpt Type VariableDeclarators ';'
    | ModifiersOpt Type error ';'
    ;
VariableDeclarators =
     VariableDeclarator
    | VariableDeclarators ',' VariableDeclarator
    ;
VariableDeclarator =
     VariableDeclaratorId
    | VariableDeclaratorId ASS /*1R*/ VariableInitializer
    ;
VariableDeclaratorId =
     Identifier
    | VariableDeclaratorId '[' /*16L*/ ']' /*16L*/
    ;
VariableInitializer =
     Expression
    | ArrayInitializer
    ;
MethodDeclaration =
     MethodHeader MethodBody
    ;
MethodHeader =
     ModifiersOpt Type MethodDeclarator ThrowsOpt
    | ModifiersOpt VOID MethodDeclarator ThrowsOpt
    ;
MethodDeclarator =
     Identifier '(' /*16L*/ FormalParameterListOpt ')' /*16L*/
    | MethodDeclarator '[' /*16L*/ ']' /*16L*/
    ;
FormalParameterList =
     FormalParameter
    | FormalParameterList ',' FormalParameter
    ;
FormalParameter =
     Type VariableDeclaratorId
    ;
Throws =
     THROWS ClassTypeList
    ;
ClassTypeList =
     ClassType
    | ClassTypeList ',' ClassType
    ;
MethodBody =
     Block
    | ';'
    ;
StaticInitializer =
     STATIC Block
    ;
ConstructorDeclaration =
     ModifiersOpt ConstructorDeclarator ThrowsOpt ConstructorBody
    ;
ConstructorDeclarator =
     SimpleName '(' /*16L*/ FormalParameterListOpt ')' /*16L*/
    ;
ConstructorBody =
     '{' ExplicitConstructorInvocation BlockStatementsOpt '}'
    ;
ConstructorBody =
     '{' BlockStatementsOpt '}'
    ;
ExplicitConstructorInvocation =
     THIS '(' /*16L*/ ArgumentListOpt ')' /*16L*/ ';'
    | SUPER '(' /*16L*/ ArgumentListOpt ')' /*16L*/ ';'
    ;
InterfaceDeclaration =
     ModifiersOpt INTERFACE Identifier ExtendsInterfacesOpt InterfaceBody
    ;
ExtendsInterfaces =
     EXTENDS InterfaceType
    | ExtendsInterfaces ',' InterfaceType
    ;
InterfaceBody =
     '{' InterfaceMemberDeclarationsOpt '}'
    ;
InterfaceMemberDeclarations =
     InterfaceMemberDeclaration
    | InterfaceMemberDeclarations InterfaceMemberDeclaration
    ;
InterfaceMemberDeclaration =
     ConstantDeclaration
    | AbstractMethodDeclaration
    | TypeDeclaration
    ;
ConstantDeclaration =
     FieldDeclaration
    ;
AbstractMethodDeclaration =
     MethodHeader ';'
    ;
ArrayInitializer =
     '{' VariableInitializers ',' '}'
    | '{' VariableInitializers '}'
    | '{' ',' '}'
    | '{' '}'
    ;
VariableInitializers =
     VariableInitializer
    | VariableInitializers ',' VariableInitializer
    ;
Block =
     '{' BlockStatementsOpt '}'
    | '{' error '}'
    ;
BlockStatements =
     BlockStatement
    | BlockStatements BlockStatement
    ;
BlockStatement =
     LocalVariableDeclarationStatement
    | Statement
    | TypeDeclaration
    ;
LocalVariableDeclarationStatement =
     LocalVariableDeclaration ';'
    ;
LocalVariableDeclaration =
     Type VariableDeclarators
    ;
Statement =
     StatementWithoutTrailingSubstatement
    | LabeledStatement
    | IfThenStatement
    | IfThenElseStatement
    | WhileStatement
    | ForStatement
    ;
StatementNoShortIf =
     StatementWithoutTrailingSubstatement
    | LabeledStatementNoShortIf
    | IfThenElseStatementNoShortIf
    | WhileStatementNoShortIf
    | ForStatementNoShortIf
    ;
StatementWithoutTrailingSubstatement =
     Block
    | EmptyStatement
    | ExpressionStatement
    | SwitchStatement
    | DoStatement
    | BreakStatement
    | ContinueStatement
    | ReturnStatement
    | SynchronizedStatement
    | ThrowStatement
    | TryStatement
    ;
EmptyStatement =
     ';'
    ;
LabeledStatement =
     Identifier ':' /*2R*/ Statement
    ;
LabeledStatementNoShortIf =
     Identifier ':' /*2R*/ StatementNoShortIf
    ;
ExpressionStatement =
     StatementExpression ';'
    ;
StatementExpression =
     Assignment
    | PreIncrementExpression
    | PreDecrementExpression
    | PostIncrementExpression
    | PostDecrementExpression
    | MethodInvocation
    | ClassInstanceCreationExpression
    ;
IfThenStatement =
     IF '(' /*16L*/ Expression ')' /*16L*/ Statement
    ;
IfThenElseStatement =
     IF '(' /*16L*/ Expression ')' /*16L*/ StatementNoShortIf ELSE Statement
    ;
IfThenElseStatementNoShortIf =
     IF '(' /*16L*/ Expression ')' /*16L*/ StatementNoShortIf ELSE StatementNoShortIf
    ;
SwitchStatement =
     SWITCH '(' /*16L*/ Expression ')' /*16L*/ SwitchBlock
    ;
SwitchBlock =
     '{' SwitchBlockStatementGroups SwitchLabelsOpt '}'
    | '{' SwitchLabelsOpt '}'
    ;
SwitchBlockStatementGroups =
     SwitchBlockStatementGroup
    | SwitchBlockStatementGroups SwitchBlockStatementGroup
    ;
SwitchBlockStatementGroup =
     SwitchLabels BlockStatements
    ;
SwitchLabels =
     SwitchLabel
    | SwitchLabels SwitchLabel
    ;
SwitchLabel =
     CASE ConstantExpression ':' /*2R*/
    | DEFAULT ':' /*2R*/
    ;
WhileStatement =
     WHILE '(' /*16L*/ Expression ')' /*16L*/ Statement
    ;
WhileStatement =
     WHILE '(' /*16L*/ error ')' /*16L*/ Statement
    ;
WhileStatementNoShortIf =
     WHILE '(' /*16L*/ Expression ')' /*16L*/ StatementNoShortIf
    ;
DoStatement =
     DO Statement WHILE '(' /*16L*/ Expression ')' /*16L*/ ';'
    ;
ForStatement =
     FOR '(' /*16L*/ ForInitOpt ';' ExpressionOpt ';' ForUpdateOpt ')' /*16L*/ Statement
    ;
ForStatementNoShortIf =
     FOR '(' /*16L*/ ForInitOpt ';' ExpressionOpt ';' ForUpdateOpt ')' /*16L*/ StatementNoShortIf
    ;
ForInit =
     StatementExpressionList
    | LocalVariableDeclaration
    ;
ForUpdate =
     StatementExpressionList
    ;
StatementExpressionList =
     StatementExpression
    | StatementExpressionList ',' StatementExpression
    ;
BreakStatement =
     BREAK IdentifierOpt ';'
    ;
ContinueStatement =
     CONTINUE IdentifierOpt ';'
    ;
ReturnStatement =
     RETURN ExpressionOpt ';'
    ;
ThrowStatement =
     THROW Expression ';'
    ;
SynchronizedStatement =
     SYNCHRONIZED '(' /*16L*/ Expression ')' /*16L*/ Block
    ;
TryStatement =
     TRY Block Catches
    | TRY Block CatchesOpt Finally
    ;
Catches =
     CatchClause
    | Catches CatchClause
    ;
CatchClause =
     CATCH '(' /*16L*/ FormalParameter ')' /*16L*/ Block
    ;
Finally =
     FINALLY Block
    ;
Primary =
     PrimaryNoNewArray
    | ArrayCreationExpression
    ;
PrimaryNoNewArray =
     Literal
    | THIS
    | '(' /*16L*/ Expression ')' /*16L*/
    | ClassInstanceCreationExpression
    | FieldAccess
    | MethodInvocation
    | ArrayAccess
    ;
ClassInstanceCreationExpression =
     NEW /*13R*/ ClassType '(' /*16L*/ ArgumentListOpt ')' /*16L*/
    ;
ArgumentList =
     Expression
    | ArgumentList ',' Expression
    ;
ArrayCreationExpression =
     NEW /*13R*/ PrimitiveType DimExprs DimsOpt
    | NEW /*13R*/ ClassOrInterfaceType DimExprs DimsOpt
    ;
DimExprs =
     DimExpr
    | DimExprs DimExpr
    ;
DimExpr =
     '[' /*16L*/ Expression ']' /*16L*/
    ;
Dims =
     '[' /*16L*/ ']' /*16L*/
    | Dims '[' /*16L*/ ']' /*16L*/
    ;
FieldAccess =
     Primary '.' /*16L*/ Identifier
    | SUPER '.' /*16L*/ Identifier
    ;
MethodInvocation =
     Name '(' /*16L*/ ArgumentListOpt ')' /*16L*/
    | Primary '.' /*16L*/ Identifier '(' /*16L*/ ArgumentListOpt ')' /*16L*/
    | SUPER '.' /*16L*/ Identifier '(' /*16L*/ ArgumentListOpt ')' /*16L*/
    ;
ArrayAccess =
     Name '[' /*16L*/ Expression ']' /*16L*/
    | PrimaryNoNewArray '[' /*16L*/ Expression ']' /*16L*/
    ;
PostfixExpression =
     Primary
    | Name
    | PostIncrementExpression
    | PostDecrementExpression
    ;
PostIncrementExpression/*N15*/ =
     PostfixExpression INC //%prec POST /*15N*/
    ;
PostDecrementExpression/*N15*/ =
     PostfixExpression DEC //%prec POST /*15N*/
    ;
UnaryExpression =
     PreIncrementExpression
    | PreDecrementExpression
    //| '+' /*11L*/ UnaryExpression //%prec UMINUS /*14R*/
    //| '-' /*11L*/ UnaryExpression %prec UMINUS /*14R*/
    | UnaryExpressionNotPlusMinus
    ;
UnaryExpression/R14 =
    '+' /*11L*/ UnaryExpression //%prec UMINUS /*14R*/
    | '-' /*11L*/ UnaryExpression //%prec UMINUS /*14R*/
    ;
PreIncrementExpression/R14 =
     INC UnaryExpression //%prec PRE /*14R*/
    ;
PreDecrementExpression/R14 =
     DEC UnaryExpression //%prec PRE /*14R*/
    ;
UnaryExpressionNotPlusMinus =
     PostfixExpression
    | '~' /*14R*/ UnaryExpression
    | NOT /*14R*/ UnaryExpression
    | CastExpression
    ;
CastExpression/R13 =
     '(' /*16L*/ PrimitiveType DimsOpt ')' /*16L*/ UnaryExpression //%prec CAST /*13R*/
    | '(' /*16L*/ Expression ')' /*16L*/ UnaryExpressionNotPlusMinus //%prec CAST /*13R*/
    | '(' /*16L*/ Name Dims ')' /*16L*/ UnaryExpressionNotPlusMinus //%prec CAST /*13R*/
    ;
MultiplicativeExpression =
     UnaryExpression
    | MultiplicativeExpression '*' /*12L*/ UnaryExpression
    | MultiplicativeExpression '/' /*12L*/ UnaryExpression
    | MultiplicativeExpression '%' /*12L*/ UnaryExpression
    ;
AdditiveExpression =
     MultiplicativeExpression
    | AdditiveExpression '+' /*11L*/ MultiplicativeExpression
    | AdditiveExpression '-' /*11L*/ MultiplicativeExpression
    ;
ShiftExpression =
     AdditiveExpression
    | ShiftExpression LS /*10L*/ AdditiveExpression
    | ShiftExpression RS /*10L*/ AdditiveExpression
    | ShiftExpression URS /*10L*/ AdditiveExpression
    ;
RelationalExpression =
     ShiftExpression
    | RelationalExpression LT /*9N*/ ShiftExpression
    | RelationalExpression GT /*9N*/ ShiftExpression
    | RelationalExpression LE /*9N*/ ShiftExpression
    | RelationalExpression GE /*9N*/ ShiftExpression
    | RelationalExpression INSTANCEOF /*9N*/ ReferenceType
    ;
EqualityExpression =
     RelationalExpression
    | EqualityExpression EQ /*8L*/ RelationalExpression
    | EqualityExpression NE /*8L*/ RelationalExpression
    ;
AndExpression =
     EqualityExpression
    | AndExpression '&' /*7L*/ EqualityExpression
    ;
ExclusiveOrExpression =
     AndExpression
    | ExclusiveOrExpression '^' /*6L*/ AndExpression
    ;
InclusiveOrExpression =
     ExclusiveOrExpression
    | InclusiveOrExpression '|' /*5L*/ ExclusiveOrExpression
    ;
ConditionalAndExpression =
     InclusiveOrExpression
    | ConditionalAndExpression AND /*4L*/ InclusiveOrExpression
    ;
ConditionalOrExpression =
     ConditionalAndExpression
    | ConditionalOrExpression OR /*3L*/ ConditionalAndExpression
    ;
ConditionalExpression =
     ConditionalOrExpression
    | ConditionalOrExpression '?' /*2R*/ Expression ':' /*2R*/ ConditionalExpression
    ;
AssignmentExpression =
     ConditionalExpression
    | Assignment
    ;
Assignment =
     LeftHandSide AssignmentOperator AssignmentExpression
    ;
LeftHandSide =
     Name
    | FieldAccess
    | ArrayAccess
    ;
AssignmentOperator =
     ASS /*1R*/
    | MUL_ASS /*1R*/
    | DIV_ASS /*1R*/
    | MOD_ASS /*1R*/
    | ADD_ASS /*1R*/
    | SUB_ASS /*1R*/
    | LS_ASS /*1R*/
    | RS_ASS /*1R*/
    | URS_ASS /*1R*/
    | EMP_ASS /*1R*/
    | XOR_ASS /*1R*/
    | OR_ASS /*1R*/
    ;
Expression =
     AssignmentExpression
    ;
ConstantExpression =
     Expression
    ;
Identifier =
     ID
    ;
ArgumentListOpt =
     ArgumentList
    | /*empty*/
    ;
BlockStatementsOpt =
     BlockStatements
    | /*empty*/
    ;
CatchesOpt =
     Catches
    | /*empty*/
    ;
ClassBodyDeclarationsOpt =
     ClassBodyDeclarations
    | /*empty*/
    ;
DimsOpt =
     Dims
    | /*empty*/
    ;
ExpressionOpt =
     Expression
    | /*empty*/
    ;
ExtendsInterfacesOpt =
     ExtendsInterfaces
    | /*empty*/
    ;
ForInitOpt =
     ForInit
    | /*empty*/
    ;
ForUpdateOpt =
     ForUpdate
    | /*empty*/
    ;
FormalParameterListOpt =
     FormalParameterList
    | /*empty*/
    ;
IdentifierOpt =
     Identifier
    | /*empty*/
    ;
ImportDeclarationsOpt =
     ImportDeclarations
    | /*empty*/
    ;
InterfaceMemberDeclarationsOpt =
     InterfaceMemberDeclarations
    | /*empty*/
    ;
InterfacesOpt =
     Interfaces
    | /*empty*/
    ;
ModifiersOpt =
     Modifiers
    | /*empty*/
    ;
PackageDeclarationOpt =
     PackageDeclaration
    | /*empty*/
    ;
SuperOpt =
     Super
    | /*empty*/
    ;
SwitchLabelsOpt =
     SwitchLabels
    | /*empty*/
    ;
ThrowsOpt =
     Throws
    | /*empty*/
    ;
TypeDeclarationsOpt =
     TypeDeclarations
    | /*empty*/
    ;
patrickfrey commented 2 years ago

The conflict resolution of mewa is similar to yacc/lemon. But instead of using the commands

%left PLUS MINUS
%left TIMES DIVIDE MOD

you tag the productions for "+" and "-" with /L1, the productions for "*", "/" and "%" with /L2, etc. The tagging is done on the left side of the production, e.g.

ExclusiveOrExpression/L6 =
     AndExpression
    | ExclusiveOrExpression '^' AndExpression
    ;

There is an issue (#1) where conflicts get lost, and are therefore not reported, because of this way of assigning priorities. The effect is that whole branches of the grammar get lost. But I do not consider this a problem. I already addressed it by restricting the use of priorities. In the remaining cases, such errors seem easy to detect to me.
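To make the effect of such priority tags concrete, here is a minimal sketch in Python. It is not mewa's LALR(1) conflict resolution and not its API; it uses precedence climbing, a different technique, only to show how a priority level plus left associativity determines the shape of the tree. The `PRIORITY` table and the function names are assumptions made for this illustration.

```python
import re

# Hypothetical priority table mirroring the /L<n> tags described above:
# a higher number binds tighter; all operators are left-associative.
PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2, "%": 2}

def tokenize(text):
    """Split the input into integer literals, operators and parentheses."""
    return re.findall(r"\d+|[-+*/%()]", text)

def parse_primary(tokens):
    """Parse an integer literal or a parenthesized expression."""
    tok = tokens.pop(0)
    if tok == "(":
        node = parse(tokens)
        tokens.pop(0)  # consume the closing ")"
        return node
    return int(tok)

def parse(tokens, min_priority=1):
    """Precedence climbing; returns nested tuples (op, left, right)."""
    left = parse_primary(tokens)
    while tokens and tokens[0] in PRIORITY and PRIORITY[tokens[0]] >= min_priority:
        op = tokens.pop(0)
        # Left associativity: the right operand may only contain
        # operators of strictly higher priority than `op`.
        right = parse(tokens, PRIORITY[op] + 1)
        left = (op, left, right)
    return left
```

With this table, `parse(tokenize("1 - 2 - 3"))` groups to the left, and in `1 + 2 * 3` the multiplication ends up below the addition, which is the same outcome the `%left` declarations or the `/L1`, `/L2` tags are meant to produce.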

mingodad commented 2 years ago

Again, as an exercise and a demonstration of the capabilities of mewa, could you try to finish the Java grammar shown above? You will probably then understand what kind of problems future users of mewa will face.

patrickfrey commented 2 years ago

This grammar is not usable as a base. There are several issues:

1) You should not try to solve decision problems about types in the grammar. For this, you have the type system. For example

ReferenceType = ClassOrInterfaceType | ArrayType ;
ArrayType/L16 = PrimitiveType '[' ']' | Name '[' ']' | ArrayType '[' ']' ;

This is wrong. It leads to non-solvable problems in the grammar. Furthermore, it is not complete. A literal can reference all of these items. Think about templates. An expression of the form "x[3]" can be a value type (array access of variable x) or a data type array of size 3. In mewa this is decided by the type system. A node "new x[3]" is in both cases parsed the same way. The type system evaluation of a node "new x[3]" tests if its argument is a data type and reports an error otherwise.

Do simplify your grammar and decide questions about types in the type system part of your code.

2) You do not have to use AndExpression, OrExpression, AssignmentExpression, etc., because it does not make sense to restrict an operand to any of these, for example on the left-hand side of an assignment expression. On the left-hand side you can have any expression. Use only one term for expression. The priority decides how the AST is built.

Expression/L16 = Expression '.' Identifier ;
Expression/L16 = Expression '(' ArgumentListOpt ')' | Expression '.' Identifier '(' ArgumentListOpt ')' ;
Expression/L16 = Expression '[' Expression ']' | Expression '[' Expression ']' ;
Expression/L15 = Expression INC | Expression DEC ;
Expression/L11 = '+' Expression | '-' Expression ;
Expression/L14 = INC Expression | DEC Expression ;
Expression/L14 = '~' Expression | NOT Expression | CastExpression ;
Expression/L16 = '(' PrimitiveType DimsOpt ')' Expression | '(' Expression ')' Expression | '(' Name Dims ')' Expression ;
Expression/L12 = Expression '*' Expression | Expression '/' Expression | Expression '%' Expression ;
Expression/L11 = | Expression '+' Expression | Expression '-' Expression ;
Expression/L10 = Expression LS Expression | Expression RS Expression | Expression URS Expression ;
Expression/L9 = Expression LT Expression | Expression GT Expression | Expression LE Expression | Expression GE Expression | Expression INSTANCEOF ReferenceType ;
Expression/L8 = Expression EQ Expression | Expression NE Expression ;
Expression/L7 = | Expression '&' Expression ;
Expression/L6 = | Expression '^' Expression ;
Expression/L5 = Expression '|' Expression ;
Expression/L4 = | Expression AND Expression ;
Expression/L3 = Expression | Expression OR Expression ;
Expression/L2 = | Expression '?' Expression ':' Expression ;
Expression/L1 = Expression AssignmentOperator Expression ;

3) In mewa there is no such thing as non-associativity (for comparisons like a == b == c, comma-separated lists, etc.). You build a binary tree in the grammar and you build the list in the type system part. Write a Lua method for this. I consider it too complicated, without much gain, to make this decision in the grammar. It is far easier with the rule of one tree node per production. I think doing it in the grammar would lead to more annotation elements being needed to describe the structure of the AST built. The case of a list (the collection of all list elements) can be handled in a general way in the type system part.

4) In mewa the actions are not executed immediately after a rule is invoked. This is deliberate. A lot of decisions can be postponed to the tree traversal. You can make a clear separation between structural analysis and type structure building if the language allows it.
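Point 3 can be sketched as follows. The comment above suggests a Lua method; this sketch uses Python purely for illustration and does not use any mewa API. It assumes a hypothetical AST encoding in which each binary production yields a tuple `(op, left, right)`, and shows how a later tree traversal can turn a left-nested binary tree into a flat list, or reject a chain such as `a == b == c`.

```python
def flatten(node, op):
    """Collect the operands of a left-nested chain of `op` nodes, in source order."""
    items = []
    # Walk down the left spine: (op, (op, a, b), c) -> a, b, c
    while isinstance(node, tuple) and node[0] == op:
        _, left, right = node
        items.append(right)
        node = left
    items.append(node)
    items.reverse()
    return items

def check_comparison(node):
    """Reject chained comparisons during the traversal instead of in the grammar."""
    operands = flatten(node, "==")
    if len(operands) > 2:
        raise TypeError("chained comparison like a == b == c is not allowed")
    return operands
```

For example, `flatten((",", (",", "a", "b"), "c"), ",")` yields the three list elements in order, while `check_comparison` raises an error for a tree built from `a == b == c`. The grammar stays a plain binary rule in both cases.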

mingodad commented 2 years ago

Thank you for your feedback! Here is a list of several yacc grammars I have on my computer. So it is not in your plans to provide a way to reuse anything that already exists?

"open64-gcc-parse.y"
"verilator_verilog.y",
"ruby-parser.y",
"dev/c/A_grammars/bison-3.7.6/src/parse-gram.y",
"ruby-3.0.1/parse.tmp.y",
"Compiler-for-C-Like-Language/src/parser.y",
"cfront-3-ubuntu-dad/src/gram.y",
"Beef/extern/hunspell/intl/plural.y",
"hiphop-php/hphp/runtime/base/zend/zend_ini.y",
"hiphop-php/hphp/util/parser/hphp.y",
"lcc/lburg/gram.y",
"dotgnu-pnet/libjit-old/tools/gen-rules-parser.y",
"dotgnu-pnet/libjit-old/tools/gen-ops-parser.y",
"dotgnu-pnet/libjit-old/dpas/dpas-parser.y",
"dotgnu-pnet/pnet/cscc/vb/vb_grammar.y",
"dotgnu-pnet/pnet/cscc/c/c_grammar.y",
"dotgnu-pnet/pnet/cscc/java/java_grammar.y",
"dotgnu-pnet/pnet/cscc/bf/bf_grammar.y",
"dotgnu-pnet/pnet/cscc/csharp/cs_grammar.y",
"dotgnu-pnet/pnet/ilasm/ilasm_grammar.y",
"dotgnu-pnet/treecc/examples/gram_c.y",
"libjit-copy/libjit/tools/gen-rules-parser.y",
"libjit-copy/libjit/tools/gen-sel-parser.y",
"libjit-copy/libjit/dpas/dpas-parser.y",
"regina-rexx-3.9.3/yaccsrc.y",
"pfff/lang_go/parsing/orig/go.y",
"pfff/docs/official-grammars/php/5.4.0rc1/zend_language_parser.y",
"pfff/docs/official-grammars/php/5.3.0/zend_language_parser.y",
"pfff/docs/official-grammars/php/hphp-may-2012/hphp.y",
"pfff/docs/official-grammars/php/5.4.3/zend_language_parser.y",
"pfff/docs/official-grammars/php/xhp/xhp_orig_when_port.y",
"pfff/docs/official-grammars/php/5.2.11/zend_language_parser.y",
"bigloo/examples/Yacc2bigloo/gram.y",
"php-8.0.0RC3/sapi/phpdbg/phpdbg_parser.y",
"php-8.0.0RC3/Zend/zend_language_parser.y",
"php-8.0.0RC3/Zend/zend_ini_parser.y",
"php-8.0.0RC3/ext/json/json_parser.y",
"never/front/parser.y",
"tsion-tinyscheme/libgpl/rex_util_y.y",
"eureka/lib/ekParser.y",
"libjit-patched/libjit/tools/gen-rules-parser.y",
"libjit-patched/libjit/tools/gen-sel-parser.y",
"libjit-patched/libjit/dpas/dpas-parser.y",
"icon-v951src/src/rtt/rttgram.y",
"c-alike-parser/parser.y",
"ctags/Units/parser-yacc.r/nested.d/input.y",
"ctags/Units/parser-yacc.r/not-union.d/input.y",
"ctags/Units/parser-yacc.r/c-anon-ids.d/input.y",
"ctags/Units/parser-yacc.r/bom.d/input.y",
"global-6.6.3/libparser/asm_parse.y",
"libjit-savana/tools/gen-rules-parser.y",
"libjit-savana/tools/gen-ops-parser.y",
"libjit-savana/dpas-tcc/dpas-parser.y",
"libjit-savana/dpas/dpas-parser.y",
"Compiler-design-using-flex-and-Bison/main.y",
"q-7.11/src/qmparse.y",
"q-7.11/src/qc.y",
"qp10.6/src/qa.y",
"LLVM_Flex_Bison_MiniC/minic.y",
"not_rune/parsegen/rules.y",
"zig/zig-spec/grammar/grammar.y",
"R-3.5.1/src/main/gram.y",
"R-3.5.1/src/library/tools/src/gramRd.y",
"R-3.5.1/src/library/tools/src/gramLatex.y",
"Escher/SRC/escher-parser.y",
"cyclone/banshee/cparser/c-parse.y",
"cyclone/banshee/ibanshee/parser.y",
"cyclone/tests/foo.y",
"cyclone/tests/boa-0.94.8.3/src/boa_grammar.y",
"cyclone/tests/boa-0.94.8.3/src-cyclone/boa_grammar.y",
"cyclone/tools/cyclex/parser.y",
"cyclone/tools/yakker/parse.y",
"cyclone/src/parse.y",
"cyclone/lib/xml/xmlparse.y",
"kinx-filtered/src/kinx.y",
"C-Compiler/raw/parse.y",
"ruby-3.0.1/parse.tmp.y",
"ruby-3.0.1/parse.y",
"ruby-3.0.1/test/racc/start.y",
"ruby-3.0.1/test/racc/assets/edtf.y",
"ruby-3.0.1/test/racc/assets/tp_plus.y",
"ruby-3.0.1/test/racc/assets/norule.y",
"ruby-3.0.1/test/racc/assets/ruby19.y",
"ruby-3.0.1/test/racc/assets/intp.y",
"ruby-3.0.1/test/racc/assets/conf.y",
"ruby-3.0.1/test/racc/assets/cadenza.y",
"ruby-3.0.1/test/racc/assets/unterm.y",
"ruby-3.0.1/test/racc/assets/nullbug1.y",
"ruby-3.0.1/test/racc/assets/syntax.y",
"ruby-3.0.1/test/racc/assets/ruby18.y",
"ruby-3.0.1/test/racc/assets/useless.y",
"ruby-3.0.1/test/racc/assets/digraph.y",
"ruby-3.0.1/test/racc/assets/percent.y",
"ruby-3.0.1/test/racc/assets/expect.y",
"ruby-3.0.1/test/racc/assets/firstline.y",
"ruby-3.0.1/test/racc/assets/liquor.y",
"ruby-3.0.1/test/racc/assets/ifelse.y",
"ruby-3.0.1/test/racc/assets/riml.y",
"ruby-3.0.1/test/racc/assets/namae.y",
"ruby-3.0.1/test/racc/assets/ruby21.y",
"ruby-3.0.1/test/racc/assets/yyerr.y",
"ruby-3.0.1/test/racc/assets/scan.y",
"ruby-3.0.1/test/racc/assets/opal.y",
"ruby-3.0.1/test/racc/assets/ruby20.y",
"ruby-3.0.1/test/racc/assets/huia.y",
"ruby-3.0.1/test/racc/assets/newsyn.y",
"ruby-3.0.1/test/racc/assets/csspool.y",
"ruby-3.0.1/test/racc/assets/noend.y",
"ruby-3.0.1/test/racc/assets/journey.y",
"ruby-3.0.1/test/racc/assets/mailp.y",
"ruby-3.0.1/test/racc/assets/err.y",
"ruby-3.0.1/test/racc/assets/chk.y",
"ruby-3.0.1/test/racc/assets/error_recovery.y",
"ruby-3.0.1/test/racc/assets/normal.y",
"ruby-3.0.1/test/racc/assets/ruby22.y",
"ruby-3.0.1/test/racc/assets/macruby.y",
"ruby-3.0.1/test/racc/assets/mediacloth.y",
"ruby-3.0.1/test/racc/assets/mof.y",
"ruby-3.0.1/test/racc/assets/echk.y",
"ruby-3.0.1/test/racc/assets/nokogiri-css.y",
"ruby-3.0.1/test/racc/assets/php_serialization.y",
"ruby-3.0.1/test/racc/assets/rrconf.y",
"ruby-3.0.1/test/racc/assets/machete.y",
"ruby-3.0.1/test/racc/assets/cast.y",
"ruby-3.0.1/test/racc/assets/twowaysql.y",
"ruby-3.0.1/test/racc/assets/opt.y",
"ruby-3.0.1/test/racc/assets/recv.y",
"ruby-3.0.1/test/racc/assets/nasl.y",
"ruby-3.0.1/test/racc/assets/nullbug2.y",
"ruby-3.0.1/test/racc/assets/ichk.y",
"ruby-3.0.1/test/racc/assets/nonass.y",
"ruby-3.0.1/test/racc/bench.y",
"ruby-3.0.1/test/racc/infini.y",
"ruby-3.0.1/.bundle/gems/rbs-1.0.4/lib/rbs/parser.y",
"ruby-3.0.1/ext/ripper/ripper.y",
"slsc/src/gram.y",
"ruby-lemon-parse/parse.y",
"UnderC-alex/src/parser.y",
"qbe/minic/minic.y",
"cilk-5.4.6/cilk2c/ANSI-C.y",
"dino-0.55/MSTA/oberon2-lex.y",
"dino-0.55/MSTA/lex.y",
"dino-0.55/MSTA/oberon2-gram.y",
"dino-0.55/MSTA/fparse.y",
"dino-0.55/MSTA/java.y",
"dino-0.55/MSTA/pascal.y",
"dino-0.55/MSTA/gram.y",
"dino-0.55/MSTA/m2c.y",
"dino-0.55/MSTA/p-yacc.y",
"dino-0.55/MSTA/fcalc1.y",
"dino-0.55/MSTA/fcalc.y",
"dino-0.55/MSTA/yacc.y",
"dino-0.55/MSTA/c5.y",
"dino-0.55/MSTA/sql1.y",
"dino-0.55/MSTA/cpp5.y",
"dino-0.55/AMMUNITION/sgramm.y",
"dino-0.55/SPRUT/yacc.y",
"dino-0.55/DINO/d_yacc.y",
"dino-0.55/SHILKA/yacc.y",
"C++ 3.0.3/source/src/gram.y",
"gecko-dev/security/nss/cmd/modutil/installparse.y",
"gecko-dev/gfx/angle/src/compiler/translator/glslang.y",
"gecko-dev/gfx/angle/src/compiler/preprocessor/ExpressionParser.y",
"Ice-3.5.1/cpp/demo/Freeze/phonebook/Grammar.y",
"Ice-3.5.1/cpp/demo/Freeze/library/Grammar.y",
"Ice-3.5.1/cpp/demo/book/map_filesystem/Grammar.y",
"Ice-3.5.1/cpp/demo/book/evictor_filesystem/Grammar.y",
"Ice-3.5.1/cpp/demo/book/lifecycle/Grammar.y",
"Ice-3.5.1/cpp/test/Freeze/complex/Grammar.y",
"Ice-3.5.1/cpp/src/FreezeScript/Grammar.y",
"Ice-3.5.1/cpp/src/IceStorm/Grammar.y",
"Ice-3.5.1/cpp/src/Slice/Grammar.y",
"Ice-3.5.1/cpp/src/IceGrid/Grammar.y",
"exolang/bin/quex/demo/C/008/Calc_parser.y",
"exolang/src/exo/parser/parser.y",
"llvm-base-compiler/parser/grammar.y",
"cc-llvm-bison/c.y",
"cfront-2/ipc/internet/htable/parse.y",
"cfront-2/ipc/servers/ftpcmd.y",
"cfront-2/cmd/awk/awk.g.y",
"cfront-2/cmd/cpp/cpy.y",
"cfront-2/cmd/lex/parser.y",
"cfront-2/cmd/lex/o.lex/parser.y",
"cfront-2/cmd/pico/x.y",
"cfront-2/cmd/config/config.y",
"cfront-2/cmd/flex/parse.y",
"cfront-2/cmd/usgmake/gram.y",
"cfront-2/cmd/hoc/hoc.y",
"cfront-2/cmd/eqn/eqn.y",
"cfront-2/cmd/ap/apkeys/kpar.y",
"cfront-2/cmd/make/gram.y",
"cfront-2/cmd/oworm/oscsi/gram.y",
"cfront-2/cmd/oworm/scsi/gram.y",
"cfront-2/cmd/gcc/cexp.y",
"cfront-2/cmd/gcc/parse.y",
"cfront-2/cmd/egrep/gram.y",
"cfront-2/cmd/neqn/e.y",
"cfront-2/cmd/expr/expr.y",
"cfront-2/cmd/struct/beauty.y",
"cfront-2/cmd/picasso/picassoy.y",
"cfront-2/cmd/m4/m4y.y",
"cfront-2/cmd/netnews/src/getdate.y",
"cfront-2/cmd/dag/parsedag.y",
"cfront-2/cmd/bc.y",
"cfront-2/cmd/pic/picy.y",
"cfront-2/cmd/ccom/common/sty.y",
"cfront-2/cmd/ccom/common/cgram.y",
"cfront-2/cmd/prefer/prefawk/awk.g.y",
"cfront-2/cmd/2500/gram.y",
"cfront-2/cmd/basic/basic/eval.y",
"cfront-2/cmd/cfront/ptcfront/gram.y",
"cfront-2/cmd/cfront/cfront2.00/gram.y",
"cfront-2/cmd/cfront/ooptcfront/gram.y",
"cfront-2/cmd/cfront/xptcfront/gram.y",
"cfront-2/cmd/cfront/optcfront/gram.y",
"cfront-2/cmd/cfront/cfront/gram.y",
"cfront-2/cmd/cfront/ocfront/gram.y",
"cfront-2/cmd/pcc1/mip/cgram.y",
"cfront-2/cmd/pascal/pi/pas.y",
"cfront-2/cmd/pascal/pc0/pas.y",
"cfront-2/cmd/pascal/pxp/pas.y",
"cfront-2/cmd/visi/parse.y",
"cfront-2/cmd/numdate/getdate.y",
"cfront-2/cmd/grap/non-ansi/grap.y",
"cfront-2/cmd/grap/grap.y",
"cfront-2/cmd/units/units.y",
"cfront-2/cmd/pret/pret.y",
"cfront-2/cmd/twig/twig.y",
"cfront-2/cmd/ideal/idyac.y",
"cfront-2/cmd/kasb/kas0.y",
"cfront-2/history/ix/src/cmd/priv/gram.y",
"cfront-2/history/ix/src/cmd/priv/dfa.y",
"cfront-2/history/ix/src/cmd/privserv/gram.y",
"cfront-2/history/ix/src/cmd/privserv/dfa.y",
"Dynace/ODBC/sqlgrammar.y",
"UnderC/src/parser.y",
"cfront-1/src/gram.y",
"libjit/tools/gen-rules-parser.y",
"libjit/tools/gen-ops-parser.y",
"libjit/dpas/dpas-parser.y",
"ack-6.0pre4/util/cgg/bootgram.y",
"ack-6.0pre4/util/opt/mktab.y",
"ack-6.0pre4/util/ncgg/cgg.y",
"ack-6.0pre4/mach/proto/as/comm2.y",
"txr/parser.y",
"rustc-1.49.0-src/vendor/libnghttp2-sys/nghttp2/third-party/mruby/mrbgems/mruby-compiler/core/parse.y",
"hobbes/lib/hobbes/read/pgen/hexpr.y",
"cfront-3-ubuntu/src/gram.y",
"MiniC/minic.y",
"compyler/parser.y",
"kinx/src/extlib/kc-json/src/kc-json.y",
"kinx/src/kinx.y",
"cfront1985/cfront/gram.y",
"cfront-3-descent/src/gram.y",
"biGramAnalyser/bisongram.y",
"++-grammar2/c5.y",
"c++-grammar2/cpp5.y",
"vera/build/src/boost-prefix/src/boost/tools/build/src/engine/jamgram.y",
"NatLang/app/src/NatLang.y",
"mruby/mrbgems/mruby-compiler/core/parse.y",
"unicon/uni/iyacc/test/calc.y",
"unicon/uni/ulex/lexgram.y",
"unicon/uni/parser/unigram.y",
"unicon/uni/unicon/unigram.y",
"micropython/lib/axtls/config/scripts/config/zconf.y",
"ksm-0.3.2/clike/parse.y",
"cfront-3.0.3.1/src/gram.y",
"cfront-3/src/gram.y",
"zenlang/lib/base/parserGen.y",
"vala-0.50.1/gobject-introspection/scannerparser.y",
"datadraw/src/dvparse.y",
"miranda/rules.y",
"miranda/new/rules.y",
"ecere-sdk/compiler/libec/src/type.y",
"ecere-sdk/compiler/libec/src/expression.y",
"ecere-sdk/compiler/libec/src/grammar.y",
"TinyCompiler/grammar.y",
"cscope/src/egrep.y",
"adt4c-with-polymorphism/dev/Tree/parser.y",
"adt4c-with-polymorphism/dev/performance/parser.y",
"adt4c-with-polymorphism/src/parser.y",
"gcc-9.2.0/intl/plural.y",
"event-compiler/src/parser.y",
"hhvm/hphp/runtime/base/ini-parser/zend-ini.y",
"hhvm/hphp/parser/hphp.y",
"aldor/aldor/aldor/tools/unix/zaccgram.y",
"aldor/aldor/aldor/tools/unix/cparse.y",
"UnderC-dad/src/parser.y",
"UnderC-dad/src/parser-dad.y",
"bitc/src/tests/grammars/mixfix.y",
"bitc/src/compiler/TransitionParser.y",
"minima.l/lib/grammar/parser.y",
patrickfrey commented 2 years ago

You raise a good point here: existing work. One should not neglect things already done, and I don't. But a grammar like the one you sent me cannot work. Mewa wants to separate structural things from the type system, because there are things that are not decidable by structural means alone. The most complicated thing in the implementation of a programming language is the type system. The grammar is done in an afternoon. The first type system needs many weeks. Saving the one day for the grammar will harm you more in the long term. Implementing a grammar that solves only syntax problems and solving the other problems in the type system will enlighten you. Once you have seen that variables are types, integers or strings are types, and functions are types, generic programming and lambdas come for free. Mewa is not about grammars. It's about types. Therefore I recommend spending that day writing the grammar for mewa from scratch, as simple as possible. An expression is a structure and it returns a type.
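The earlier `new x[3]` case gives a small illustration of "an expression is a structure and it returns a type". The following Python sketch is not mewa code and not its Lua API. The `SYMBOLS` table, the `(kind, type)` pairs and the function names are all hypothetical; the sketch only shows one tree shape being interpreted in two ways depending on what the name denotes.

```python
# Hypothetical symbol table: every name denotes either a data type or a variable.
SYMBOLS = {
    "int": ("datatype", "int"),
    "x": ("variable", "int[]"),
}

def eval_index(name, size):
    """Evaluate the node `name[size]`, which has the same structure in both cases."""
    kind, typ = SYMBOLS[name]
    if kind == "datatype":
        # `int[3]` denotes the data type "array of 3 int".
        return ("datatype", f"{typ}[{size}]")
    if not typ.endswith("[]"):
        raise TypeError(f"'{name}' is not an array")
    # `x[3]` is an array access; strip the trailing "[]" to get the element type.
    return ("value", typ[:-2])

def eval_new(name, size):
    """Evaluate `new name[size]`: the argument must denote a data type."""
    kind, typ = eval_index(name, size)
    if kind != "datatype":
        raise TypeError(f"'new' expects a data type, got a {kind}")
    return ("value", typ)
```

Here `eval_new("int", 3)` succeeds, while `eval_new("x", 3)` is rejected during type evaluation even though the parser accepted both inputs with the same rule.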

mingodad commented 2 years ago

Again, thank you for all your help and feedback! I'll be looking at mewa from time to time; maybe in the future, seeing a functional, non-trivial, usable example would bring me back to try it. Cheers!

patrickfrey commented 2 years ago

There is a non-trivial functional language example, "language1", including test programs (tests/*.prg) written in the language. It has structures, functions, classes, generics and lambdas.

patrickfrey commented 2 years ago

https://github.com/patrickfrey/mewa/blob/master/doc/example_language1.md

The example programs listed there are compiled and run with "make test".