inspirer / textmapper

Lexer and Parser generator
http://textmapper.org
MIT License

Problems trying to convert PostgreSQL-16 grammar #69

Open mingodad opened 1 year ago

mingodad commented 1 year ago

While trying to convert a working PostgreSQL-16 grammar to Textmapper I found several issues:

The original grammar has no unresolved shift/reduce or reduce/reduce conflicts, but Textmapper reports:

postgresql-16.tm,4198: input: MODE_TYPE_NAME x_ucharacter
reduce/reduce conflict (next: eoi, IDENT, ABORT_P, ABSENT, ABSOLUTE_P, ACCESS, ACTION, ADD_P, ADMIN, AFTER, AGGREGATE, ALL, ALSO, ALTER, ALWAYS, ANALYSE, ANALYZE, AND, ANY, ARRAY, AS, ASC, ASENSITIVE, ASSERTION, ASSIGNMENT, ASYMMETRIC, AT, ATOMIC, ATTACH, ATTRIBUTE, AUTHORIZATION, BACKWARD, BEFORE, BEGIN_P, BETWEEN, BIGINT, BINARY, BIT, BOOLEAN_P, BOTH, BREADTH, BY, CACHE, CALL, CALLED, CASCADE, CASCADED, CASE, CAST, CATALOG_P, CHAIN, CHARACTERISTICS, CHECK, CHECKPOINT, CLASS, CLOSE, CLUSTER, COALESCE, COLLATE, COLLATION, COLUMN, COLUMNS, COMMENT, COMMENTS, COMMIT, COMMITTED, COMPRESSION, CONCURRENTLY, CONFIGURATION, CONFLICT, CONNECTION, CONSTRAINT, CONSTRAINTS, CONTENT_P, CONTINUE_P, CONVERSION_P, COPY, COST, CREATE, CROSS, CSV, CUBE, CURRENT_P, CURRENT_CATALOG, CURRENT_DATE, CURRENT_ROLE, CURRENT_SCHEMA, CURRENT_TIME, CURRENT_TIMESTAMP, CURRENT_USER, CURSOR, CYCLE, DATA_P, DATABASE, DEALLOCATE, DEC, DECIMAL_P, DECLARE, DEFAULT, DEFAULTS, DEFERRABLE, DEFERRED, DEFINER, DELETE_P, DELIMITER, DELIMITERS, DEPENDS, DEPTH, DESC, DETACH, DICTIONARY, DISABLE_P, DISCARD, DISTINCT, DO, DOCUMENT_P, DOMAIN_P, DOUBLE_P, DROP, EACH, ELSE, ENABLE_P, ENCODING, ENCRYPTED, END_P, ENUM_P, ESCAPE, EVENT, EXCEPT, EXCLUDE, EXCLUDING, EXCLUSIVE, EXECUTE, EXISTS, EXPLAIN, EXPRESSION, EXTENSION, EXTERNAL, EXTRACT, FALSE_P, FAMILY, FETCH, FINALIZE, FIRST_P, FLOAT_P, FOLLOWING, FOR, FORCE, FOREIGN, FORMAT, FORWARD, FREEZE, FROM, FULL, FUNCTION, FUNCTIONS, GENERATED, GLOBAL, GRANT, GRANTED, GREATEST, GROUP_P, GROUPING, GROUPS, HANDLER, HAVING, HEADER_P, HOLD, IDENTITY_P, IF_P, ILIKE, IMMEDIATE, IMMUTABLE, IMPLICIT_P, IMPORT_P, IN_P, INCLUDE, INCLUDING, INCREMENT, INDENT, INDEX, INDEXES, INHERIT, INHERITS, INITIALLY, INLINE_P, INNER_P, INOUT, INPUT_P, INSENSITIVE, INSERT, INSTEAD, INT_P, INTEGER, INTERSECT, INTERVAL, INTO, INVOKER, IS, ISNULL, ISOLATION, JOIN, JSON, JSON_ARRAY, JSON_ARRAYAGG, JSON_OBJECT, JSON_OBJECTAGG, KEY, KEYS, LABEL, LANGUAGE, LARGE_P, LAST_P, LATERAL_P, LEADING, 
LEAKPROOF, LEAST, LEFT, LEVEL, LIKE, LIMIT, LISTEN, LOAD, LOCAL, LOCALTIME, LOCALTIMESTAMP, LOCATION, LOCK_P, LOCKED, LOGGED, MAPPING, MATCH, MATCHED, MATERIALIZED, MAXVALUE, MERGE, METHOD, MINVALUE, MODE, MOVE, NAME_P, NAMES, NATIONAL, NATURAL, NCHAR, NEW, NEXT, NFC, NFD, NFKC, NFKD, NO, NONE, NORMALIZE, NORMALIZED, NOT, NOTHING, NOTIFY, NOTNULL, NOWAIT, NULL_P, NULLIF, NULLS_P, NUMERIC, OBJECT_P, OF, OFF, OFFSET, OIDS, OLD, ON, ONLY, OPERATOR, OPTION, OPTIONS, OR, ORDER, ORDINALITY, OTHERS, OUT_P, OUTER_P, OVERLAY, OVERRIDING, OWNED, OWNER, PARALLEL, PARAMETER, PARSER, PARTIAL, PARTITION, PASSING, PASSWORD, PLACING, PLANS, POLICY, POSITION, PRECEDING, PREPARE, PREPARED, PRESERVE, PRIMARY, PRIOR, PRIVILEGES, PROCEDURAL, PROCEDURE, PROCEDURES, PROGRAM, PUBLICATION, QUOTE, RANGE, READ, REAL, REASSIGN, RECHECK, RECURSIVE, REF_P, REFERENCES, REFERENCING, REFRESH, REINDEX, RELATIVE_P, RELEASE, RENAME, REPEATABLE, REPLACE, REPLICA, RESET, RESTART, RESTRICT, RETURN, RETURNING, RETURNS, REVOKE, RIGHT, ROLE, ROLLBACK, ROLLUP, ROUTINE, ROUTINES, ROW, ROWS, RULE, SAVEPOINT, SCALAR, SCHEMA, SCHEMAS, SCROLL, SEARCH, SECURITY, SELECT, SEQUENCE, SEQUENCES, SERIALIZABLE, SERVER, SESSION, SESSION_USER, SET, SETOF, SETS, SHARE, SHOW, SIMILAR, SIMPLE, SKIP, SMALLINT, SNAPSHOT, SOME, SQL_P, STABLE, STANDALONE_P, START, STATEMENT, STATISTICS, STDIN, STDOUT, STORAGE, STORED, STRICT_P, STRIP_P, SUBSCRIPTION, SUBSTRING, SUPPORT, SYMMETRIC, SYSID, SYSTEM_P, SYSTEM_USER, TABLE, TABLES, TABLESAMPLE, TABLESPACE, TEMP, TEMPLATE, TEMPORARY, TEXT_P, THEN, TIES, TIME, TIMESTAMP, TRAILING, TRANSACTION, TRANSFORM, TREAT, TRIGGER, TRIM, TRUE_P, TRUNCATE, TRUSTED, TYPE_P, TYPES_P, UESCAPE, UNBOUNDED, UNCOMMITTED, UNENCRYPTED, UNION, UNIQUE, UNKNOWN, UNLISTEN, UNLOGGED, UNTIL, UPDATE, USER, USING, VACUUM, VALID, VALIDATE, VALIDATOR, VALUE_P, VALUES, VARCHAR, VARIADIC, VERBOSE, VERSION_P, VIEW, VIEWS, VOLATILE, WHEN, WHERE, WHITESPACE_P, WINDOW, WITH, WITHOUT, WORK, WRAPPER, WRITE, XML_P, 
XMLATTRIBUTES, XMLCONCAT, XMLELEMENT, XMLEXISTS, XMLFOREST, XMLNAMESPACES, XMLPARSE, XMLPI, XMLROOT, XMLSERIALIZE, XMLTABLE, YES_P, ZONE, LESS_EQUALS, GREATER_EQUALS, NOT_EQUALS, TYPECAST, FORMAT_LA, NULLS_LA, NOT_LA, Op, ';', '=', ')', ',', '*', '/', '+', '-', '%', '[', ']', '^', '<', '>', ':')
    SimpleTypename : x_ucharacter
    CharacterWithoutLength : x_ucharacter

postgresql-16.tm,4273: input: MODE_TYPE_NAME x_ucharacter
shift/reduce conflict (next: '(')
    CharacterWithoutLength : x_ucharacter

postgresql-16.tm,4273: input: SELECT distinct_clause x_ucharacter
shift/reduce conflict (next: '(')
    CharacterWithoutLength : x_ucharacter

postgresql-16.tm,4259: input: SELECT distinct_clause CharacterWithLength
reduce/reduce conflict (next: SCONST)
    x_ucharacter : CharacterWithLength
    ConstCharacter : CharacterWithLength

postgresql-16.tm,4260: input: SELECT distinct_clause CharacterWithoutLength
reduce/reduce conflict (next: SCONST)
    x_ucharacter : CharacterWithoutLength
    ConstCharacter : CharacterWithoutLength

conflicts: 2 shift/reduce and 483 reduce/reduce
lalr: 0.585s, text: 1.394s, parser: 6221 states, 3011KB

See attached the converted grammar: postgresql-16.tm.zip

Also a working grammar can be seen here https://meimporta.eu/lalr-playground/

mingodad commented 1 year ago

I found a mistake in my renaming of rules during the conversion, and now Textmapper can handle the grammar attached below without unresolved conflicts.

But the issue reported above still applies, and there is also a rule, OptWith, that Textmapper somehow fails to resolve:

textmapper.sh "postgresql-16.tm"
postgresql-16.tm,1289: OptWith cannot be resolved
postgresql-16.tm,1290: OptWith cannot be resolved
postgresql-16.tm,1291: OptWith cannot be resolved
postgresql-16.tm,1292: OptWith cannot be resolved
postgresql-16.tm,1293: OptWith cannot be resolved
postgresql-16.tm,1294: OptWith cannot be resolved
postgresql-16.tm,1602: OptWith cannot be resolved

But if we rename it and all its references to xOptWith, then Textmapper is happy and parses it fine.

postgresql-16.tm.zip
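
The rename workaround described above can be sketched as a whole-word replacement over the grammar text. This is only an illustration: the class name RenameSymbol, the method rename, and the sample rule bodies are hypothetical and are not taken from the attached grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RenameSymbol {

    /** Renames whole-word occurrences of oldName to newName in grammar text. */
    static String rename(String grammar, String oldName, String newName) {
        // \b word boundaries leave longer identifiers (e.g. OptWithData) untouched.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(oldName) + "\\b");
        return p.matcher(grammar).replaceAll(Matcher.quoteReplacement(newName));
    }

    public static void main(String[] args) {
        // Hypothetical grammar fragment, used only to exercise the rename.
        String sample =
                "OptWith :\n"
              + "    WITH reloptions\n"
              + "    | WITHOUT OIDS\n"
              + "    ;\n"
              + "CreateStmt : CREATE TABLE qualified_name OptWith ;\n";
        System.out.print(rename(sample, "OptWith", "xOptWith"));
    }
}
```

For a real file, the grammar text would be read from and written back to postgresql-16.tm instead of using an in-memory sample.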

mingodad commented 1 year ago

When I try to run the generated parser/lexer, I get this error message:

bash build-PgSql.sh
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.textmapper.postgresql.PgSql.main(PgSql.java:52)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 168075 out of bounds for length 168075
    at org.textmapper.postgresql.PgSqlLexer.unpack_vc_short(PgSqlLexer.java:3156)
    at org.textmapper.postgresql.PgSqlLexer.<clinit>(PgSqlLexer.java:710)
    ... 1 more

build-PgSql.sh:

javac  -d .  PgSqlLexer.java PgSqlParser.java PgSql.java
java org.textmapper.postgresql.PgSql

PgSql.java:

package org.textmapper.postgresql;
import org.textmapper.postgresql.PgSqlLexer;
import org.textmapper.postgresql.PgSqlLexer.ErrorReporter;
import java.util.List;
import java.util.ArrayList;

/**
 * Main console entry point for the Textmapper engine.
 */
public class PgSql {

    public static class LapgProblem extends Exception {
        private static final long serialVersionUID = 1L;

        private final int kind;
        private final int line;
        private final int offset;
        private final int endoffset;

        public LapgProblem(int kind, String message, int line, int offset, int endoffset, Throwable cause) {
            super(message, cause);
            this.kind = kind;
            this.line = line;
            this.offset = offset;
            this.endoffset = endoffset;
        }

        public int getKind() {
            return kind;
        }

        public int getLine() {
            return line;
        }

        public int getOffset() {
            return offset;
        }

        public int getEndoffset() {
            return endoffset;
        }

    }

    public static void main(String[] args) {
        final List<LapgProblem> list = new ArrayList<>();
        ErrorReporter reporter = (message, line, offset, endoffset) ->
                list.add(new LapgProblem(1, message, line, offset, endoffset, null));

        try {
            PgSqlLexer lexer = new PgSqlLexer(" -- comment\nselect id, name from users;\n" , reporter);
            PgSqlParser parser = new PgSqlParser(reporter);
            Object result = parser.parse(lexer);
            System.out.println(result);
        } catch (Exception e) {
            /* ignore */
            System.out.println(e);
        }

    }
}

mingodad commented 1 year ago

Also, when trying to convert the carbon-lang grammar I'm getting these errors:

textmapper.sh "carbon-lang.tm"
carbon-lang.tm,179: TRUE cannot be resolved
carbon-lang.tm,183: STRING cannot be resolved
carbon-lang.tm,185: TYPE cannot be resolved
carbon-lang.tm,314: UNIMPL_EXAMPLE cannot be resolved
carbon-lang.tm,391: WHERE cannot be resolved
carbon-lang.tm,409: THEN cannot be resolved
carbon-lang.tm,470: TEMPLATE cannot be resolved
carbon-lang.tm,473: VAR cannot be resolved
carbon-lang.tm,478: UNDERSCORE cannot be resolved
carbon-lang.tm,518: VAR cannot be resolved
carbon-lang.tm,519: VAR cannot be resolved
carbon-lang.tm,520: VAR cannot be resolved
carbon-lang.tm,521: VAR cannot be resolved
carbon-lang.tm,525: WHILE cannot be resolved
carbon-lang.tm,529: VAR cannot be resolved
carbon-lang.tm,589: TEMPLATE cannot be resolved
carbon-lang.tm,623: VIRTUAL cannot be resolved
carbon-lang.tm,688: VAR cannot be resolved
carbon-lang.tm,689: VAR cannot be resolved
carbon-lang.tm,723: VIRTUAL cannot be resolved

carbon-lang.tm:

language carbonlang(java);

package = "org.textmapper.carbonlang"
prefix = "PgSql"
breaks = true
gentree = true
genast = false
positions = "line,offset"
endpositions = "offset"
genbison = true

:: lexer

WhiteSpace:   /[\n\r\t ]+/   (space)
commentChars = /([^*]|\*+[^*\/])*\**/
MultiLineComment: /\/\*{commentChars}?\*\// (space)
SingleLineComment: /\/\/[^\n\r\u2028\u2029]*/ (space)

# Tokens

ABSTRACT : /abstract/
ADDR : /addr/
ALIAS : /alias/
AMPERSAND : /&/
AMPERSAND_EQUAL : /&=/
AND : /and/
API : /api/
ARROW : /\->/
AS : /as/
AUTO : /auto/
AWAIT : /__await/
BASE : /base/
BOOL : /bool/
BREAK : /break/
CARET : /^/
CARET_EQUAL : /^=/
CASE : /case/
CHOICE : /choice/
CLASS : /class/
COLON : /:/
COLON_BANG : /:!/
COMMA : /,/
CONSTRAINT : /constraint/
CONTINUATION : /__continuation/
CONTINUATION_TYPE : /__Continuation/
CONTINUE : /continue/
DEFAULT : /default/
DESTRUCTOR : /destructor/
DOUBLE_ARROW : /=>/
ELSE : /else/
EQUAL : /=/
EQUAL_EQUAL : /==/
EXTENDS : /extends/
EXTERNAL : /external/
FALSE : /false/
FN : /fn/
FN_TYPE : /__Fn/
FOR : /for/
FORALL : /forall/
GREATER : />/
GREATER_EQUAL : />=/
GREATER_GREATER : />>/
GREATER_GREATER_EQUAL : />>=/
IF : /if/
IMPL : /impl/
IMPORT : /import/
IN : /in/
INTERFACE : /interface/
IS : /is/
LEFT_CURLY_BRACE : /\{/
LEFT_PARENTHESIS : /\(/
LEFT_SQUARE_BRACKET : /\[/
LESS : /</
LESS_EQUAL : /<=/
LESS_LESS : /<</
LESS_LESS_EQUAL : /<<=/
LET : /let/
LIBRARY : /library/
MATCH : /match/
MATCH_FIRST : /__match_first/
MINUS : /\-/
MINUS_EQUAL : /\-=/
MINUS_MINUS : /\-\-/
MIX : /__mix/
MIXIN : /__mixin/
NAMESPACE : /namespace/
NOT : /not/
NOT_EQUAL : /!=/
OR : /or/
PACKAGE : /package/
PERCENT : /%/
PERCENT_EQUAL : /%=/
PERIOD : /\./
PIPE : /\|/
PIPE_EQUAL : /\|=/
PLUS : /\+/
PLUS_EQUAL : /\+=/
PLUS_PLUS : /\+\+/
RETURN : /return/
RETURNED : /returned/
RIGHT_CURLY_BRACE : /\}/
RIGHT_PARENTHESIS : /\)/
RIGHT_SQUARE_BRACKET : /\]/
RUN : /__run/
SELF : /Self/
SEMICOLON : /;/
SLASH : /\//
SLASH_EQUAL : /\/=/
STAR_EQUAL : /*=/
STRING : /String/
TEMPLATE : /template/
THEN : /then/
TRUE : /true/
TYPE : /type/
UNDERSCORE : /_/
UNIMPL_EXAMPLE : /__unimplemented_example_infix/
VAR : /var/
VIRTUAL : /virtual/
WHERE : /where/
WHILE : /while/

#//====
/*
The order of rules matters for the lexer, as with the two rules below;
otherwise "intrinsic_identifier" will never be recognized
*/
intrinsic_identifier : /Print|__intrinsic_[A-Za-z0-9_]*/
identifier : /[A-Za-z_][A-Za-z0-9_]*/ -1
#//====

sized_type_literal :  /[iuf][1-9][0-9]*/
integer_literal : /[0-9]+/
string_literal : /"(\\.|[^"\n])*\"|'''(\\.|[^'])*'''/

spaces_in : /[ \t\r\n]+/
BINARY_STAR : /\*{spaces_in}/
POSTFIX_STAR : /{spaces_in}\*{spaces_in}/
PREFIX_STAR : /{spaces_in}\*/
UNARY_STAR : /\*/

:: parser

%input input;

input :
    #END_OF_FILE
    package_directive import_directives declaration_list
    ;

package_directive :
    PACKAGE identifier optional_library_path api_or_impl SEMICOLON
    ;

import_directive :
    IMPORT identifier optional_library_path SEMICOLON
    ;

import_directives :
    /*empty*/
    | import_directives import_directive
    ;

optional_library_path :
    /*empty*/
    | LIBRARY string_literal
    ;

api_or_impl :
    API
    | IMPL
    ;

primary_expression :
    identifier
    | designator
    | PERIOD SELF
    | integer_literal
    | string_literal
    | TRUE
    | FALSE
    | sized_type_literal
    | SELF
    | STRING
    | BOOL
    | TYPE
    | CONTINUATION_TYPE
    | paren_expression
    | struct_literal
    | struct_type_literal
    | LEFT_SQUARE_BRACKET expression SEMICOLON expression RIGHT_SQUARE_BRACKET
    ;

postfix_expression :
    primary_expression
    | postfix_expression designator
    | postfix_expression ARROW identifier
    | postfix_expression PERIOD LEFT_PARENTHESIS expression RIGHT_PARENTHESIS
    | postfix_expression ARROW LEFT_PARENTHESIS expression RIGHT_PARENTHESIS
    | postfix_expression LEFT_SQUARE_BRACKET expression RIGHT_SQUARE_BRACKET
    | intrinsic_identifier tuple
    | postfix_expression tuple
    | postfix_expression POSTFIX_STAR
    | postfix_expression UNARY_STAR
    ;

ref_deref_expression :
    postfix_expression
    | PREFIX_STAR ref_deref_expression
    | UNARY_STAR ref_deref_expression
    | AMPERSAND ref_deref_expression
    ;

fn_type_expression :
    FN_TYPE tuple ARROW type_expression
    ;

type_expression :
    ref_deref_expression
    | bitwise_and_expression
    | fn_type_expression
    ;

minus_expression :
    MINUS ref_deref_expression
    ;

complement_expression :
    CARET ref_deref_expression
    ;

unary_expression :
    minus_expression
    | complement_expression
    ;

simple_binary_operand :
    ref_deref_expression
    | unary_expression
    ;

multiplicative_lhs :
    simple_binary_operand
    | multiplicative_expression
    ;

multiplicative_expression :
    multiplicative_lhs BINARY_STAR simple_binary_operand
    | multiplicative_lhs SLASH simple_binary_operand
    ;

additive_operand :
    simple_binary_operand
    | multiplicative_expression
    ;

additive_lhs :
    simple_binary_operand
    | additive_expression
    ;

additive_expression :
    multiplicative_expression
    | additive_lhs PLUS additive_operand
    | additive_lhs MINUS additive_operand
    ;

modulo_expression :
    simple_binary_operand PERCENT simple_binary_operand
    ;

bitwise_and_lhs :
    simple_binary_operand
    | bitwise_and_expression
    ;

bitwise_and_expression :
    bitwise_and_lhs AMPERSAND simple_binary_operand
    ;

bitwise_or_lhs :
    simple_binary_operand
    | bitwise_or_expression
    ;

bitwise_or_expression :
    bitwise_or_lhs PIPE simple_binary_operand
    ;

bitwise_xor_lhs :
    simple_binary_operand
    | bitwise_xor_expression
    ;

bitwise_xor_expression :
    bitwise_xor_lhs CARET simple_binary_operand
    ;

bitwise_expression :
    bitwise_and_expression
    | bitwise_or_expression
    | bitwise_xor_expression
    ;

bit_shift_expression :
    simple_binary_operand LESS_LESS simple_binary_operand
    | simple_binary_operand GREATER_GREATER simple_binary_operand
    ;

as_expression :
    simple_binary_operand AS simple_binary_operand
    ;

unimpl_expression :
    ref_deref_expression UNIMPL_EXAMPLE ref_deref_expression
    ;

value_expression :
    additive_expression
    | as_expression
    | bitwise_expression
    | bit_shift_expression
    | fn_type_expression
    | modulo_expression
    | unary_expression
    | unimpl_expression
    ;

comparison_operand :
    ref_deref_expression
    | value_expression
    ;

comparison_operator :
    EQUAL_EQUAL
    | LESS
    | LESS_EQUAL
    | GREATER
    | GREATER_EQUAL
    | NOT_EQUAL
    ;

comparison_expression :
    value_expression
    | comparison_operand comparison_operator comparison_operand
    ;

not_expression :
    NOT ref_deref_expression
    ;

predicate_expression :
    not_expression
    | comparison_expression
    ;

and_or_operand :
    ref_deref_expression
    | predicate_expression
    ;

and_lhs :
    and_or_operand
    | and_expression
    ;

and_expression :
    and_lhs AND and_or_operand
    ;

or_lhs :
    and_or_operand
    | or_expression
    ;

or_expression :
    or_lhs OR and_or_operand
    ;

where_clause :
    comparison_operand IS comparison_operand
    | comparison_operand EQUAL_EQUAL comparison_operand
    | designator EQUAL comparison_operand
    ;

where_clause_list :
    where_clause
    | where_clause_list AND where_clause
    ;

where_expression :
    type_expression WHERE where_clause_list
    ;

type_or_where_expression :
    type_expression
    | where_expression
    ;

statement_expression :
    ref_deref_expression
    | predicate_expression
    | and_expression
    | or_expression
    | where_expression
    ;

if_expression :
    statement_expression
    | IF expression THEN if_expression ELSE if_expression
    ;

expression :
    if_expression
    ;

designator :
    PERIOD identifier
    | PERIOD BASE
    ;

paren_expression :
    paren_expression_base
    ;

tuple :
    paren_expression_base
    ;

paren_expression_base :
    LEFT_PARENTHESIS RIGHT_PARENTHESIS
    | LEFT_PARENTHESIS paren_expression_contents RIGHT_PARENTHESIS
    | LEFT_PARENTHESIS paren_expression_contents COMMA RIGHT_PARENTHESIS
    ;

paren_expression_contents :
    expression
    | paren_expression_contents COMMA expression
    ;

struct_literal :
    LEFT_CURLY_BRACE RIGHT_CURLY_BRACE
    | LEFT_CURLY_BRACE struct_literal_contents RIGHT_CURLY_BRACE
    | LEFT_CURLY_BRACE struct_literal_contents COMMA RIGHT_CURLY_BRACE
    ;

struct_literal_contents :
    designator EQUAL expression
    | struct_literal_contents COMMA designator EQUAL expression
    ;

struct_type_literal :
    LEFT_CURLY_BRACE struct_type_literal_contents RIGHT_CURLY_BRACE
    | LEFT_CURLY_BRACE struct_type_literal_contents COMMA RIGHT_CURLY_BRACE
    ;

struct_type_literal_contents :
    designator COLON expression
    | struct_type_literal_contents COMMA designator COLON expression
    ;

pattern :
    non_expression_pattern
    | expression
    ;

non_expression_pattern :
    AUTO
    | binding_lhs COLON pattern
    | binding_lhs COLON_BANG expression
    | TEMPLATE binding_lhs COLON_BANG expression
    | paren_pattern
    | postfix_expression tuple_pattern
    | VAR non_expression_pattern
    ;

binding_lhs :
    identifier
    | UNDERSCORE
    ;

paren_pattern :
    paren_pattern_base
    ;

paren_pattern_base :
    LEFT_PARENTHESIS paren_pattern_contents RIGHT_PARENTHESIS
    | LEFT_PARENTHESIS paren_pattern_contents COMMA RIGHT_PARENTHESIS
    ;

paren_pattern_contents :
    non_expression_pattern
    | paren_expression_contents COMMA non_expression_pattern
    | paren_pattern_contents COMMA expression
    | paren_pattern_contents COMMA non_expression_pattern
    ;

tuple_pattern :
    paren_pattern_base
    ;

maybe_empty_tuple_pattern :
    LEFT_PARENTHESIS RIGHT_PARENTHESIS
    | tuple_pattern
    ;

clause :
    CASE pattern DOUBLE_ARROW block
    | DEFAULT DOUBLE_ARROW block
    ;

clause_list :
    /*empty*/
    | clause_list clause
    ;

statement :
    assign_statement
    | VAR pattern SEMICOLON
    | VAR pattern EQUAL expression SEMICOLON
    | RETURNED VAR variable_declaration SEMICOLON
    | RETURNED VAR variable_declaration EQUAL expression SEMICOLON
    | LET pattern EQUAL expression SEMICOLON
    | statement_expression SEMICOLON
    | if_statement
    | WHILE LEFT_PARENTHESIS expression RIGHT_PARENTHESIS block
    | BREAK SEMICOLON
    | CONTINUE SEMICOLON
    | RETURN return_expression SEMICOLON
    | RETURN VAR SEMICOLON
    | MATCH LEFT_PARENTHESIS expression RIGHT_PARENTHESIS LEFT_CURLY_BRACE clause_list RIGHT_CURLY_BRACE
    | CONTINUATION identifier block
    | RUN expression SEMICOLON
    | AWAIT SEMICOLON
    | FOR LEFT_PARENTHESIS variable_declaration IN type_expression RIGHT_PARENTHESIS block
    ;

assign_statement :
    statement_expression assign_operator expression SEMICOLON
    | PLUS_PLUS expression SEMICOLON
    | MINUS_MINUS expression SEMICOLON
    ;

assign_operator :
    EQUAL
    | PLUS_EQUAL
    | SLASH_EQUAL
    | STAR_EQUAL
    | PERCENT_EQUAL
    | MINUS_EQUAL
    | AMPERSAND_EQUAL
    | PIPE_EQUAL
    | CARET_EQUAL
    | LESS_LESS_EQUAL
    | GREATER_GREATER_EQUAL
    ;

if_statement :
    IF LEFT_PARENTHESIS expression RIGHT_PARENTHESIS block optional_else
    ;

optional_else :
    /*empty*/
    | ELSE if_statement
    | ELSE block
    ;

return_expression :
    /*empty*/
    | expression
    ;

statement_list :
    /*empty*/
    | statement_list statement
    ;

block :
    LEFT_CURLY_BRACE statement_list RIGHT_CURLY_BRACE
    ;

return_term :
    /*empty*/
    | ARROW AUTO
    | ARROW expression
    ;

generic_binding :
    identifier COLON_BANG expression
    | TEMPLATE identifier COLON_BANG expression
    ;

deduced_param :
    generic_binding
    | variable_declaration
    | ADDR variable_declaration
    ;

deduced_param_list :
    /*empty*/
    | deduced_param
    | deduced_param_list COMMA deduced_param
    ;

deduced_params :
    /*empty*/
    | LEFT_SQUARE_BRACKET deduced_param_list RIGHT_SQUARE_BRACKET
    ;

impl_deduced_params :
    /*empty*/
    | FORALL LEFT_SQUARE_BRACKET deduced_param_list RIGHT_SQUARE_BRACKET
    ;

declared_name :
    identifier
    | declared_name PERIOD identifier
    | LEFT_PARENTHESIS declared_name RIGHT_PARENTHESIS
    ;

fn_virtual_override_intro :
    FN
    | ABSTRACT FN
    | VIRTUAL FN
    | IMPL FN
    ;

function_declaration :
    fn_virtual_override_intro declared_name deduced_params maybe_empty_tuple_pattern return_term block
    | fn_virtual_override_intro declared_name deduced_params maybe_empty_tuple_pattern return_term SEMICOLON
    ;

variable_declaration :
    identifier COLON pattern
    ;

alias_declaration :
    ALIAS declared_name EQUAL expression SEMICOLON
    ;

mix_declaration :
    MIX expression SEMICOLON
    ;

alternative :
    identifier tuple
    | identifier
    ;

alternative_list :
    /*empty*/
    | alternative_list_contents
    | alternative_list_contents COMMA
    ;

alternative_list_contents :
    alternative
    | alternative_list_contents COMMA alternative
    ;

type_params :
    /*empty*/
    | tuple_pattern
    ;

mixin_import :
    /*empty*/
    | FOR expression
    ;

class_declaration_extensibility :
    /*empty*/
    | ABSTRACT
    | BASE
    ;

class_declaration_extends :
    /*empty*/
    | EXTENDS expression
    ;

declaration :
    NAMESPACE declared_name SEMICOLON
    | function_declaration
    | destructor_declaration
    | class_declaration_extensibility CLASS declared_name type_params class_declaration_extends LEFT_CURLY_BRACE class_body RIGHT_CURLY_BRACE
    | MIXIN declared_name type_params mixin_import LEFT_CURLY_BRACE mixin_body RIGHT_CURLY_BRACE
    | CHOICE declared_name type_params LEFT_CURLY_BRACE alternative_list RIGHT_CURLY_BRACE
    | VAR variable_declaration SEMICOLON
    | VAR variable_declaration EQUAL expression SEMICOLON
    | LET variable_declaration EQUAL expression SEMICOLON
    | INTERFACE declared_name type_params LEFT_CURLY_BRACE interface_body RIGHT_CURLY_BRACE
    | CONSTRAINT declared_name type_params LEFT_CURLY_BRACE interface_body RIGHT_CURLY_BRACE
    | impl_declaration
    | match_first_declaration
    | alias_declaration
    ;

impl_declaration :
    impl_kind_intro impl_deduced_params impl_type AS type_or_where_expression LEFT_CURLY_BRACE impl_body RIGHT_CURLY_BRACE
    ;

impl_kind_intro :
    IMPL
    | EXTERNAL IMPL
    ;

impl_type :
    /*empty*/
    | type_expression
    ;

match_first_declaration :
    MATCH_FIRST LEFT_CURLY_BRACE match_first_declaration_list RIGHT_CURLY_BRACE
    ;

match_first_declaration_list :
    /*empty*/
    | match_first_declaration_list impl_declaration
    ;

destructor_virtual_override_intro :
    DESTRUCTOR
    | VIRTUAL DESTRUCTOR
    | IMPL DESTRUCTOR
    ;

destructor_declaration :
    destructor_virtual_override_intro deduced_params block
    ;

declaration_list :
    /*empty*/
    | declaration_list declaration
    ;

class_body :
    /*empty*/
    | class_body declaration
    | class_body mix_declaration
    ;

mixin_body :
    /*empty*/
    | mixin_body function_declaration
    | mixin_body mix_declaration
    ;

interface_body :
    /*empty*/
    | interface_body function_declaration
    | interface_body LET generic_binding SEMICOLON
    | interface_body EXTENDS expression SEMICOLON
    | interface_body IMPL impl_type AS type_or_where_expression SEMICOLON
    ;

impl_body :
    /*empty*/
    | impl_body function_declaration
    | impl_body alias_declaration
    ;
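
One thing worth checking in the listing above: the eleven names reported as unresolved (STRING, TEMPLATE, THEN, TRUE, TYPE, UNDERSCORE, UNIMPL_EXAMPLE, VAR, VIRTUAL, WHERE, WHILE) are exactly the tokens declared between STAR_EQUAL : /*=/ and the */ that closes the comment ahead of intrinsic_identifier. That would be consistent with /* being read as the start of a block comment, although this is an assumption about Textmapper's lexer and not something confirmed here. A minimal sketch that flags token definitions whose regex starts with an unescaped star (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapedStarCheck {

    // Matches "NAME : /*..." where the regex body begins with a bare '*'.
    private static final Pattern TOKEN =
            Pattern.compile("^\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*:\\s*/\\*");

    /** Returns the names of token definitions whose regex opens with "/*". */
    static List<String> suspects(List<String> lexerLines) {
        List<String> names = new ArrayList<>();
        for (String line : lexerLines) {
            Matcher m = TOKEN.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // A few lines copied from the lexer section above.
        List<String> lines = List.of(
                "SLASH_EQUAL : /\\/=/",
                "STAR_EQUAL : /*=/",
                "STRING : /String/",
                "UNARY_STAR : /\\*/");
        System.out.println(suspects(lines)); // prints [STAR_EQUAL]
    }
}
```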

mingodad commented 1 year ago

After posting the previous comment I found an unescaped * in a regex: STAR_EQUAL : /*=/ should be STAR_EQUAL : /\*=/. With that fixed, I now get this error message:

textmapper.sh "carbon-lang.tm"
carbon-lang.tm,136: BINARY_STAR: cannot expand {spaces_in}, not found
carbon-lang.tm,137: POSTFIX_STAR: cannot expand {spaces_in}, not found
carbon-lang.tm,138: PREFIX_STAR: cannot expand {spaces_in}, not found
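
In the grammar above, commentChars is declared with = and expands without complaint inside MultiLineComment, while spaces_in is declared with : like an ordinary token. That difference may be why {spaces_in} is reported as not found; this is an inference from the listing, not confirmed Textmapper behavior. A small sketch that lists {name} references with no matching name = /.../ declaration (the class and method names are hypothetical):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedPatternCheck {

    // "name = /regex/" : the form used by commentChars in the grammar above.
    private static final Pattern DECL =
            Pattern.compile("^\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*=\\s*/");
    // "{name}" : a reference to a named pattern inside a regex.
    private static final Pattern REF =
            Pattern.compile("\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    /** Returns referenced pattern names that have no "name = /.../" declaration. */
    static Set<String> undeclared(List<String> lexerLines) {
        Set<String> declared = new HashSet<>();
        for (String line : lexerLines) {
            Matcher d = DECL.matcher(line);
            if (d.find()) {
                declared.add(d.group(1));
            }
        }
        Set<String> missing = new TreeSet<>();
        for (String line : lexerLines) {
            Matcher r = REF.matcher(line);
            while (r.find()) {
                if (!declared.contains(r.group(1))) {
                    missing.add(r.group(1));
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Lines copied from the lexer section above.
        List<String> lines = List.of(
                "commentChars = /([^*]|\\*+[^*\\/])*\\**/",
                "MultiLineComment: /\\/\\*{commentChars}?\\*\\// (space)",
                "spaces_in : /[ \\t\\r\\n]+/",
                "BINARY_STAR : /\\*{spaces_in}/");
        System.out.println(undeclared(lines)); // prints [spaces_in]
    }
}
```

If that reading is right, declaring the helper as spaces_in = /[ \t\r\n]+/ would be the first thing to try.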

Carbon-lang uses a lexer trick to differentiate BINARY_STAR/POSTFIX_STAR/PREFIX_STAR. How can that be accomplished with Textmapper?

mingodad commented 1 year ago

I'm working on a LALR(1)/LEX playground for trying grammars online, compiled to wasm and based on https://github.com/BenHanson/gram_grep, and I've got the Textmapper grammar working. See https://mingodad.github.io/parsertl-playground/playground/ and select Textmapper parser from the examples; you can then edit the Grammar or the Input source and press Parse to see the parse tree.

I hope it can be a nice tool for experimenting with LALR(1)/LEX grammars with instant feedback!