mattunlv / ProcessJ2.0

ProcessJ 2.0 - a new version of the ProcessJ compiler

ProcessJ Grammar #1

Open ninjabox77 opened 1 year ago

ninjabox77 commented 1 year ago

ANTLR instead of CUP

The main reasons for using ANTLR are:

  1. Redirecting error messages
  2. Error listeners

The current implementation uses a stack-based mechanism for detecting errors that does not interact with the lexer or parser. (I did not have enough time to finish and implement the CtxNode.) Listeners would make it much more straightforward to change error messages and where they go. ANTLR provides this mechanism as an interface: its syntaxError() method applies to both the lexer and the parser, and we can override it to display error messages the way we want. (It also receives a reference to the recognizer that reported the error, which means that we can query it about the state of the lexer or parser.)
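To make the listener idea concrete, here is a minimal, self-contained sketch of the pattern in Java. It deliberately does not use the ANTLR runtime: SyntaxErrorListener, CollectingListener, BraceChecker and the file name hello.pj are hypothetical stand-ins invented for illustration, and the toy brace check only plays the role of the generated parser. With ANTLR itself, the analogous step would be to extend BaseErrorListener, override syntaxError(), and register the listener on the lexer and parser after calling removeErrorListeners().

```java
import java.util.ArrayList;
import java.util.List;

// Entry point first, so the file can be launched directly as a single source file.
final class ListenerDemo {
    public static void main(String[] args) {
        CollectingListener listener = new CollectingListener();
        BraceChecker checker = new BraceChecker();
        checker.addErrorListener(listener);
        checker.check("hello.pj", "for {\n    x\n");
        // The driver, not the recognizer, decides where messages end up.
        listener.diagnostics().forEach(System.err::println);
    }
}

// Hypothetical stand-in for the error callback; NOT the ANTLR interface.
interface SyntaxErrorListener {
    void syntaxError(String source, int line, int column, String message);
}

// Collects diagnostics instead of printing them immediately.
final class CollectingListener implements SyntaxErrorListener {
    private final List<String> diagnostics = new ArrayList<>();

    @Override
    public void syntaxError(String source, int line, int column, String message) {
        diagnostics.add(source + ":" + line + ":" + column + ": error: " + message);
    }

    List<String> diagnostics() {
        return diagnostics;
    }
}

// Toy recognizer standing in for a generated parser: it only checks that
// braces are balanced and reports problems to its registered listeners.
final class BraceChecker {
    private final List<SyntaxErrorListener> listeners = new ArrayList<>();

    void addErrorListener(SyntaxErrorListener listener) {
        listeners.add(listener);
    }

    void check(String source, String text) {
        int depth = 0;
        int line = 1;
        int column = 0;
        for (char c : text.toCharArray()) {
            if (c == '\n') {
                line++;
                column = 0;
                continue;
            }
            column++;
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth < 0) {
                report(source, line, column, "unmatched '}'");
                depth = 0; // recover and keep scanning
            }
        }
        if (depth > 0) {
            report(source, line, column, "missing '}' at end of input");
        }
    }

    private void report(String source, int line, int column, String message) {
        for (SyntaxErrorListener listener : listeners) {
            listener.syntaxError(source, line, column, message);
        }
    }
}
```

Because the recognizer only knows the listener interface, swapping CollectingListener for one that writes to a log or formats messages differently would not touch the parsing code, which is the property argued for above.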

Grammar

The syntax is now closer to that of Java 11. I also added a new loop construct,

for {
   ...
}

that represents an infinite loop. Also, I suggest using annotations rather than #pragma declarations to indicate when something is not native to the language. Annotations can be placed anywhere in the code, which means that every node in the AST could be annotated. Here is an example of how we could write a native ProcessJ library:

import @Static(check = out) java.lang.System.*;

public void println(string s) {
    @Check out.println(s);
}

public void println(int i) {
    @Check out.println(i);
}

public void println(char c) {
    @Check out.println(c);
}

...

To type check non-native code, @Static would instruct the compiler to look in the System class for the out print stream object, so that we could gather information about its println method through Java's reflection API. We could then use this library in a ProcessJ program as follows:

import std.io;

public void main(string...args) {
    println("Hello World!");
}

Here is another example of how something non-native could be type checked. Consider the following ProcessJ code:

import @Static(check = out) java.lang.System.*;

public void main(string...args) {
    @Check out.println("hello World"); 
}

Since we are using Java's own println, we could gather this type information during compilation rather than leaving it to the Java compiler (and possibly the runtime, when using external types).
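As a rough illustration of the reflection lookup that @Static(check = out) implies, the following Java sketch finds the out field on System and lists the parameter types accepted by its one-argument println overloads. NativeLookup and overloads are hypothetical names, and how ProcessJ types such as string would map onto Java types is an assumption that is not settled here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class NativeLookup {
    // Returns the parameter type of every public one-argument overload of
    // `methodName` on the declared type of the public field `fieldName`.
    static List<Class<?>> overloads(Class<?> owner, String fieldName, String methodName) {
        Field field;
        try {
            field = owner.getField(fieldName);          // e.g. System.out
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no public field " + fieldName, e);
        }
        Class<?> fieldType = field.getType();            // e.g. java.io.PrintStream
        List<Class<?>> parameterTypes = new ArrayList<>();
        for (Method method : fieldType.getMethods()) {
            if (method.getName().equals(methodName) && method.getParameterCount() == 1) {
                parameterTypes.add(method.getParameterTypes()[0]);
            }
        }
        return parameterTypes;
    }

    public static void main(String[] args) {
        List<Class<?>> accepted = overloads(System.class, "out", "println");
        // A checker could accept `@Check out.println(s)` only when the
        // argument's type maps onto one of these parameter types.
        System.out.println("println accepts String: " + accepted.contains(String.class));
        System.out.println("println accepts int:    " + accepted.contains(int.class));
        System.out.println("println accepts char:   " + accepted.contains(char.class));
    }
}
```

The lookup only inspects signatures and never invokes println, so it could run entirely at compile time, before any Java code is generated.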

Proposed grammar

grammar ProcessJ;

init
 : packageDeclaration? importDeclaration* typeDeclaration* EOF
 ;

packageDeclaration
 : PACKAGE packageAccess SEMI
 ;

importDeclaration
 : singleImportDeclaration
 | multiImportDeclaration
 | multiImportDeclarationStar
 ;

singleImportDeclaration
 : IMPORT annotation? Identifier (DCOLON Identifier)? SEMI
 ;

multiImportDeclaration
 : IMPORT annotation? Identifier (DOT Identifier)* (DCOLON Identifier)? SEMI
 ;

multiImportDeclarationStar
 : IMPORT annotation? Identifier (DOT Identifier)*  DOT MULT SEMI
 ;

typeDeclaration
 : procedureDeclaration     # ProcedureDeclaration_
 | recordDeclaration        # RecordDeclaration_
 | protocolDeclaration      # ProtocolDeclaration_
 | constantDeclaration      # ConstantDeclaration_
 | externDeclaration        # ExternDeclaration_
 ;

procedureDeclaration
 : modifier* type_ Identifier LPAREN formalDeclarationList? RPAREN blockExpression
 ;

modifier
 : MOBILE
 | NATIVE
 | PUBLIC
 | PRIVATE
 | PROTECTED
 | CONST
 | EXTERN
 ;

recordDeclaration
 : modifier* RECORD Identifier extends? recordBody
 ;

extends
 : EXTENDS typeVariable (COMMA typeVariable)*
 ;

implements
 : IMPLEMENTS typeVariable (COMMA typeVariable)*
 ;

recordBody
 : LBRACE recordField* RBRACE
 ;

recordField
 : type_ variableDeclaratorList SEMI
 ;

protocolDeclaration
 : modifier* PROTOCOL Identifier implements? protocolBody
 ;

protocolBody
 : LBRACE protocolCase* RBRACE
 ;

protocolCase
 : Identifier COLON recordBody
 ;

constantDeclaration
 : modifier* CONST type_ variableDeclaratorList SEMI
 ;

packageAccess
 : Identifier (DOT Identifier)*
 ;

statement
 : SEMI
 | localVariableDeclarationStatement
 | expressionStatement
 ;

localVariableDeclarationStatement
 : localVariableDeclaration SEMI
 ;

localVariableDeclaration
 : variableModifier* type_ variableDeclaratorList
 ;

variableDeclaratorList
 : variableDeclarator (COMMA variableDeclarator)*
 ;

variableDeclarator
 : variableDeclaratorIdent (EQ variableInitializer)?
 ;

variableDeclaratorIdent
 : Identifier QUEST? dims?
 ;

type_
 : primitiveType
 | referenceType
 | VOID
 ;

primitiveType
 : numericType
 | BOOLEAN
 | STRING
 | BARRIER
 | TIMER
 ;

referenceType
 : arrayType
 | channelType
 | typeVariable
 ;

numericType
 : integralType
 | floatingPointType
 ;

integralType
 : BYTE
 | SHORT
 | INT
 | LONG
 | CHAR
 ;

floatingPointType
 : FLOAT
 | DOUBLE
 ;

arrayType
 : primitiveType dims
 | channelType dims
 | typeVariable dims
 ;

channelType
 : SHARED READ CHAN LT type_ GT
 | SHARED WRITE CHAN LT type_ GT
 | SHARED CHAN LT type_ GT
 | CHAN LT type_ GT
 | CHAN LT type_ GT DOT READ
 | CHAN LT type_ GT DOT WRITE
 | SHARED CHAN LT type_ GT DOT READ
 | SHARED CHAN LT type_ GT DOT WRITE
 ;

typeVariable
 : annotation* Identifier
 | annotation* packageAccess DCOLON Identifier
 | annotation* packageAccess DOT Identifier
 ;

dims
 : (LBRACK RBRACK)+
 ;

variableInitializer
 : expression
 ;

expressionStatement
 : expression SEMI?
 | expressionWithBlock SEMI?
 ;

expression
 : annotation+ expression
 | literalExpression
 | pathExpression
 | expression DOT pathExpression LPAREN actualDeclarationList? RPAREN
 | expression QUEST? DOT Identifier
 | expression DOT (READ | WRITE) LPAREN expression? RPAREN
 | expression DOT (READ | WRITE)
 | expression DOT TIMEOUT LPAREN actualDeclarationList? RPAREN
 | expression LPAREN actualDeclarationList? RPAREN
 | expression op=(DMINUS | DPLUS)
 | expression LBRACK expression RBRACK
 | expression QUEST expression COLON expression
 | op=(COMP | NOT) expression
 | expression DMULT expression
 | op=(DMINUS | DPLUS) expression
 | LPAREN (expression | primitiveType) RPAREN expression
 | expression op=(MULT | DIV | MOD) expression
 | expression op=(PLUS | MINUS) expression
 | expression op=(LSHIFT | RSHIFT | RRSHIFT) expression
 | expression AND expression
 | expression XOR expression
 | expression OR expression
 | expression comparisonOperator expression
 | expression ANDAND expression
 | expression OROR expression
 | expression EQ expression
 | expression assignmentOperator expression
 | CONTINUE Identifier?
 | BREAK Identifier?
 | RETURN expression?
 | (SKIP_ | STOP)
 | LPAREN expression RPAREN
 | LBRACE arrayElements RBRACE
 | recordExpression
 | protocolExpression
 | externalExpression
 | expressionWithBlock
 ;

annotation
 : normalAnnotation
 | markerAnnotation
 | singleElementAnnotation
 ;

normalAnnotation
 : AT Identifier LPAREN elementValuePairList RPAREN
 ;

elementValuePairList
 : elementValuePair (COMMA elementValuePair)*
 ;

elementValuePair
 : Identifier EQ elementValue
 ;

elementValue
 : literalExpression
 | annotation
 ;

markerAnnotation
 : AT Identifier
 ;

singleElementAnnotation
 : AT Identifier LPAREN elementValue RPAREN
 ;

literalExpression
 : IntegerLiteral
 | FloatingPointLiteral
 | BooleanLiteral
 | CharacterLiteral
 | StringLiteral
 | NullLiteral
 ;

pathExpression
 : DCOLON? Identifier (DCOLON Identifier)*
 ;

actualDeclarationList
 : annotation? expression (COMMA expression)*
 ;

formalDeclarationList
 : formalDeclarations (COMMA lastFormalDeclaration)?    # FormalDeclaration_
 | lastFormalDeclaration                                # LastFormalDeclaration_
 ;

formalDeclarations
 : formalDeclaration (COMMA formalDeclaration)*
 ;

formalDeclaration
 : variableModifier* type_ variableDeclarator (EQ variableInitializer)?
 ;

variableModifier
 : annotation
 | CONST
 ;

lastFormalDeclaration
 : annotation? type_ ELLIPSIS Identifier
 ;

comparisonOperator
 : EQEQ
 | NOTEQ
 | LT
 | GT
 | LTEQ
 | GTEQ
 | IS
 ;

assignmentOperator
 : EQ
 | MULTEQ
 | DIVEQ
 | MODEQ
 | PLUSEQ
 | MINUSEQ
 | LSHIFTEQ
 | RSHIFTEQ
 | RRSHIFTEQ
 | ANDEQ
 | XOREQ
 | OREQ
 ;

arrayElements
 : expression (COMMA expression)* COMMA?
 ;

expressionWithBlock
 : blockExpression
 | loopExpression
 | ifExpression
 | caseExpression
 | altExpression
 | parBlockExpression
 ;

blockExpression
 : LBRACE statements? RBRACE
 ;

statements
 : statement+ expression?
 | expression
 ;

recordExpression
 : NEW? typeVariable LBRACE recordExpressionList* RBRACE
 ;

recordExpressionList
 : annotation? DOT Identifier EQ expression COMMA?
 ;

protocolExpression
 : NEW? typeVariable LBRACE protocolExpressionList RBRACE
 ;

protocolExpressionList
 : DOT Identifier tagExpressionList (COMMA DOT Identifier tagExpressionList)*
 ;

tagExpressionList
 : COLON LBRACE recordExpressionList* RBRACE
 ;

externalExpression
 : NEW typeVariable LPAREN actualDeclarationList? RPAREN
 ;

loopExpression
 : forLoopExpression
 | whileLoopExpression
 | doWhileLoopExpression
 | infiniteLoopExpression
 ;

forLoopExpression
 : PAR? FOR LPAREN forInit? SEMI expression? SEMI forUpdate? RPAREN blockExpression
 ;

forInit
 : statementExpressionList
 | localVariableDeclaration
 ;

forUpdate
 : statementExpressionList
 ;

statementExpressionList
 : expression (COMMA expression)*
 ;

whileLoopExpression
 : WHILE LPAREN expression RPAREN blockExpression
 ;

doWhileLoopExpression
 : DO blockExpression WHILE LPAREN expression RPAREN
 ;

infiniteLoopExpression
 : FOR blockExpression
 ;

ifExpression
 : IF LPAREN expression RPAREN blockExpression (ELSE (blockExpression | ifExpression))?
 ;

caseExpression
 : SWITCH LPAREN expression RPAREN caseBlockExpression
 ;

caseBlockExpression
 : LBRACE caseBlockGroupExpression* caseLabel* RBRACE
 ;

caseBlockGroupExpression
 : caseLabel* (statements | expression)
 ;

caseLabel
 : CASE expression COLON
 | DEFAULT COLON
 ;

altExpression
 : priAltExpression
 | priAltLoopExpression
 ;

priAltExpression
 : PRI? ALT altBodyExpression
 ;

altBodyExpression
 : LBRACE altCase* RBRACE
 ;

altCase
 : expression ANDAND guardExpression COLON statements
 | guardExpression COLON statements
 | altExpression
 ;

guardExpression
 : expression
 | SKIP_
 ;

priAltLoopExpression
 : PRI? ALT LPAREN forInit? SEMI expression? SEMI forUpdate? RPAREN altBodyExpression
 ;

parBlockExpression
 : PAR (ENROLL barrierExpression)? blockExpression
 ;

barrierExpression
 : expression (COMMA expression)*
 ;

externDeclaration
 : EXTERN externType Identifier SEMI
 ;

externType
 : typeVariable
 | classType
 ;

classType
 : annotation* Identifier typeArguments?
 | classType DOT annotation* Identifier typeArguments?
 ;

typeArguments
 : LT typeArgumentList GT
 ;

typeArgumentList
 : typeArgument (COMMA typeArgument)*
 ;

typeArgument
 : referenceType
 ;

// Keywords
BOOLEAN     : 'boolean';
BYTE        : 'byte';
SHORT       : 'short';
INT         : 'int';
LONG        : 'long';
FLOAT       : 'float';
DOUBLE      : 'double';
CHAR        : 'char';
STRING      : 'string';
VOID        : 'void';

CHAN        : 'chan';
READ        : 'read';
WRITE       : 'write';
SHARED      : 'shared';
CLAIM       : 'claim';

BARRIER     : 'barrier';
SYNC        : 'sync';
ENROLL      : 'enroll';

TIMER       : 'timer';
TIMEOUT     : 'timeout';

SKIP_       : 'skip';
STOP        : 'stop';
IS          : 'is';
PRAGMA      : '#pragma';
AT          : '@';

IF          : 'if';
ELSE        : 'else';
FOR         : 'for';
WHILE       : 'while';
SWITCH      : 'switch';
CASE        : 'case';
DO          : 'do';
LOOP        : 'loop';
DEFAULT     : 'default';
BREAK       : 'break';
CONTINUE    : 'continue';
RETURN      : 'return';

SEQ         : 'seq';
PAR         : 'par';
PRI         : 'pri';
ALT         : 'alt';
FORK        : 'fork';

NEW         : 'new';
RESUME      : 'resume';
SUSPEND     : 'suspend';
WITH        : 'with';
AS          : 'as';

PROC        : 'proc';
PROTOCOL    : 'protocol';
RECORD      : 'record';
EXTENDS     : 'extends';
IMPLEMENTS  : 'implements';

PACKAGE     : 'package';
IMPORT      : 'import';

MOBILE      : 'mobile';
NATIVE      : 'native';
PUBLIC      : 'public';
PRIVATE     : 'private';
PROTECTED   : 'protected';
CONST       : 'const';
EXTERN      : 'extern';

// Integer Literals
IntegerLiteral
 : DecimalIntegerLiteral
 | HexIntegerLiteral
 | OctalIntegerLiteral
 | BinaryIntegerLiteral
 ;

fragment
DecimalIntegerLiteral: DecimalNumeral IntegerTypeSuffix? ;

fragment
HexIntegerLiteral: HexNumeral IntegerTypeSuffix? ;

fragment
OctalIntegerLiteral: OctalNumeral IntegerTypeSuffix? ;

fragment
BinaryIntegerLiteral: BinaryNumeral IntegerTypeSuffix? ;

fragment
IntegerTypeSuffix: [lL] ;

fragment
DecimalNumeral
 : '0'
 | NonZeroDigit (Digits? | Underscores Digits)
 ;

fragment
Digits: Digit (DigitsAndUnderscores? Digit)? ;

fragment
Digit
 : '0'
 | NonZeroDigit
 ;

fragment
NonZeroDigit: [1-9] ;

fragment
DigitsAndUnderscores: DigitOrUnderscore+ ;

fragment
DigitOrUnderscore
 : Digit
 | '_'
 ;

fragment
Underscores: '_'+ ;

fragment
HexNumeral: '0' [xX] HexDigits ;

fragment
HexDigits: HexDigit (HexDigitsAndUnderscores? HexDigit)? ;

fragment
HexDigit: [0-9a-fA-F] ;

fragment
HexDigitsAndUnderscores: HexDigitOrUnderscore+ ;

fragment
HexDigitOrUnderscore
 : HexDigit
 | '_'
 ;

fragment
OctalNumeral
 : '0' Underscores? OctalDigits
 ;

fragment
OctalDigits
 : OctalDigit (OctalDigitsAndUnderscores? OctalDigit)?
 ;

fragment
OctalDigit
 : [0-7]
 ;

fragment
OctalDigitsAndUnderscores
 : OctalDigitOrUnderscore+
 ;

fragment
OctalDigitOrUnderscore
 : OctalDigit
 | '_'
 ;

fragment
BinaryNumeral
 : '0' [bB] BinaryDigits
 ;

fragment
BinaryDigits
 : BinaryDigit (BinaryDigitsAndUnderscores? BinaryDigit)?
 ;

fragment
BinaryDigit
 : [01]
 ;

fragment
BinaryDigitsAndUnderscores
 : BinaryDigitOrUnderscore+
 ;

fragment
BinaryDigitOrUnderscore
 : BinaryDigit
 | '_'
 ;

// Floating-Point Literals
FloatingPointLiteral
 : DecimalFloatingPointLiteral
 | HexadecimalFloatingPointLiteral
 ;

fragment
DecimalFloatingPointLiteral
 : Digits '.' Digits? ExponentPart? FloatTypeSuffix?
 | '.' Digits ExponentPart? FloatTypeSuffix?
 | Digits ExponentPart FloatTypeSuffix?
 | Digits FloatTypeSuffix
 ;

fragment
ExponentPart
 : ExponentIndicator SignedInteger
 ;

fragment
ExponentIndicator
 : [eE]
 ;

fragment
SignedInteger: Sign? Digits ;

fragment
Sign: [+-] ;

fragment
FloatTypeSuffix: [fFdD] ;

fragment
HexadecimalFloatingPointLiteral: HexSignificand BinaryExponent FloatTypeSuffix? ;

fragment
HexSignificand
 : HexNumeral '.'?
 | '0' [xX] HexDigits? '.' HexDigits
 ;

fragment
BinaryExponent: BinaryExponentIndicator SignedInteger ;

fragment
BinaryExponentIndicator: [pP] ;

// Boolean Literals
BooleanLiteral
 : 'true'
 | 'false'
 ;

// Character Literals
CharacterLiteral
 : '\'' SingleCharacter '\''
 | '\'' EscapeSequence '\''
 ;

fragment
SingleCharacter: ~['\\\r\n] ;

// String Literals
StringLiteral: '"' StringCharacters? '"' ;

fragment
StringCharacters: StringCharacter+ ;

fragment
StringCharacter
 : ~["\\\r\n]
 | EscapeSequence
 ;

// Escape Sequences for Character and String Literals
fragment
EscapeSequence
 : '\\' [btnfr"'\\]
 | OctalEscape
 | UnicodeEscape // This is not in the spec but prevents having to preprocess the input
 ;

fragment
OctalEscape
 : '\\' OctalDigit
 | '\\' OctalDigit OctalDigit
 | '\\' ZeroToThree OctalDigit OctalDigit
 ;

fragment
ZeroToThree: [0-3] ;

// This is not in the spec but prevents having to preprocess the input
fragment
UnicodeEscape: '\\' 'u'+ HexDigit HexDigit HexDigit HexDigit ;

// Separators
LPAREN      : '(';
RPAREN      : ')' ;
LBRACE      : '{';
RBRACE      : '}' ;
LBRACK      : '[';
RBRACK      : ']' ;

SEMI        : ';';
COMMA       : ',';

QUEST       : '?';
DCOLON      : '::';
COLON       : ':';
DOT         : '.';
ELLIPSIS    : '...';

// Operators
EQ          : '=';
MULTEQ      : '*=';
DIVEQ       : '/=';
MODEQ       : '%=';
PLUSEQ      : '+=';
MINUSEQ     : '-=';
LSHIFTEQ    : '<<=';
RSHIFTEQ    : '>>=';
RRSHIFTEQ   : '>>>=';
ANDEQ       : '&=';
XOREQ       : '^=';
OREQ        : '|=';
LARROW      : '<-';
RARROW      : '->';

GT          : '>';
LT          : '<';
EQEQ        : '==';
LTEQ        : '<=';
GTEQ        : '>=';
NOTEQ       : '!=';

LSHIFT      : '<<';
RSHIFT      : '>>';
RRSHIFT     : '>>>';
ANDAND      : '&&';
OROR        : '||';
PLUS        : '+';
MINUS       : '-';
MULT        : '*';
DIV         : '/';
AND         : '&';
OR          : '|';
XOR         : '^';
MOD         : '%';

NOT         : '!';
COMP        : '~';
DPLUS       : '++';
DMINUS      : '--';
DMULT       : '**';

// Null Literal
NullLiteral: 'null' ;

// Identifiers (must appear after all keywords in the grammar)
Identifier: JavaLetter JavaLetterOrDigit* ;

fragment
JavaLetter
 :  [a-zA-Z$_] // these are the "java letters" below 0x7F
 | // covers all characters above 0x7F which are not a surrogate
   ~[\u0000-\u007F\uD800-\uDBFF]
   { Character.isJavaIdentifierStart(_input.LA(-1)) }?
 | // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
   [\uD800-\uDBFF] [\uDC00-\uDFFF]
   { Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1))) }?
 ;

fragment
JavaLetterOrDigit
 :  [a-zA-Z0-9$_] // these are the "java letters or digits" below 0x7F
 | // covers all characters above 0x7F which are not a surrogate
   ~[\u0000-\u007F\uD800-\uDBFF]
   { Character.isJavaIdentifierPart(_input.LA(-1)) }?
 | // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
   [\uD800-\uDBFF] [\uDC00-\uDFFF]
   { Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1))) }?
 ;

fragment
ShCommand:  ~[\r\n\uFFFF]* ;

// Whitespace and comments
WS:  [ \t\r\n\u000C]+ -> skip ;
COMMENT:   '/*' .*? '*/' -> channel(HIDDEN) ;
LINE_COMMENT:   '//' ~[\r\n]* -> channel(HIDDEN) ;

mattunlv commented 1 year ago

I understand the error issue. I like that.

Remove the loop; I don't like that AT ALL!

How do you know that the grammar is correct?