vmware / database-stream-processor-compiler

Infrastructure to run programs written in high-level languages on top of the Database Stream Processor (DBSP) runtime.
Other
14 stars 2 forks source link

[RFC] DDlog 2.0 Syntax #3

Open Kixiron opened 2 years ago

Kixiron commented 2 years ago

The syntax is described in an EBNF flavor loosely based off of W3C's EBNF

Note: As of now this grammar is not complete or final, it's just what I've implemented. A lot of the constraints are much loser than they seem like they should be, that's just because of the error recovery I've implemented in the parser, things accepted within parsing are not always valid and errors are emitted

Root = Item*

Item = FunctionDef | RelationDef | TypeDef

RelationDef = Attributes Modifiers keyword:RelKw name:RelName columns:RelCols
RelKw = 'relation' | 'multiset' | 'stream'
RelName = 'ident'
RelCols = '(' columns:RelCol* ')'
RelCol = Attributes binding:Pattern ':' ty:Type ','*

FunctionDef = Attributes Modifiers keyword:'function' name:FunctionName Generics args:FunctionArgs ret:FunctionReturn? body:Block
FunctionName = 'ident'
FunctionArgs = '(' args:FunctionArg* ')'
FunctionArg = Attributes binding:Pattern ':' ty:Type ','*
FunctionReturn = ':' return_ty:Type

TypeDef = Attributes Modifiers keyword:'typedef' name:TypeName '=' body:TypeBody
TypeName = 'ident'

TypeBody = RecordType | SumType
RecordType = constructor:RecordName ('{' fields:RecordField* '}')?
RecordName = 'ident'
RecordField = Attributes name:Pattern ':' ty:Type ','*
SumType = ('|' RecordType)*

Attributes = Attribute*
Attribute = '#[' AttrPair* ']'
AttrPair = 'ident' '=' Expr ','*

Modifiers = Modifier*
Modifier = 'input' | 'output' | 'extern'

Type = GenericType | TupleType | FunctionType

GenericType = 'ident' Generics?
Generics = '<' generics:GenericArg* '>'
GenericArg = Type ','*

TupleType = '(' elements:TupleTypeElem* ')'
TupleTypeElem = Type ','*

FunctionType = 'function' args:FunctionTypeArgs ret:FunctionReturnType?
FunctionTypeArgs = '(' args:FunctionTypeArg* ')'
FunctionTypeArg = Type ','*
FunctionReturnType = ':' Type

Block = '{' statements:Stmt* '}'
Stmt =
    ExprStmt
    | VarDecl
    | IfStmt

ExprStmt = Expr ';'
VarDecl = 'var' binding:Pattern '=' value:Expr ';'

IfStmt = IfBlock* ElseBlock?
IfBlock = leading_else:'else'? 'if' cond:Expr Block
ElseBlock = 'else' Block

Pattern = VarRef | TuplePattern
TuplePattern = '(' elements:TuplePatternElem* ')'
TuplePatternElem = Pattern ','*

Expr =
    Literal
    | VarRef
    | Assign
    | ParenExpr
    | BinExpr
    | IfStmt
    | RetExpr
    | UnaryExpr
    | Block

VarRef = 'ident'

Assign = binding:Pattern '=' value:Expr

ParenExpr = '(' inner:Expr ')'

// TODO: Floats
Literal = Bool | Number | String
Bool = True | False
Number = 'number'
String = 'string'

RetExpr = 'return' expr:Expr

UnaryExpr = op:UnaryOp expr:Expr
UnaryOp = Bang | Minus

BinExpr = lhs:Expr op:BinOp rhs:Expr
BinOp =
    Plus
    | Minus
    | Star
    | Slash 
    | Percent
    | RightRocket
    | Pipe
    | Caret
    | Ampersand
    | Shl
    | Shr
    | And
    | Or
    | EqEq
    | Neq
    | RAngle
    | RAngleEq
    | LAngle
    | LAngleEq

Or = 'or'
And = 'and'
True = 'true'
False = 'false'

Eq = '='
Bang = '!'
Pipe = '|'
Plus = '+'
Star = '*'
Neq = '!='
Shl = '<<'
Shr = '>>'
Caret = '^'
Comma = ','
Colon = ':'
Minus = '-'
Slash = '/'
EqEq = '=='
LAngle = '<'
RAngle = '>'
LBrack = '['
RBrack = ']'
LCurly = '{'
RCurly = '}'
LParen = '('
RParen = ')'
Percent = '%'
Ampersand = '&'
Semicolon = ';'
LAngleEq = '<='
RAngleEq = '>='
RightRocket = '=>'
ryzhyk commented 2 years ago

Some suggestions.

Meta

Type system

mihaibudiu commented 2 years ago

bigint is useful. But it could be a library - although you lose the literals.

Kixiron commented 2 years ago

Yah, I totally agree with those sentiments. Purity is a tricky question, it's a tradeoff between user control and compiler freedom. However, since we have the ability to drop into writing and using Rust code within ddlog, I think that making the surface language pure can sidestep that particular drawback

mihaibudiu commented 2 years ago

I have no problems with mutable variables in imperative code. It should be clear that you cannot mutate any value that is in a relation, though.

ryzhyk commented 2 years ago

I have no problems with mutable variables in imperative code. It should be clear that you cannot mutate any value that is in a relation, though.

There will no longer be any distinction between imperative and relational code at the language level. Relation<> is just a type like any other. However, since there is no API to get a mutable reference to a record in a relation, its contents is immutable.

mihaibudiu commented 2 years ago

I thought you gave up on embedding the language in Rust or something else. So then we still have these two clear layers imperative/relational. Except some functions can compute on relations. Then there is no "main", and you still want input and output relations. That is also more modular - you cannot just add arguments to an existing function.

ryzhyk commented 2 years ago

Yah, I totally agree with those sentiments. Purity is a tricky question, it's a tradeoff between user control and compiler freedom. However, since we have the ability to drop into writing and using Rust code within ddlog, I think that making the surface language pure can sidestep that particular drawback

You can always convert mutable variables to single static assignment form.

I guess if we want to allow loops, then we want mutable variables. Loops are not very useful if you cannot modify variables in the body of the loop :)

mihaibudiu commented 2 years ago

Variables are not as interesting as containers. Do you want mutable vectors and maps?

ryzhyk commented 2 years ago

Variables are not as interesting as containers. Do you want mutable vectors and maps?

That's a library design question. Assuming we have mutable arguments, you can define methods that mutate the container. You can also have immutable containers, where mutator methods return a new container.

ryzhyk commented 2 years ago

I thought you gave up on embedding the language in Rust or something else.

We're not embedding in an existing language, but are designing our own functional language. Relations are first-class objects that can be passed to functions and back. All relational operators are just methods (#2); Datalog-style rules will be implemented as syntactic sugar on top of this language.

At least this is the current proposal.

Not sure exactly what main looks like in this world.

Upd: but I still like the idea that input and output relations are just inputs and outputs of a function.

Kixiron commented 2 years ago

Opened #4