leonardoaraujosantos / Matlab_LLVM_Frontend

Generate LLVM IR from matlab source code.
14 stars 2 forks source link

Start parser with our token inputs with Bison #3

Open leonardoaraujosantos opened 8 years ago

leonardoaraujosantos commented 8 years ago

Introduction

assignstatement

The parser’s job is to figure out the relationship(grammar) among the input tokens. Those tokens come from the scanner (flex). A grammar is a series of rules that the parser uses to recognize syntactically valid input.

Bison only deal with the syntax no the semantic, for instance the code bellow has a valid syntax but is invalid, bison will think that this is fine

int var = "Leo"

Consider the following grammar on bison. NAME, NUMBER, '+', '-', are tokens that flex gives to us.

statement: NAME '=' expression
expression: NUMBER '+' NUMBER
 | NUMBER '−' NUMBER

Every line here is a rule, so the rule for a statement is the pattern "NAME = expression". And the rule for expression is the pattern "NUMBER+NUMBER" OR "NUMBER-NUMBER"

Every grammar includes a start symbol, the one that has to be at the root of the parse tree. In this grammar, statement is the start symbol.

So the following source code line:

fred = 12 + 13

Would produce from the scanner the following tokens: NAME,'=',NUMBER,'+',NUMBER

This AST tree would be created: simple_expression_ast

Where "fred =" is a statement. And "12+13" is an expression.

Now what would happen if we would want to parse a longer instruction

fred = 14 + 23 − 11 + 7

To solve this can use a Recursive rule, where the rule call itself.

statement: NAME '=' expression
expression: NUMBER
 | expression '+' NUMBER
 | expression '−' NUMBER

Now the tree would grow to this: recursiverule

To start use our token inputs, starting from a simple example.

function result = justSum(a,b)
  result = a + b; # Simple comment
end

Tokens generated with out scanner

FUNCTION: function
Identifier: result
ASSIGN: =
Identifier: justSum
Open Parenthesis: (
Identifier: a
Comma: ,
Identifier: b
Close Parenthesis: )

Identifier: result
ASSIGN: =
Identifier: a
Plus: +
Identifier: b
EndLine: ;

END: end

And build an AST tree

Class diagram for parser Nodes

Considering the class diagram bellow, we need to create a rule for every class that inherites from Node. parserdiagram

References

leonardoaraujosantos commented 8 years ago

BNF

Backus Normal Form is one notation techniques for context-free grammars, often used to describe the syntax of programming languages like C, or Python

Example: bnf_0 This translates into English as:

On Bison

stmt: var_decl | func_decl | expr { /* Any Action*/ }

Read as ”stmt is a var_decl or func_decl or expr” and if this grammar match execute something on the ”action block”. Inside the action block we mount our AST Tree

if_stmt : IF '(' condition ')' block { /* do stuff when this rule is encountered */ }
        | IF '(' condition ')'       { /*do some other stuff on this other rule*/ }
        ;

This would create the grammar for the if statement, like if (condition) block, or if (condition)