souffle-lang / souffle

Soufflé is a variant of Datalog for tool designers crafting analyses in Horn clauses. Soufflé synthesizes a native parallel C++ program from a logic specification.
http://souffle-lang.github.io/
Universal Permissive License v1.0
921 stars 208 forks source link

Grammar railroad diagram #1970

Closed mingodad closed 3 years ago

mingodad commented 3 years ago

Using this tool https://www.bottlecaps.de/convert/ (removing extra space in : PRAGMA STRING[key ]) and manually adding the tokens from souffle/blob/master/src/parser/scanner.ll we can see a railroad diagram for the grammar in souffle/blob/master/src/parser/parser.yy copy and paste the EBNF shown bellow on https://www.bottlecaps.de/rr/ui in the tab Edit Grammar then switching to the tab View Diagram.

/* converted on Thu Nov 3, 2022, 12:01 (UTC+01) by bison-to-w3c v0.61 which is Copyright (c) 2011-2022 by Gunther Rademacher <grd@gmx.net> */

program  ::= ( directive_head | rule | fact | component_decl | component_init | pragma | type_decl | functor_decl | relation_decl )*
qualified_name
         ::= IDENT ( DOT IDENT )*
type_decl
         ::= TYPE IDENT ( SUBTYPE qualified_name | EQUALS ( qualified_name ( PIPE qualified_name )* | record_type_list | adt_branch ( PIPE adt_branch )* ) )?
           | ( NUMBER_TYPE | SYMBOL_TYPE ) IDENT
record_type_list
         ::= LBRACKET non_empty_attributes? RBRACKET
adt_branch
         ::= IDENT LBRACE non_empty_attributes? RBRACE
relation_decl
         ::= DECL IDENT ( COMMA IDENT )* attributes_list ( OVERRIDABLE_QUALIFIER | INLINE_QUALIFIER | NO_INLINE_QUALIFIER | MAGIC_QUALIFIER | NO_MAGIC_QUALIFIER | BRIE_QUALIFIER | BTREE_QUALIFIER | BTREE_DELETE_QUALIFIER | EQREL_QUALIFIER | OUTPUT_QUALIFIER | INPUT_QUALIFIER | PRINTSIZE_QUALIFIER )* dependency_list
attributes_list
         ::= LPAREN non_empty_attributes? RPAREN
non_empty_attributes
         ::= attribute ( COMMA attribute )*
attribute
         ::= IDENT COLON qualified_name
dependency
         ::= IDENT
           | LPAREN IDENT ( COMMA IDENT )* RPAREN
dependency_list
         ::= ( CHOICEDOMAIN dependency ( COMMA dependency )* )?
fact     ::= atom DOT
rule     ::= ( rule_def | atom LE atom IF body DOT ) query_plan?
rule_def ::= atom ( COMMA atom )* IF body DOT
body     ::= disjunction
disjunction
         ::= conjunction ( SEMICOLON conjunction )*
conjunction
         ::= term ( COMMA term )*
term     ::= EXCLAMATION* ( atom | constraint | LPAREN disjunction RPAREN )
atom     ::= qualified_name LPAREN arg_list RPAREN
constraint
         ::= arg ( LT | GT | LE | GE | EQUALS | NE ) arg
           | ( TMATCH | TCONTAINS ) LPAREN arg COMMA arg RPAREN
           | TRUELIT
           | FALSELIT
arg_list ::= non_empty_arg_list?
non_empty_arg_list
         ::= arg ( COMMA arg )*
arg      ::= STRING
           | FLOAT
           | UNSIGNED
           | NUMBER
           | UNDERSCORE
           | DOLLAR ( qualified_name LPAREN arg_list RPAREN )?
           | ( AUTOINC LPAREN | LPAREN arg | AS LPAREN arg COMMA qualified_name | functor_built_in LPAREN arg_list ) RPAREN
           | IDENT
           | NIL
           | LBRACKET arg_list RBRACKET
           | AT ( IDENT LPAREN arg_list RPAREN | AT IDENT arg_list COLON arg COMMA aggregate_body )
           | aggregate_func ( LPAREN arg COMMA non_empty_arg_list RPAREN | arg_list COLON aggregate_body )
           | ( MINUS | BW_NOT | L_NOT | arg ( PLUS | MINUS | STAR | SLASH | PERCENT | CARET | L_AND | L_OR | L_XOR | BW_AND | BW_OR | BW_XOR | BW_SHIFT_L | BW_SHIFT_R | BW_SHIFT_R_UNSIGNED ) ) arg
functor_built_in
         ::= CAT
           | ORD
           | RANGE
           | STRLEN
           | SUBSTR
           | TOFLOAT
           | TONUMBER
           | TOSTRING
           | TOUNSIGNED
aggregate_func
         ::= COUNT
           | MAX
           | MEAN
           | MIN
           | SUM
aggregate_body
         ::= LBRACE body RBRACE
           | atom
query_plan
         ::= PLAN NUMBER COLON plan_order ( COMMA NUMBER COLON plan_order )*
plan_order
         ::= LPAREN ( NUMBER ( COMMA NUMBER )* )? RPAREN
component_decl
         ::= COMPONENT component_type ( ( COLON | COMMA ) component_type )* LBRACE ( directive_head | rule | fact | OVERRIDE IDENT | component_init | component_decl | type_decl | relation_decl )* RBRACE
component_type
         ::= IDENT component_type_params
component_type_params
         ::= ( LT IDENT ( COMMA IDENT )* GT )?
component_init
         ::= INSTANTIATE IDENT EQUALS component_type
functor_decl
         ::= FUNCTOR IDENT LPAREN functor_arg_type_list RPAREN COLON qualified_name STATEFUL?
functor_arg_type_list
         ::= ( functor_attribute ( COMMA functor_attribute )* )?
functor_attribute
         ::= ( IDENT COLON )? qualified_name
pragma   ::= PRAGMA STRING STRING?
directive_head
         ::= directive_head_decl directive_list
directive_head_decl
         ::= INPUT_DECL
           | OUTPUT_DECL
           | PRINTSIZE_DECL
           | LIMITSIZE_DECL
directive_list
         ::= relation_directive_list ( LPAREN ( IDENT EQUALS kvp_value ( COMMA IDENT EQUALS kvp_value )* )? RPAREN )?
relation_directive_list
         ::= qualified_name ( COMMA qualified_name )*
kvp_value
         ::= STRING
           | IDENT
           | NUMBER
           | TRUELIT
           | FALSELIT

// Tokens from https://github.com/souffle-lang/souffle/blob/master/src/parser/scanner.ll

DECL ::= ".decl"
FUNCTOR ::= ".functor"
INPUT_DECL ::= ".input"
OUTPUT_DECL ::= ".output"
PRINTSIZE_DECL ::= ".printsize"
LIMITSIZE_DECL ::= ".limitsize"
TYPE ::= ".type"
COMPONENT ::= ".comp"
INSTANTIATE ::= ".init"
NUMBER_TYPE ::= ".number_type"
SYMBOL_TYPE ::= ".symbol_type"
OVERRIDE ::= ".override"
PRAGMA ::= ".pragma"
PLAN ::= ".plan"
INCLUDE ::= ".include"
ONCE ::= ".once"
AUTOINC ::= "autoinc"
BW_AND ::= "band"
BW_OR ::= "bor"
BW_XOR ::= "bxor"
BW_NOT ::= "bnot"
BW_SHIFT_L ::= "bshl"
BW_SHIFT_R ::= "bshr"
BW_SHIFT_R_UNSIGNED ::= "bshru"
L_AND ::= "land"
L_OR ::= "lor"
L_XOR ::= "lxor"
L_NOT ::= "lnot"
TMATCH ::= "match"
MEAN ::= "mean"
CAT ::= "cat"
ORD ::= "ord"
FOLD ::= "fold"
RANGE ::= "range"
STRLEN ::= "strlen"
SUBSTR ::= "substr"
STATEFUL ::= "stateful"
TCONTAINS ::= "contains"
OUTPUT_QUALIFIER ::= "output"
INPUT_QUALIFIER ::= "input"
OVERRIDABLE_QUALIFIER ::= "overridable"
PRINTSIZE_QUALIFIER ::= "printsize"
EQREL_QUALIFIER ::= "eqrel"
INLINE_QUALIFIER ::= "inline"
NO_INLINE_QUALIFIER ::= "no_inline"
MAGIC_QUALIFIER ::= "magic"
NO_MAGIC_QUALIFIER ::= "no_magic"
BRIE_QUALIFIER ::= "brie"
BTREE_DELETE_QUALIFIER ::= "btree_delete"
BTREE_QUALIFIER ::= "btree"
MIN ::= "min"
MAX ::= "max"
AS ::= "as"
NIL ::= "nil"
UNDERSCORE ::= "_"
COUNT ::= "count"
SUM ::= "sum"
TRUE ::= "true"
FALSE ::= "false"
TOFLOAT ::= "to_float"
TONUMBER ::= "to_number"
TOSTRING ::= "to_string"
TOUNSIGNED ::= "to_unsigned"
CHOICEDOMAIN ::= "choice-domain"
PIPE ::= "|"
LBRACKET ::= "["
RBRACKET ::= "]"
DOLLAR ::= "$"
PLUS ::= "+"
MINUS ::= "-"
LPAREN ::= "("
RPAREN ::= ")"
COMMA ::= ","
COLON ::= ":"
SEMICOLON ::= ";"
DOT ::= "."
SUBTYPE ::= "<:"
LE ::= "<="
GE ::= ">="
NE ::= "!="
EQUALS ::= "="
EXCLAMATION ::= "!"
STAR ::= "*"
AT ::= "@"
SLASH ::= "/"
CARET ::= "^"
PERCENT ::= "%"
LBRACE ::= "{"
RBRACE ::= "}"
LT ::= "<"
GT ::= ">"
IF ::= ":-"

OBS: Edited on 2022-11-03 to the parser/lexer as today.

b-scholz commented 3 years ago

Thanks for this - this is amazing. Is there a way to integrate this in our documentation?

mingodad commented 3 years ago

On https://www.bottlecaps.de/rr/ui where we are viewing the diagram there is a buttom Download and we get a self contained navigable .xhtm file.

b-scholz commented 3 years ago

Let's upload it on souffle-lang.github.io and refer to it in the documentation. Would you like to do this?

b-scholz commented 3 years ago

I uploaded the xhtml and linked it here: https://github.com/souffle-lang/souffle-lang.github.io/blob/master/pages/docs/datalog.md

It would be great to provide more information related to railroad grammer/ebnf so that people understand the produced grammar better.

mingodad commented 3 years ago

One purpose of the railroad diagram is to have a global view of the grammar and also to debug, the grammar railroad generator does some simplifications on the grammar that sometimes is not obvious for the writer and after seeing then can lead to some cleanup in the original grammar.

On wikipedia we have this explanations https://en.wikipedia.org/wiki/Syntax_diagram

b-scholz commented 3 years ago

I removed the syntactical elements that are deprecated and refined the syntax:

program  ::= (   pragma  |functor_decl  | component_decl | component_init |  directive |   rule  | fact | relation_decl | type_decl )*

type_name ::= IDENT ( DOT IDENT )* | "unsigned" | "number" | "float" | "symbol" 

type_decl ::= TYPE IDENT (SUBTYPE type_name | EQUALS ( type_name ( PIPE type_name )* | record_list | adt_branch ( PIPE adt_branch )* ))

adt_branch
         ::= IDENT LBRACE (attribute ( COMMA attribute )*)? RBRACE

record_list
         ::= LBRACKET (attribute ( COMMA attribute )*)? RBRACKET

attribute
         ::= IDENT COLON type_name

functor_decl
         ::= FUNCTOR IDENT LPAREN (attribute ( COMMA attribute )*)? RPAREN COLON type_name STATEFUL?

qualified_name ::= IDENT ( DOT IDENT )*

relation_decl
         ::= DECL IDENT ( COMMA IDENT )* LPAREN attribute ( COMMA attribute )* RPAREN ( OVERRIDABLE_QUALIFIER | INLINE_QUALIFIER | MAGIC_QUALIFIER | BRIE_QUALIFIER | BTREE_QUALIFIER | EQREL_QUALIFIER )* choice_domain

choice_domain
         ::= ( CHOICEDOMAIN ( IDENT
           | LPAREN IDENT ( COMMA IDENT )* RPAREN )  ( COMMA ( IDENT
           | LPAREN IDENT ( COMMA IDENT )* RPAREN ) )* )?
fact     ::= atom DOT
rule     ::= atom ( COMMA atom )* IF disjunction DOT query_plan?
disjunction ::= conjunction ( SEMICOLON conjunction )*
conjunction ::= term ( COMMA term )*  
query_plan
         ::= PLAN NUMBER COLON LPAREN ( NUMBER ( COMMA NUMBER )* )? RPAREN ( COMMA NUMBER COLON LPAREN ( NUMBER ( COMMA NUMBER )* )? RPAREN )*
term     ::= EXCLAMATION* ( atom | constraint | LPAREN disjunction RPAREN )
atom     ::= qualified_name LPAREN arg_list RPAREN
constraint
         ::= arg ( LT | GT | LE | GE | EQUALS | NE ) arg
           | TMATCH LPAREN arg COMMA arg RPAREN
           | TCONTAINS LPAREN arg COMMA arg RPAREN
           | TRUE
           | FALSE

arg_list ::= ( arg ( COMMA arg )* )?

arg      ::= STRING
           | FLOAT
           | UNSIGNED
           | NUMBER
           | UNDERSCORE
           | DOLLAR ( IDENT ( LPAREN arg_list RPAREN )? )?
           | IDENT
           | NIL
           | LBRACKET arg_list RBRACKET
           | ( LPAREN arg | AS LPAREN arg COMMA type_name | ( AT IDENT | functor_built_in ) LPAREN arg_list ) RPAREN
           | aggregate_func ( LPAREN arg COMMA arg ( COMMA arg )* RPAREN | arg_list COLON aggregate_body )
           | ( MINUS | BW_NOT | L_NOT | arg ( PLUS | MINUS | STAR | SLASH | PERCENT | CARET | L_AND | L_OR | L_XOR | BW_AND | BW_OR | BW_XOR | BW_SHIFT_L | BW_SHIFT_R | BW_SHIFT_R_UNSIGNED ) ) arg
functor_built_in
         ::= CAT
           | ORD
           | RANGE
           | STRLEN
           | SUBSTR
           | TOFLOAT
           | TONUMBER
           | TOSTRING
           | TOUNSIGNED
aggregate_func
         ::= COUNT
           | MAX
           | MEAN
           | MIN
           | SUM
aggregate_body
         ::= LBRACE body RBRACE
           | atom
component_decl
         ::= COMPONENT component_type ( ( COLON | COMMA ) component_type )* LBRACE ( directive | rule | fact | OVERRIDE IDENT | component_init | component_decl | type_decl | relation_decl )* RBRACE
component_type
         ::= IDENT component_params
component_params
         ::= ( LT IDENT ( COMMA IDENT )* GT )?
component_init
         ::= INSTANTIATE IDENT EQUALS component_type

pragma   ::= PRAGMA STRING STRING?
directive
         ::= directive_qualifier qualified_name ( COMMA qualified_name )* ( LPAREN ( IDENT EQUALS directive_value ( COMMA IDENT EQUALS directive_value )* )? RPAREN )?
directive_qualifier
         ::= INPUT_DECL
           | OUTPUT_DECL
           | PRINTSIZE_DECL
           | LIMITSIZE_DECL
directive_value
         ::= STRING
           | IDENT
           | NUMBER
           | TRUE
           | FALSE

// Tokens from https://github.com/souffle-lang/souffle/blob/master/src/parser/scanner.ll

DECL ::= ".decl"
FUNCTOR ::= ".functor"
INPUT_DECL ::= ".input"
OUTPUT_DECL ::= ".output"
PRINTSIZE_DECL ::= ".printsize"
LIMITSIZE_DECL ::= ".limitsize"
TYPE ::= ".type"
COMPONENT ::= ".comp"
INSTANTIATE ::= ".init"
OVERRIDE ::= ".override"
PRAGMA ::= ".pragma"
PLAN ::= ".plan"
BW_AND ::= "band"
BW_OR ::= "bor"
BW_XOR ::= "bxor"
BW_NOT ::= "bnot"
BW_SHIFT_L ::= "bshl"
BW_SHIFT_R ::= "bshr"
BW_SHIFT_R_UNSIGNED ::= "bshru"
L_AND ::= "land"
L_OR ::= "lor"
L_XOR ::= "lxor"
L_NOT ::= "lnot"
TMATCH ::= "match"
MEAN ::= "mean"
CAT ::= "cat"
ORD ::= "ord"
RANGE ::= "range"
STRLEN ::= "strlen"
SUBSTR ::= "substr"
STATEFUL ::= "stateful"
TCONTAINS ::= "contains"
PRINTSIZE_QUALIFIER ::= "printsize"
EQREL_QUALIFIER ::= "eqrel"
INLINE_QUALIFIER ::= "inline"
MAGIC_QUALIFIER ::= "magic"
BRIE_QUALIFIER ::= "brie"
BTREE_QUALIFIER ::= "btree"
OVERRIDABLE_QUALIFIER ::= "override"
MIN ::= "min"
MAX ::= "max"
AS ::= "as"
NIL ::= "nil"
UNDERSCORE ::= "_"
COUNT ::= "count"
SUM ::= "sum"
TRUE ::= "true"
FALSE ::= "false"
TOFLOAT ::= "to_float"
TONUMBER ::= "to_number"
TOSTRING ::= "to_string"
TOUNSIGNED ::= "to_unsigned"
CHOICEDOMAIN ::= "choice-domain"
PIPE ::= "|"
LBRACKET ::= "["
RBRACKET ::= "]"
DOLLAR ::= "$"
PLUS ::= "+"
MINUS ::= "-"
LPAREN ::= "("
RPAREN ::= ")"
COMMA ::= ","
COLON ::= ":"
SEMICOLON ::= ";"
DOT ::= "."
SUBTYPE ::= "<:"
LE ::= "<="
GE ::= ">="
NE ::= "!="
EQUALS ::= "="
EXCLAMATION ::= "!"
STAR ::= "*"
AT ::= "@"
SLASH ::= "/"
CARET ::= "^"
PERCENT ::= "%"
LBRACE ::= "{"
RBRACE ::= "}"
LT ::= "<"
GT ::= ">"
IF ::= ":-"
b-scholz commented 3 years ago

The documentation contains now rail-road grammars. Have a look: http://souffle-lang.github.io

It was time consuming extracting the SVG images from the XHTML files but it works.

Unfortunately, I couldn't find a tool that converts XHTML directly to MD and SVG files in a nice way.

mingodad commented 3 years ago

I think that would be nice to also have a link for the full grammar, because the user have a global view and can navigate through all of it easily.

mingodad commented 3 years ago

Also for partial views if you download the railroad generator for offline usage https://www.bottlecaps.de/rr/download/rr-1.63-java8.zip you could generate only what you need to make easy to add the pieces, even in the online tool if you add the piece bellow you'll get the railroad for it.

program  ::= (   pragma  |functor_decl  | component_decl | component_init |  directive |   rule  | fact | relation_decl | type_decl )*
b-scholz commented 3 years ago

Let's wait for the full view because the grammar still needs to be changed. The automatic tool did not fully capture all semantic details and that needs to be adjusted. I would prefer making further tweaks so that it is more digestible for the users of Souffle.

For example, the Argument non-terminal is still hard to comprehend. There I need to change more. Also the legacy syntax was reflected - I had to remove all this.

Thanks for the pointer to the download zip. Can it directly generate SVG?

mingodad commented 3 years ago

No it doesn't generate only the svg but if you feed it one rule at a time there will be only one svg on the generated xhtml and I think that would make easier to extract it.

b-scholz commented 3 years ago

All good - most of the work is done. Thanks again for pointing this out.

May I ask for what are you using Souffle?

mingodad commented 3 years ago

Right now I'm not using souffle I'm more interested in grammars and searched github for project using parser generators with grammars, I normally use sqlite but I'm intrigued by Prolog and Datalog and souffle seems an interesting project in this space and while trying to generate parsers with https://github.com/mingodad/CocoR-CPP generally several conflicts surface and I'm thinking that maybe if I could somehow describe the grammar in a way that a solver/prolog/datalog could find a path without or with minimal conflicts that would be nice, but I didn't found the way to model grammars in way to feed to solvers/prolog/datalog to achieve this.

b-scholz commented 3 years ago

Parsing in logic might not be so sufficient. I need to think about this.

mingodad commented 3 years ago

Meanwhile also another functionality I'm looking for is text generation from grammars like https://github.com/dmbaturin/bnfgen/issues/2 and it seems that prolog could be used to generate text from pseudo grammars.

mingodad commented 3 years ago

Looking through the souffle documentation I can see that it has a sqlite3 driver for input/output but not much detail about how to use it only one minimal example tests/semantic/store3/store3.dl.

Maybe this could be used by you to play with any possible work with grammars from souffle and there is a few projects using lemon.

For example sqlite3 parser is generated through lemon that if we pass the -S command line option will generate sql representing the grammar:

./lemon -S parser.y

Output:

BEGIN;
CREATE TABLE symbol(
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  isTerminal BOOLEAN NOT NULL,
  fallback INTEGER REFERENCES symbol DEFERRABLE INITIALLY DEFERRED
);
INSERT INTO symbol(id,name,isTerminal,fallback)VALUES(0,'$',TRUE,NULL);
INSERT INTO symbol(id,name,isTerminal,fallback)VALUES(1,'SEMI',TRUE,NULL);
INSERT INTO symbol(id,name,isTerminal,fallback)VALUES(2,'EXPLAIN',TRUE,59);
INSERT INTO symbol(id,name,isTerminal,fallback)VALUES(3,'QUERY',TRUE,59);
...
CREATE TABLE rule(
  ruleid INTEGER PRIMARY KEY,
  lhs INTEGER REFERENCES symbol(id),
  txt TEXT
);
CREATE TABLE rulerhs(
  ruleid INTEGER REFERENCES rule(ruleid),
  pos INTEGER,
  sym INTEGER REFERENCES symbol(id)
);
INSERT INTO rule(ruleid,lhs,txt)VALUES(0,188,'explain ::= EXPLAIN');
INSERT INTO rulerhs(ruleid,pos,sym)VALUES(0,0,2);
INSERT INTO rule(ruleid,lhs,txt)VALUES(1,188,'explain ::= EXPLAIN QUERY PLAN');
INSERT INTO rulerhs(ruleid,pos,sym)VALUES(1,0,2);
INSERT INTO rulerhs(ruleid,pos,sym)VALUES(1,1,3);
INSERT INTO rulerhs(ruleid,pos,sym)VALUES(1,2,4);
...
COMMIT;

And I created a query to generate EBNF for railroad diagram from the generated SQL:

select name || ' ::=' || group_concat(
    case when hasOr == 1 then ' (' || trim(rtxt) || ')'
    else
        case when length(rtxt) == 0 then ' /* empty */' else rtxt end
    end, '
    |')
from (
    select lhs, name, substr(txt, instr(txt, tbl.sep) + length(tbl.sep)) rtxt, (instr(txt, '|') > 0) hasOr
    from rule left join symbol on lhs=id, (select '::=' as sep) tbl
) as t
group by lhs;

Partial EBNF output:

input ::= cmdlist
cmdlist ::= cmdlist ecmd
    | ecmd
ecmd ::= SEMI
    | cmdx SEMI
    | explain cmdx SEMI
cmdx ::= cmd
explain ::= EXPLAIN
    | EXPLAIN QUERY PLAN
cmd ::= BEGIN transtype trans_opt
    | (COMMIT|END trans_opt)
    | ROLLBACK trans_opt
    | SAVEPOINT nm
b-scholz commented 3 years ago

That is very nice. Souffle uses sqlite3 only for loading and storing facts. See here:

http://github.com/souffle-lang/directives

mingodad commented 1 year ago

I'm working to achieve a LALR(1)/LEX to try grammars online with wasm based on https://github.com/BenHanson/gram_grep and I've got the Souffle grammar working, view it here https://mingodad.github.io/parsertl-playground/playground/ select Souffle parser from the examples, you can edit the Grammar or the Input source and press Parse to see a parser tree.

I hope it can be a nice tool to experiment with LALR(1)/LEX grammars with instant feedback !