gwenn / lemon-rs

LALR(1) parser generator for Rust based on Lemon + SQL parser
The Unlicense

Online yacc/lex grammar editor/tester #36

Open mingodad opened 1 year ago

mingodad commented 1 year ago

I'm trying to build an online yacc/lex (LALR(1)) grammar editor/tester to help develop/debug/document grammars. The main repository is here https://github.com/mingodad/parsertl-playground and the online playground, with several non-trivial examples, is here https://mingodad.github.io/parsertl-playground/playground/ .

Select a grammar/example from the "Examples" select box and then click "Parse" to see a parse tree for the source in the "Input source" editor.

It's based on https://github.com/BenHanson/gram_grep and https://github.com/BenHanson/lexertl14 .

Any feedback is welcome!

The grammars available so far (with varying state of correctness):

gwenn commented 1 year ago

For your information,

So I guess you will not be able to easily port the SQLite grammar.

mingodad commented 1 year ago

Maybe, but there is https://github.com/ricomariani/CG-SQL-author , which is a superset of sqlite and which you can also try online here https://mingodad.github.io/CG-SQL-Lua-playground/ . Its grammar is available in the playground under the name Cql parser.

gwenn commented 1 year ago

As expected, it doesn't work:

CREATE TABLE test (view TEXT); -- view falls back to ID instead of keyword

gives

code.cql:1:1: error: syntax error, unexpected VIEW
Parse errors found, no further passes will run.
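
For comparison, stock SQLite accepts this statement, because its grammar lets the VIEW keyword fall back to an identifier. A minimal check, assuming Python's standard sqlite3 module and an in-memory database (both chosen here only for illustration):

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# SQLite parses `view` as a column name in this position.
conn.execute("CREATE TABLE test (view TEXT)")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(test)")]
print(columns)
```

The column list should contain the single name view, which shows that the keyword was accepted as an identifier.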

And

CREATE VIRTUAL TABLE t3 using fts5(a,b,c); -- a,b,c are wildcards 

gives

code.cql:1:1: error: syntax error, unexpected ';', expecting AS
Parse errors found, no further passes will run.

mingodad commented 1 year ago

Thank you for pointing it out! I've tested CREATE TABLE test (view TEXT); with the Postgresql parser (be patient) and it parses fine. Going back to the Cql parser, on line 415:

name :
    ID
    | TEXT
    | TRIGGER
    | ROWID
    | REPLACE
    | KEY
    | VIRTUAL
    | TYPE
    | HIDDEN
    | PRIVATE
    | VIEW /* adding VIEW here so it is accepted as ID */
    ;

You can see that the authors of CG-SQL decided not to allow view as a valid ID, but if we add it there (as shown above) then it parses fine too. This is one of the reasons I'm developing this tool https://mingodad.github.io/parsertl-playground/playground/ : to allow quick experimentation/debug/development of YACC/LEX LALR(1) grammars.
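
The effect of that allow-list can be sketched outside yacc. The following is a hand-written recursive-descent toy, not the CQL grammar or an LALR(1) parser; the keyword set, the `NAME_KEYWORDS` allow-list, and the function names are all hypothetical and only loosely modelled on the `name` rule above:

```python
import re

# Hypothetical keyword set and allow-list (assumptions for illustration only).
KEYWORDS = {"CREATE", "TABLE", "TEXT", "VIEW", "KEY", "TYPE"}
NAME_KEYWORDS = {"TEXT", "KEY", "TYPE"}  # keywords also accepted as names


def tokenize(sql):
    """Split into (kind, text) pairs; words in KEYWORDS become keyword tokens."""
    tokens = []
    for text in re.findall(r"[A-Za-z_][A-Za-z0-9_]*|[();,]", sql):
        if text.upper() in KEYWORDS:
            tokens.append((text.upper(), text))
        elif text[0].isalpha() or text[0] == "_":
            tokens.append(("ID", text))
        else:
            tokens.append((text, text))  # punctuation is its own kind
    return tokens


def parse_create_table(sql, name_keywords=NAME_KEYWORDS):
    """Parse `CREATE TABLE name ( name TEXT ) ;` and return (table, column)."""
    tokens = tokenize(sql)
    pos = 0

    def kind_at_pos():
        return tokens[pos][0] if pos < len(tokens) else "end of input"

    def expect(kind):
        nonlocal pos
        if kind_at_pos() != kind:
            raise SyntaxError(f"unexpected {kind_at_pos()}, expecting {kind}")
        pos += 1

    def name():
        # name : ID | any keyword on the allow-list
        nonlocal pos
        if kind_at_pos() == "ID" or kind_at_pos() in name_keywords:
            pos += 1
            return tokens[pos - 1][1]
        raise SyntaxError(f"unexpected {kind_at_pos()}, expecting a name")

    expect("CREATE")
    expect("TABLE")
    table = name()
    expect("(")
    column = name()
    expect("TEXT")
    expect(")")
    expect(";")
    return table, column


try:
    parse_create_table("CREATE TABLE test (view TEXT);")
except SyntaxError as err:
    print("without VIEW on the allow-list:", err)

# Adding VIEW to the allow-list makes the same input parse.
print(parse_create_table("CREATE TABLE test (view TEXT);", NAME_KEYWORDS | {"VIEW"}))
```

In a real LALR(1) grammar each keyword added to such a rule can introduce conflicts, so the generator's conflict report needs checking after every addition.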

In the case of CREATE VIRTUAL TABLE t3 using fts5(a,b,c); there is no support for the fts5 extension in CG-SQL so far.

mingodad commented 1 year ago

I just added a partially working sqlite3 grammar, converted from the original parser using the changes I made to lemon here https://github.com/mingodad/lalr-parser-test/tree/main/lemon :

mylemon -h
Valid command line options for "lemon-nb" are:
  -b           Print only the basis in report.
  -c           Don't compress the action table.
  -d<string>   Output directory.  Default '.'
  -D<string>   Define an %ifdef macro.
  -E           Print input file after preprocessing.
  -f<string>   Ignored.  (Placeholder for -f compiler options.)
  -g           Print grammar without actions.
  -y           Print yacc grammar without actions.
  -Y           Print yacc grammar without actions with full precedences.
  -z           Use yacc rule precedence
  -u           Ignore all precedences
  -I<string>   Ignored.  (Placeholder for '-I' compiler options.)
  -m           Output a makeheaders compatible file.
  -l           Do not print #line statements.
  -O<string>   Ignored.  (Placeholder for '-O' compiler options.)
  -p           Show conflicts resolved by precedence rules
  -q           (Quiet) Don't print the report file.
  -r           Do not sort or renumber states
  -s           Print parser stats to standard output.
  -S           Generate the *.sql file describing the parser tables.
  -x           Print the version number.
  -T<string>   Specify a template file.
  -W<string>   Ignored.  (Placeholder for '-W' compiler options.)

mylemon -Y parser.y

Then I added the lexer part by hand.

It doesn't handle CREATE TABLE test (view TEXT); because I didn't add a rule that lets non-reserved keywords be accepted as ID, but it does handle CREATE VIRTUAL TABLE t3 using fts5(a,b,c); .

On https://mingodad.github.io/parsertl-playground/playground/ select SQLite3 parser (partially working) and play with it. Any fixes (pull requests) are welcome.

ricomariani commented 1 year ago

One big regret I have from building CQL is that I didn't just start from the SQLite grammar. My life would have been so much simpler...

We started from some mysql and it was good enough for us but then getting more of the grammar in place became harder and harder. Oh well, that ship has sailed.

Note that the CG/SQL grammar is not a strict superset of SQLite. It's a Venn diagram. For instance, CQL does not and cannot reasonably support column names that are not valid identifiers. And it's stricter in many areas. But it does support some useful sugar that isn't in the original.

ricomariani commented 1 year ago

Oh, I should add: because CQL uses yacc we get LALR(1), and that means some fallbacks that SQLite can do, we can't. A few other choices were made to avoid shift/reduce conflicts.

The grammar is pretty good but I would never call it a superset. The presence of keywords is crucial for ambiguity removal and indeed if you added a lot of names as ids you could find the grammar in a bad state.
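
To make the fallback idea concrete: with Lemon's %fallback directive, a keyword token that has no action in the current parser state is retried as its fallback token (ID in SQLite's grammar). A minimal sketch of that lookup, assuming a hypothetical fallback table and hypothetical per-state token sets rather than SQLite's real parser tables:

```python
# Hypothetical fallback table: keyword token -> token to retry with.
FALLBACK = {"VIEW": "ID", "KEY": "ID", "VIRTUAL": "ID"}


def resolve(token, acceptable):
    """Return the token kind the parser should act on in the current state.

    `acceptable` is the set of token kinds the current state has an action for.
    """
    if token in acceptable:
        return token  # the keyword is usable as itself
    fallback = FALLBACK.get(token)
    if fallback in acceptable:
        return fallback  # reinterpret the keyword as an identifier
    raise SyntaxError(f"unexpected {token}")


# Hypothetical states: after CREATE, VIEW is a real keyword;
# inside a column list only an identifier is expected.
after_create = {"TABLE", "VIEW", "VIRTUAL", "INDEX"}
in_column_list = {"ID"}

print(resolve("VIEW", after_create))
print(resolve("VIEW", in_column_list))
```

The decision is made per parser state at run time, which is why a yacc grammar has to approximate it with an explicit list of keywords in a rule such as name.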

ricomariani commented 1 year ago

One other thing. CG/SQL departs significantly from SQLite on virtual tables because it can't do its job at all unless it knows the datatypes of the columns in the virtual table -- it offers strict typing. So there is totally new syntax for specifying the types of the columns as well as the module.

ricomariani commented 1 year ago

CREATE VIRTUAL TABLE t3 using fts5(a,b,c);

Could be made to work I think but you need to tell it the shape of the resulting table. CG/SQL doesn't care about virtual tables other than it needs to know what columns they have.

e.g.

CREATE VIRTUAL TABLE t3 using fts5(a,b,c) AS (a1 int, a2 text, b1 int, b2 text, b3 text, c1 int);

Note the AS portion is unique to CG/SQL -- it has whatever the combined indexed columns will be.

then you can do

select * from t3 where a2 match 'something';

mingodad commented 1 year ago

Hello @ricomariani! Thank you so much for your great work and for taking the time to help us understand the issues raised here and the general view of CG-SQL!

ricomariani commented 1 year ago

FWIW I just added "add" and "view" to the allow list. That didn't cause any conflicts. So you could just pull again.

And you're welcome :D