shnewto / bnf

Parse BNF grammar definitions
MIT License
252 stars 22 forks source link

How to use the bnf lib in sql parse function #135

Open zerix opened 1 year ago

zerix commented 1 year ago

I want to write a custom VQL parser use nom. And I found it have not contain grammer define function.

Can I use bnf lib to define VQL/SQL grammar like antlr4 g4 file?

I can't find some example for this, I think the process is:

  1. define VQL/SQL grammar file, like vql.bnf.
  2. use bnf lib parse vql.bnf to parse tree.
  3. traverse the parse tree use Grammar::traverse or earley::Traversal and output AST?
  4. parse the AST and get the VQL/SQL stmt define.
  5. stmt mod output VQL/SQL logical plan.

I didn't know the process is right or not? Anyone could help me the right way?

Thanks.

CrockAgile commented 1 year ago

That process sounds right to me! I am not familiar with VQL, but if we could get a vql.bnf, then we could add this as an example. Thank you for asking!

shnewto commented 1 year ago

I agree with @CrockAgile except to note that Grammar::traverse isn't what you're looking for, thats just a private function used to generate random sentences from a grammar 😄

With an ANTLR shaped use case, I'd vote the "Parse Sentence" example in the Readme would be a good place to start https://github.com/shnewto/bnf#parse-sentence-via-grammar (it only prints the trees you'd need to walk through)

zerix commented 1 year ago

Thank you very much for your reply. @shnewto I have read and test the "Parse Sentence" example, and i got a sql parse tree. But I found it difficult to translate it into Abstract Syntax Tree (AST) , and it may be a commonly used function. so I think if the bnf project should provide it ?

@CrockAgile I write a simple vql.bnf file, and I will try to write walk through code with the bnf file. If you have some time, very glad to read your example code. (I 'am a newbie in the rust and interpreter field, I 'am translating the example from python to rust, and i will begin writing bnf example code once it is done. https://ruslanspivak.com/lsbasi-part1/)

vql.bnf.txt

kiwilab-cn commented 11 months ago

@shnewto Did bnf lib support '[..]' or ' { }' grammar define?

eg:

<jdbc_uri> ::= <mysql> <colon> <double_slash> <host> <colon> <port> <colon> [<database_name>]

<jdbc_connection_info> ::= <jdbc_connection_required> { <jdbc_connection_options> }

shnewto commented 11 months ago

@kiwilab-cn no, we don't currently support [] or {} (I would like to support some variants like EBNF eventually though)

CrockAgile commented 11 months ago

I started experimenting with a SQL example on a branch here.

It's not really "full SQL" (that grammar would be large) but it parses the input "CREATE VIEW employed_names AS SELECT e.first_name FROM Employees AS e" to this parse tree:

flowchart TD
0["view_definition"] --> 1["CREATE VIEW"]
0["view_definition"] --> 2["ws"]
2["ws"] --> 3[" "]
0["view_definition"] --> 4["identifier"]
4["identifier"] --> 5["letter"]
5["letter"] --> 6["e"]
4["identifier"] --> 7["identifier_rest"]
7["identifier_rest"] --> 8["letter"]
8["letter"] --> 9["m"]
7["identifier_rest"] --> 10["identifier_rest"]
10["identifier_rest"] --> 11["letter"]
11["letter"] --> 12["p"]
10["identifier_rest"] --> 13["identifier_rest"]
13["identifier_rest"] --> 14["letter"]
14["letter"] --> 15["l"]
13["identifier_rest"] --> 16["identifier_rest"]
16["identifier_rest"] --> 17["letter"]
17["letter"] --> 18["o"]
16["identifier_rest"] --> 19["identifier_rest"]
19["identifier_rest"] --> 20["letter"]
20["letter"] --> 21["y"]
19["identifier_rest"] --> 22["identifier_rest"]
22["identifier_rest"] --> 23["letter"]
23["letter"] --> 24["e"]
22["identifier_rest"] --> 25["identifier_rest"]
25["identifier_rest"] --> 26["letter"]
26["letter"] --> 27["d"]
25["identifier_rest"] --> 28["identifier_rest"]
28["identifier_rest"] --> 29["_"]
28["identifier_rest"] --> 30["identifier_rest"]
30["identifier_rest"] --> 31["letter"]
31["letter"] --> 32["n"]
30["identifier_rest"] --> 33["identifier_rest"]
33["identifier_rest"] --> 34["letter"]
34["letter"] --> 35["a"]
33["identifier_rest"] --> 36["identifier_rest"]
36["identifier_rest"] --> 37["letter"]
37["letter"] --> 38["m"]
36["identifier_rest"] --> 39["identifier_rest"]
39["identifier_rest"] --> 40["letter"]
40["letter"] --> 41["e"]
39["identifier_rest"] --> 42["identifier_rest"]
42["identifier_rest"] --> 43["letter"]
43["letter"] --> 44["s"]
0["view_definition"] --> 45["ws"]
45["ws"] --> 46[" "]
0["view_definition"] --> 47["AS"]
0["view_definition"] --> 48["ws"]
48["ws"] --> 49[" "]
0["view_definition"] --> 50["select_statement"]
50["select_statement"] --> 51["SELECT"]
50["select_statement"] --> 52["ws"]
52["ws"] --> 53[" "]
50["select_statement"] --> 54["expr"]
54["expr"] --> 55["identifier"]
55["identifier"] --> 56["letter"]
56["letter"] --> 57["e"]
54["expr"] --> 58["."]
54["expr"] --> 59["identifier"]
59["identifier"] --> 60["letter"]
60["letter"] --> 61["f"]
59["identifier"] --> 62["identifier_rest"]
62["identifier_rest"] --> 63["letter"]
63["letter"] --> 64["i"]
62["identifier_rest"] --> 65["identifier_rest"]
65["identifier_rest"] --> 66["letter"]
66["letter"] --> 67["r"]
65["identifier_rest"] --> 68["identifier_rest"]
68["identifier_rest"] --> 69["letter"]
69["letter"] --> 70["s"]
68["identifier_rest"] --> 71["identifier_rest"]
71["identifier_rest"] --> 72["letter"]
72["letter"] --> 73["t"]
71["identifier_rest"] --> 74["identifier_rest"]
74["identifier_rest"] --> 75["_"]
74["identifier_rest"] --> 76["identifier_rest"]
76["identifier_rest"] --> 77["letter"]
77["letter"] --> 78["n"]
76["identifier_rest"] --> 79["identifier_rest"]
79["identifier_rest"] --> 80["letter"]
80["letter"] --> 81["a"]
79["identifier_rest"] --> 82["identifier_rest"]
82["identifier_rest"] --> 83["letter"]
83["letter"] --> 84["m"]
82["identifier_rest"] --> 85["identifier_rest"]
85["identifier_rest"] --> 86["letter"]
86["letter"] --> 87["e"]
50["select_statement"] --> 88["ws"]
88["ws"] --> 89[" "]
50["select_statement"] --> 90["FROM"]
50["select_statement"] --> 91["ws"]
91["ws"] --> 92[" "]
50["select_statement"] --> 93["expr"]
93["expr"] --> 94["identifier"]
94["identifier"] --> 95["letter"]
95["letter"] --> 96["E"]
94["identifier"] --> 97["identifier_rest"]
97["identifier_rest"] --> 98["letter"]
98["letter"] --> 99["m"]
97["identifier_rest"] --> 100["identifier_rest"]
100["identifier_rest"] --> 101["letter"]
101["letter"] --> 102["p"]
100["identifier_rest"] --> 103["identifier_rest"]
103["identifier_rest"] --> 104["letter"]
104["letter"] --> 105["l"]
103["identifier_rest"] --> 106["identifier_rest"]
106["identifier_rest"] --> 107["letter"]
107["letter"] --> 108["o"]
106["identifier_rest"] --> 109["identifier_rest"]
109["identifier_rest"] --> 110["letter"]
110["letter"] --> 111["y"]
109["identifier_rest"] --> 112["identifier_rest"]
112["identifier_rest"] --> 113["letter"]
113["letter"] --> 114["e"]
112["identifier_rest"] --> 115["identifier_rest"]
115["identifier_rest"] --> 116["letter"]
116["letter"] --> 117["e"]
115["identifier_rest"] --> 118["identifier_rest"]
118["identifier_rest"] --> 119["letter"]
119["letter"] --> 120["s"]
93["expr"] --> 121["ws"]
121["ws"] --> 122[" "]
93["expr"] --> 123["AS"]
93["expr"] --> 124["ws"]
124["ws"] --> 125[" "]
93["expr"] --> 126["identifier"]
126["identifier"] --> 127["letter"]
127["letter"] --> 128["e"]

@shnewto and anyone else, let me know what you think! What else should be included in this example? Should examples be linked to in the crate's main README ?

shnewto commented 10 months ago

adding this an example would be awesome, definitely thing a nod to the examples in the readme is a good call, and maybe an examples readme with some more specific details if there are some?