lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

How can I make an SQL parser with lark? #683

Closed: btseytlin closed this 1 year ago

btseytlin commented 4 years ago

What is your question?

I would like to create an SQL parser with lark. I understand that I need an SQL grammar for this, but I would rather not write one myself. The README states that the expected grammar format is EBNF.

How can I use, for example, one of these grammars: https://ronsavage.github.io/SQL/? I tried using them, but they are clearly not in the right format.

If there is no way to use one of these, where can I find a grammar in a suitable format?

erezsh commented 4 years ago

I read one of their SQL grammars (sql-2003-2.bnf). It's indeed not in a format suitable for Lark. You'll have to:

  1. Extract the actual BNF from the HTML file (i.e. remove or comment out the HTML markup)

  2. Convert the syntax to Lark's syntax: for example, ::= should become :, rule names aren't surrounded by <>, etc.

And even then, you might have to make some adjustments to make it parse correctly.
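As a rough illustration of step 2, here is a minimal sketch that assumes the BNF has already been extracted to plain text, one production per line. The helpers `bnf_rule_name` and `convert_bnf_line` are hypothetical and not part of Lark. The sketch only renames `<rule name>` to `rule_name`, turns `::=` into `:`, and quotes bare uppercase keywords. It does not handle ISO BNF's `...` repetition, `{ }` grouping, or literal `<`/`>` characters, so the manual adjustments mentioned above are still needed.

```python
import re


def bnf_rule_name(name: str) -> str:
    """Turn a BNF rule name such as 'select list' into a Lark rule name (hypothetical helper)."""
    # Lark rule names are lowercase; replace spaces/hyphens with underscores.
    return re.sub(r"\W+", "_", name.strip().lower()).strip("_")


def convert_bnf_line(line: str) -> str:
    """Convert one simple BNF production to Lark-style syntax (hypothetical helper).

    Simplified on purpose: assumes the line contains no literal '<' or '>'
    characters and leaves '...', '{ }' and other ISO BNF notation untouched.
    """
    # 1. <rule name>  ->  rule_name
    line = re.sub(r"<([^<>]+)>", lambda m: bnf_rule_name(m.group(1)), line)
    # 2. BNF separates a rule from its definition with '::='; Lark uses ':'.
    line = re.sub(r"\s*::=\s*", ": ", line, count=1)
    # 3. Bare uppercase words are SQL keywords; Lark needs them as string literals.
    return re.sub(r"\b[A-Z][A-Z_]*\b", lambda m: f'"{m.group(0)}"', line)


print(convert_bnf_line(
    "<query specification> ::= SELECT [ <set quantifier> ] <select list> <table expression>"
))
# query_specification: "SELECT" [ set_quantifier ] select_list table_expression
```

The referenced rules (`set_quantifier`, `select_list`, `table_expression`) would still have to be converted the same way before Lark could load the result.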

I do offer freelance services for writing grammars, so if that's something you're interested in, we can discuss it.

But if you want to avoid writing the grammar, you should consider existing SQL parsers for Python, which, unfortunately, don't use Lark.

btseytlin commented 4 years ago

@erezsh thank you!

erezsh commented 4 years ago

@glebmezh Sent you an email.

zbrookle commented 4 years ago

@erezsh @btseytlin I’ve actually already written a SQL parser using lark; you can find it here. @erezsh I’ve been meaning to reach out about adding it to the list in your README.

erezsh commented 4 years ago

@zbrookle Thanks, looks like a decent start.

Btw, I noticed you're not allowing "FULL\nOUTER JOIN" and so on, but I think SQL does allow it.

zbrookle commented 4 years ago

@erezsh Yeah, that's a good point; it doesn't really matter how much whitespace there is between the tokens.
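To illustrate the point, here is a small sketch using Python's `re` module (which Lark uses for regex terminals by default). It assumes a hypothetical terminal that matches the whole keyword sequence at once. A literal space accepts exactly one space, while `\s+` also accepts newlines and runs of spaces. In a Lark grammar the same idea could be written as a regex terminal such as `/FULL\s+OUTER\s+JOIN/i`; another common approach is to make each keyword its own token and let `%ignore WS` skip whatever whitespace sits between them.

```python
import re

# Hypothetical terminal patterns for the multi-word keyword "FULL OUTER JOIN".
SINGLE_SPACE = re.compile(r"FULL OUTER JOIN", re.IGNORECASE)    # exactly one space
ANY_WHITESPACE = re.compile(r"FULL\s+OUTER\s+JOIN", re.IGNORECASE)  # any whitespace run

query = "SELECT * FROM a FULL\nOUTER   JOIN b ON a.id = b.id"

print(bool(SINGLE_SPACE.search(query)))    # False: a newline is not a literal space
print(bool(ANY_WHITESPACE.search(query)))  # True: \s+ accepts the newline and extra spaces
```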

zbrookle commented 4 years ago

@erezsh I actually just tested this, and it turns out that lark includes \n as part of \s. Not sure whether that's a bug, since I'm ignoring WS. If it isn't expected, I can open an issue.

MegaIng commented 4 years ago

Well, yes, \n is part of the regex group \s. This is not a 'problem' with lark, but with the re library.

zbrookle commented 4 years ago

@MegaIng This was in reference to a place where I was using \s, and @erezsh pointed out that it should also accept \n, which, after testing, it does. So either the package doesn't work as expected, or the comment was incorrect.

MegaIng commented 4 years ago

I think @erezsh made a mistake, but I am not sure.

erezsh commented 4 years ago

@zbrookle Not sure which comment that was, but legend has it that I can sometimes make mistakes.

Yes, \s includes \n.
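For reference, here is a quick check with Python's `re` module: for `str` patterns, `\s` matches any Unicode whitespace, newline included. If a pattern should accept spaces and tabs but not line breaks, one option is the class `[^\S\r\n]`, i.e. "whitespace that is not `\r` or `\n`". The pattern and helper names below are illustrative only. Lark's bundled `common.lark` also appears to offer a `WS_INLINE` terminal for this purpose, though it is worth checking the version you use.

```python
import re

# For str patterns, \s matches any Unicode whitespace, including "\n".
WHITESPACE = re.compile(r"\s+")
# Whitespace other than \r and \n: "not non-whitespace, and not \r or \n".
INLINE_WHITESPACE = re.compile(r"[^\S\r\n]+")


def matches_whole(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern matches the entire text (illustrative helper)."""
    return pattern.fullmatch(text) is not None


print(matches_whole(WHITESPACE, " \t\n"))         # True
print(matches_whole(INLINE_WHITESPACE, " \t\n"))  # False: the newline is excluded
print(matches_whole(INLINE_WHITESPACE, " \t"))    # True
```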