spaceandtimelabs / sxt-proof-of-sql

Space and Time | Proof of SQL

Replace `proof-of-sql-parser` with `sqlparser`. #235

Open JayWhite2357 opened 1 week ago

JayWhite2357 commented 1 week ago

Background and Motivation

Currently, we have an in-house parser built on the LALRPOP parser generator. This has worked well while the supported syntax was simple. However, as the supported syntax grows, we need a more comprehensive parser.

The `sqlparser` crate is the parser used by DataFusion, which is part of the Arrow ecosystem. It is a feature-rich parser that should require less code maintenance in the long run. It is `no_std` compatible, so there should be no issues integrating it.

Changes Required

JayWhite2357 commented 1 week ago

@iajoiner may be able to provide some guidance on this issue; however, there is no concrete plan yet. It would be wise to discuss the plan with @iajoiner and me to make sure things are heading in the right direction before building in earnest.

JayWhite2357 commented 1 week ago

/bounty $10000

algora-pbc[bot] commented 1 week ago

💎 $10,000 bounty • Space and Time

Steps to solve:

  1. Start working: (Optional) Comment /attempt #235 with your implementation plan. Note: we will only assign an issue if you include an implementation plan with a time estimate. Additionally, to be assigned an issue, you must have previously contributed to the project. You can still work on an issue and submit a PR without being assigned.
  2. Submit work: Create a pull request including /claim #235 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts.

Thank you for contributing to spaceandtimelabs/sxt-proof-of-sql!


| Attempt | Started (GMT+0) | Solution |
|---|---|---|
| 🔴 @varshith257 | Oct 9, 2024, 1:50:03 PM | WIP |
| 🟢 @animeshd9 | Oct 9, 2024, 8:21:53 PM | WIP |
| 🟢 @TomBebb | Oct 15, 2024, 10:39:12 PM | WIP |
| 🟢 @deependujha | Oct 16, 2024, 12:17:06 PM | WIP |

varshith257 commented 1 week ago

/attempt #235

varshith257 commented 1 week ago

@JayWhite2357 I will connect with you on Discord about this.

animeshd9 commented 1 week ago

/attempt #235

varshith257 commented 1 week ago

Hi @JayWhite2357 and @iajoiner, I'd like to proceed with the transition from the LALRPOP-based parser to sqlparser for the proof-of-sql-parser crate. I have gone through the current implementation and how the proof-of-sql parser works. Here's the approach I have in mind:

  • [ ] I will add the sqlparser crate to handle the parsing of SQL queries.

  • [ ] Next, I will replace the LALRPOP parser:

    • [ ] The LALRPOP-based parser defined in sql.lalrpop will be replaced by a new parsing function that uses sqlparser to parse queries.
    • [ ] This will focus specifically on SELECT statements in PostgreSQL syntax, as the current parser does (SELECT statement parsing, expressions and operators, lexer rules, ORDER BY/GROUP BY, literals and identifiers, etc.).
  • [ ] I will then map the sqlparser AST to the existing intermediate AST (intermediate_ast::SelectStatement and related structures) to minimize changes to the rest of the crate. This keeps the logic built on top of the AST intact.

  • [ ] I will thoroughly test the changes at every stage to ensure all current functionality is preserved.

  • [ ] I think we can also adapt the existing ParseError and ParseResult types to handle errors from sqlparser.
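A minimal sketch of the mapping step in this plan, assuming heavily simplified stand-in types: `SqlExpr` and `SqlBinOp` play the role of sqlparser's expression AST, `IntermediateExpr` plays the role of the intermediate AST, and `ParseError`/`ParseResult` reuse the names from the plan with hypothetical variants. None of these are the real `sqlparser::ast` or `intermediate_ast` definitions; the sketch only shows the shape of such an adapter, a recursive match that translates supported nodes and returns an error for anything else.

```rust
/// Hypothetical, simplified stand-in for sqlparser's expression AST.
#[derive(Debug, Clone, PartialEq)]
enum SqlExpr {
    Identifier(String),
    Number(String),
    BinaryOp {
        left: Box<SqlExpr>,
        op: SqlBinOp,
        right: Box<SqlExpr>,
    },
    /// Stands in for any construct the proof layer does not support yet.
    Function(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum SqlBinOp {
    Plus,
    Eq,
    And,
}

/// Hypothetical, simplified stand-in for the intermediate AST.
#[derive(Debug, Clone, PartialEq)]
enum IntermediateExpr {
    Column(String),
    Literal(i64),
    Add(Box<IntermediateExpr>, Box<IntermediateExpr>),
    Equal(Box<IntermediateExpr>, Box<IntermediateExpr>),
    And(Box<IntermediateExpr>, Box<IntermediateExpr>),
}

/// Reuses the name from the plan; the variants are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum ParseError {
    Unsupported(String),
    InvalidLiteral(String),
}

type ParseResult<T> = Result<T, ParseError>;

/// Recursively translates a parser node into an intermediate-AST node.
fn convert(expr: &SqlExpr) -> ParseResult<IntermediateExpr> {
    match expr {
        SqlExpr::Identifier(name) => Ok(IntermediateExpr::Column(name.clone())),
        SqlExpr::Number(text) => text
            .parse::<i64>()
            .map(IntermediateExpr::Literal)
            .map_err(|_| ParseError::InvalidLiteral(text.clone())),
        SqlExpr::BinaryOp { left, op, right } => {
            // Convert both operands first; `?` propagates the first error.
            let lhs = Box::new(convert(left)?);
            let rhs = Box::new(convert(right)?);
            Ok(match op {
                SqlBinOp::Plus => IntermediateExpr::Add(lhs, rhs),
                SqlBinOp::Eq => IntermediateExpr::Equal(lhs, rhs),
                SqlBinOp::And => IntermediateExpr::And(lhs, rhs),
            })
        }
        SqlExpr::Function(name) => Err(ParseError::Unsupported(format!("function {name}"))),
    }
}
```

In this sketch, unsupported syntax is rejected during conversion rather than at parse time, so the parser can accept its full grammar while the adapter stays explicit about what the proof layer handles.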

Before moving forward, I wanted to check in and make sure this approach aligns with the migration goals. If you have any specific concerns or alternative suggestions, I'd be happy to adjust the plan.

I am looking forward to your feedback!

JayWhite2357 commented 1 week ago

I'm looking into it a bit, and I feel like sqlparser::ast::Query is the thing that would be analogous to intermediate_ast::SelectStatement.

My initial intent was for the sqlparser AST to replace the intermediate AST. This would mean that QueryExpr::try_new would accept a sqlparser::ast::Query (or similar) instead of an intermediate_ast::SelectStatement. In this situation, most (if not all) of the code changes would be in the proof_of_sql::sql::parse module.
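A rough sketch of that end state, where planning code consumes the parser's output directly with no intermediate AST in between. `Query`, `QueryExpr`, `ConversionError`, and the `HashMap`-based table-to-columns lookup are all hypothetical, drastically simplified stand-ins; they are not meant to match the real `sqlparser::ast::Query` or the real `QueryExpr::try_new` signature.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the parser's query node
/// (the real `sqlparser::ast::Query` is far richer).
#[derive(Debug, Clone)]
struct Query {
    table: String,
    projection: Vec<String>,
}

/// Hypothetical stand-in for the planner's query type.
#[derive(Debug, PartialEq)]
struct QueryExpr {
    table: String,
    columns: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum ConversionError {
    TableNotFound(String),
    ColumnNotFound(String),
}

impl QueryExpr {
    /// Builds the planner type straight from the parser's output and
    /// resolves names against a table -> columns map.
    fn try_new(
        query: &Query,
        schemas: &HashMap<String, Vec<String>>,
    ) -> Result<Self, ConversionError> {
        let table_columns = schemas
            .get(&query.table)
            .ok_or_else(|| ConversionError::TableNotFound(query.table.clone()))?;
        // Every projected column must exist in the referenced table.
        for column in &query.projection {
            if !table_columns.contains(column) {
                return Err(ConversionError::ColumnNotFound(column.clone()));
            }
        }
        Ok(QueryExpr {
            table: query.table.clone(),
            columns: query.projection.clone(),
        })
    }
}
```

With this shape, name resolution and validation live in the planner's constructor, which is where the parse-module changes described above would concentrate.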

Replacing only LALRPOP with sqlparser, while keeping the intermediate AST, is an interesting idea that I hadn't thought of. Perhaps it makes sense as a stepping stone, but I don't think it can be the end goal here.

@iajoiner might have some feedback on this.

varshith257 commented 1 week ago

Thanks @JayWhite2357 for your view on this. @iajoiner, any insights on this?

JayWhite2357 commented 1 week ago

I chatted with him. He's on the same page. The goal here should be to remove the intermediate AST altogether.

varshith257 commented 1 week ago

Got it! I've started on basic SELECT parsing to see how sqlparser works with an example.

@JayWhite2357 I have joined Discord. If we open a thread in a related channel there, we can track progress and discuss next steps more easily.

iajoiner commented 5 days ago

@varshith257 I can chat with you on Discord. What's your handle there?

varshith257 commented 4 days ago

@iajoiner My ID is vamshi_257.

iajoiner commented 4 days ago

Cool! Just sent you a message there.

TomBebb commented 3 days ago

/attempt #235

varshith257 commented 3 days ago

@iajoiner / @JayWhite2357 I am now ready to tackle the issue (PS: I'm done with my exams). We now have a solid plan, and I have connected with @iajoiner on Discord. I'd like to have this issue assigned to me so that other contributors don't duplicate the effort.

The time estimate is TBD; IMO we'll have a better estimate once I start the base work and dig deeper 😅

deependujha commented 3 days ago

/attempt #235

varshith257 commented 12 hours ago

@JayWhite2357 Is @iajoiner on holiday? I'd also like to connect with you on Discord.

Here's my ID: vamshi_257

JayWhite2357 commented 11 hours ago

@varshith257 We're all a bit swamped at the moment. I connected with you on Discord as well.