sqlparser-rs / sqlparser-rs

Extensible SQL Lexer and Parser for Rust
Apache License 2.0

[DISCUSSION]: move sqlparser to Apache (DataFusion) governance #1294

Open alamb opened 3 weeks ago

alamb commented 3 weeks ago

(Disclaimer: I am biased, being the one who merges sqlparser PRs and also the Apache DataFusion PMC chair.)

Problem Statement

sqlparser seems to have become the de facto SQL parsing library in Rust (5.5M downloads at the time of this writing) 🎉

However, the sqlparser-rs project doesn't have sufficient maintainer capacity. I (@alamb) do enough to keep it from going entirely dormant, but that is really not sufficient for a healthy project.

Here are the specific problems:

  1. Having contributors wait weeks for review feedback is a bad experience for everyone involved and for that I apologize.
  2. There is not enough capacity to drive large projects (e.g. token locations) forward

Challenges with current governance structure (or lack thereof)

  1. There is no clear way to add additional maintainers
  2. Some employers (for example Apple) only permit contributions to explicitly vetted projects with clear governance (e.g. ASF)

Past discussions:

When DataFusion was part of the Apache Arrow project, there was no natural place to bring sqlparser into.

Now that DataFusion is its own top-level project (with @andygrove and myself on the PMC), there is a natural space to do this.

Specific Proposal:

  1. Move the sqlparser-rs code (and commit history) into the Apache DataFusion project and under its Governance. This would require an IP clearance process to run and would take time.
  2. Move sqlparser-rs repository to apache/datafusion-sqlparser
  3. Archive this repository, and leave links to apache/datafusion-sqlparser
  4. Continue to release sqlparser versions approximately monthly.

Benefits of ASF governance:

  1. More people can approve/merge PRs (committers to DataFusion)
  2. Clear governance structure (rather than sqlparser-rs today which seems to be mostly me)
  3. Clear path to add additional maintainers (e.g. committers)

Drawbacks

  1. There is a danger that sqlparser becomes "captured" by DataFusion and only accepts features needed for DataFusion
  2. There is additional overhead to the ASF process (releases, in particular, require non-trivial extra work)

There is plenty of experience with the ASF release process in DataFusion, so I don't think that is a major hurdle. I also think DataFusion in general, and sqlparser in particular, have a long history of accepting features that benefit all users, not just maintainers, so I am not worried about capture either (but I am of course biased).

cc @Dandandan @tobyhede @andygrove @maxcountryman @nickolay

andygrove commented 3 weeks ago

Thank you for restarting this conversation @alamb. I am also biased as the original author of sqlparser-rs and DataFusion, but I do think the project will have more success under ASF governance, so I am in favor of this proposal.

lovasoa commented 3 weeks ago

Hello! I'm not against it, and would love to be added as a maintainer under the asf governance, if we need some balance to avoid too much bias towards datafusion.

sqlparser-rs is a crucial component of SQLPage, and I'll keep making pull requests either way.

vaibhawvipul commented 3 weeks ago

I am in support of this.

alamb commented 3 weeks ago

Hello! I'm not against it, and would love to be added as a maintainer under the asf governance, if we need some balance to avoid too much bias towards datafusion.

Yes, thank you @lovasoa -- I would expect to discuss adding committers to DataFusion who focus on sqlparser-rs (as we have committers focused on another subproject, Comet, and we had committers focused on DataFusion when it was part of Arrow).

jmhain commented 3 weeks ago

I just realized I never actually responded to https://github.com/sqlparser-rs/sqlparser-rs/issues/1243. So: I'm happy to participate as a maintainer of sqlparser-rs regardless of where it ends up. I have a slight preference for keeping it independent but the arguments in favor of moving it under DataFusion seem reasonable.

maxcountryman commented 3 weeks ago

My preference would be to keep the project independent, but I'm not able to commit much time to it, and it's admittedly a preference based mostly on how things might appear were it moved under the DataFusion project.

nickolay commented 3 weeks ago

I like how @maxcountryman put it, but I believe the decision must be made by those maintaining the project.

I will take the opportunity to express my gratitude to @alamb for keeping the project going for as long as he has. Thanks for doing this, Andrew!

tobyhede commented 3 weeks ago

You have my full support. I am personally so thankful for the endless hours you have put into sqlparser @alamb.

I would love to be involved in any way I can.

cisaacson commented 3 weeks ago

@alamb I think this is a very good idea. One question though: if a fork is created now, it is a fork of just the sqlparser-rs project. If you go this way, would it mean creating a fork of the entire DataFusion project? That could be OK, but I would like to know what would be involved in making custom changes to sqlparser (or other components, for that matter). Since the entire DataFusion project changes fairly frequently, it may be some work to keep up with those changes.

Looking into this further, I see the workspace is defined in datafusion/Cargo.toml. So it looks like someone will need a fork of the whole DataFusion project to make something work in any case. This should be manageable. Unless advised otherwise, that's what I will do.

Regardless, I am fully supportive of the idea of moving sqlparser-rs under the DataFusion umbrella so it is a full-fledged Apache project.

alamb commented 3 weeks ago

If you go this way would it mean creating a fork of the entire DataFusion project?

No, I would expect that sqlparser-rs remains in its own repository (only the GitHub organization would be different).