quinn 1.0 release planning

MrPowers commented 7 months ago

quinn is a mature project and is getting ready for a 1.0 release.

We should follow Semantic Versioning 2.0 strictly after making the 1.0 release.

There are a few remaining items for the 1.0 release:

[ ] we should work out a pre-release process that builds a pre-release wheel and runs the tests in a production environment (https://github.com/MrPowers/quinn/issues/252)
[x] drop Python 3.7 that is now EOL (https://github.com/MrPowers/quinn/issues/202, https://github.com/MrPowers/quinn/pull/242)
[x] get rid of functions that should never have been added (e.g. print_athena_create_table) and functions that are now built into Spark (e.g. exists and forall).
[x] Possibly drop Spark 2 support (https://github.com/MrPowers/quinn/issues/202, https://github.com/MrPowers/quinn/pull/242)
[x] Create a code linting workflow that's more seamless
[x] Remove exists forall (https://github.com/MrPowers/quinn/issues/49, https://github.com/MrPowers/quinn/pull/233)
[x] Drop extensions (https://github.com/MrPowers/quinn/issues/237)
[x] Add Spark-Connect tests (https://github.com/MrPowers/quinn/issues/241)
[x] Revisit ownership and access rights for the repository (https://github.com/MrPowers/quinn/issues/243)
[x] Fix all linter/formatter problems in 1.0 branch (https://github.com/MrPowers/quinn/issues/247)

A planning branch for 1.0: https://github.com/MrPowers/quinn/tree/planning-1.0-release

SemyonSinchenko commented 7 months ago

I think we should create a dedicated 1.0 branch to unblock tickets such as dropping Python 3.7 and Spark 2.x support.

SemyonSinchenko commented 7 months ago

@MrPowers @jeffbrennan Could you please check my comment above? There is currently no target branch for 1.0, and all the work is blocked by this.

MrPowers commented 7 months ago

Here are the pyspark 2.x vs pyspark 3.x download stats. Looks like most users have transitioned to PySpark 3.

SemyonSinchenko commented 7 months ago

Just one remark from my side: if this plot is based on PyPI/Maven Central download counts, it may not be fully accurate.

In my experience, the users still on 2.4 are usually those with on-prem infrastructure, simply because migration is painful in that setting. At the same time, on-prem infrastructure usually includes something like a JFrog Artifactory instance mirroring Maven Central and PyPI, and such a mirror caches packages locally.

So my guess is that 2.4 usage does not show up in these stats because on-prem users download from local mirrors rather than from the central repositories.

nijanthanvijayakumar commented 2 months ago

Hi @MrPowers and @SemyonSinchenko, are you able to elaborate more on the item below?

> Create a code linting workflow that's more seamless

I am keen to work on that, but would need more context on the expected workflow. Thanks in advance.

SemyonSinchenko commented 2 months ago

> Hi @MrPowers and @SemyonSinchenko, are you able to elaborate more on the item below?
>
> > Create a code linting workflow that's more seamless
>
> I am keen to work on that, but would need more context on the expected workflow. Thanks in advance.

It seems to me that it was about Ruff, and you already did it.

fpgmaas commented 2 months ago

@SemyonSinchenko @MrPowers Do we really need a separate 1.0-planning-release branch? Wouldn't it be better to use GitHub milestones, and simply push to main? That way:

We can release changes incrementally in the 0.x.y version by bumping x.
Contributors do not get confused when they make a PR to main and then hear they should make the change on top of the 1.0-planning-release branch.

Then when all issues corresponding to the 1.0 milestone have been implemented through PRs, we simply release 1.0 from main. That way we also have more confidence that the proposed 1.0 release is actually stable.

What do you think?

SemyonSinchenko commented 2 months ago

> @SemyonSinchenko @MrPowers Do we really need a separate 1.0-planning-release branch? Wouldn't it be better to use GitHub milestones, and simply push to main? That way:
>
> We can release changes incrementally in the 0.x.y version by bumping x.
>
> Contributors do not get confused when they make a PR to main and then hear they should make the change on top of the 1.0-planning-release branch.
>
> Then when all issues corresponding to the 1.0 milestone have been implemented through PRs, we simply release 1.0 from main. That way we also have more confidence that the proposed 1.0 release is actually stable.
>
> What do you think?

Hey! I'm not very familiar with milestones. Let me explain the idea of the separate branch:

We need to avoid releasing quinn with breaking changes without bumping the major version (semantic versioning), because a lot of pipelines around the world rely on quinn;
We need to be able to release a hot-fix / security fix for the current latest quinn by bumping the minor version;
In the future, when the 1.0 branch is done, it becomes the new "main", but I want to be able to release security fixes for the old quinn (which runs on legacy pyspark pipelines). So the current main becomes a "quinn-legacy" branch after the 1.0 release. We will be able to do security fixes and bug fixes at least.

I have experience working on-premise, and I know that updating critical pipelines and heavily loaded clusters is not an easy task. I know a lot of companies that are still on Spark 2 with YARN and on-premise data centers.

fpgmaas commented 2 months ago

I understand. One small note though: Since the project is still on major version zero, breaking changes can still be introduced without bumping the major version:

> Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

By stacking many changes in the 1.0.0 release that were not tested through public releases, there is an increased risk of discovering bugs after the 1.0.0 release that require a small breaking change to the public API. These would then immediately require a 2.0.0 release to fix, rather than a 0.<y+1>.z. Of course that is not necessarily a problem in itself.

Anyway, I think the approach you have decided upon also makes sense. Just wanted to challenge it a bit :)
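
To make the difference concrete, here is a minimal sketch of the bump rule being discussed. The function name `next_version` and the change labels are made up for illustration and are not part of quinn or of any tool. It also assumes the common convention that a breaking change on 0.y.z bumps y; the SemVer text itself only says that anything may change during major version zero.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def next_version(current: Version, change: str) -> Version:
    """Return the next version for a change of kind 'breaking', 'feature' or 'fix'.

    Hypothetical helper: assumes breaking changes on 0.y.z bump the minor
    version (a convention, not something SemVer mandates).
    """
    if change not in {"breaking", "feature", "fix"}:
        raise ValueError(f"unknown change type: {change!r}")

    major, minor, patch = current

    if change == "fix":
        # Backwards-compatible bug fix: bump the patch version.
        return (major, minor, patch + 1)

    if change == "feature" or major == 0:
        # New functionality, or a breaking change while still on 0.y.z:
        # bump the minor version and reset patch.
        return (major, minor + 1, 0)

    # Breaking change after 1.0.0: bump the major version.
    return (major + 1, 0, 0)


# A breaking fix before 1.0 stays within major version zero ...
before_one = next_version((0, 9, 1), "breaking")
# ... while the same fix right after 1.0.0 forces a new major version.
after_one = next_version((1, 0, 0), "breaking")
```

Under these assumptions `before_one` is `(0, 10, 0)` and `after_one` is `(2, 0, 0)`, which is the 0.<y+1>.z versus 2.0.0 trade-off described above.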

mrpowers-io / quinn

quinn 1.0 release planning #199