ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
4.36k stars 540 forks source link

refactor: use nix to manage development dependencies #2924

Closed cpcloud closed 2 years ago

cpcloud commented 2 years ago

This is a PR to change the way local development is done in Ibis (while preserving as much backwards compatibility with conda and non-conda/non-nix workflows as possible!), accomplishing the following things:

The main change here is the addition of a development environment that is self-contained and derived from a single source of truth (pyproject.toml), using nix.

See docs/web/contribute.md for more details.

Note that both conda and setuptools-based development workflows are still supported as first class citizens.

DONE:

  1. ~A move to poetry for dependency management.~ done in #2937 It is still possible and supported to use setup.py and setuptools if you like, this functionality is checked in CI.
  2. ~More extensive CI testing, with minimal additional CI time used. It looks like roughly on average 3 minutes of additional CI time occur, with more of the library being tested on Windows in particular~ done in #2937
  3. ~Automatic generation of a conda recipe for every release, uploaded to GitHub releases (this is the remaining 1% that isn't automated but this can be automated by pushing a feedstock pull request directly from CI)~ I plan to do this in conda-forge directly, using their integrated bot automation
  4. ~Automated Dependabot updates for github-actions and poetry dependencies~ this is done using Renovate in #3053
  5. ~Generating a setup.py file from the project's pyproject.toml file~ ~This is now just checked that it doesn't need to be regenerated, since we want to ensure this is tested in CI where relevant.~ done in #3054, necessary for Renovate dependency update PRs
  6. ~Automatic generation of a conda environment file, uploaded to GitHub releases, useful for developers using conda~ We have automatically generated lockfiles for linux, macos, and windows, for all three versions of Python currently supported (3.7, 3.8, 3.9). These live under conda-lock and can be used to set up a development environment very quickly since there's no solve step required. Done #3077, #3080, #3083, #3087

Follow ups:

  1. workflow_dispatch (click-to-run) GitHub Actions workflow that cuts a release to PyPI and GitHub Releases.
  2. Automatic license updates once a year or on-demand. This is optional, but it's another thing we can automate at some point.
  3. ~Automatic nix dependency updates via PR submission every six hours or on-demand. This has to be done in a follow up.~ Done in https://github.com/ibis-project/ibis/pull/3182

Happy to address any concerns here, let's automate everything!

pep8speaks commented 2 years ago

Hello @cpcloud! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2021-12-22 11:33:58 UTC
jreback commented 2 years ago

cc @datapythonista

cpcloud commented 2 years ago

@jreback

can you add a README.md explaining how the poetry2conda is used?

Yup. In short it's used to generate an environment file in CI.

how are the min version set now for packages (e.g. where).

The minimum versions for dependencies are set in pyproject.toml

also is the release process documented somewhere? (even if fully automated)

Light documentation of it exists in docs/web/contributing..md, but I'll spell it out in more detail there.

icexelloss commented 2 years ago

@cpcloud left some comments, in general +1 to standardize/automatic things so thanks for this!

My main concern is that I haven't used many of the tools here (I think some other folks as well), so would be really helpful to write dev docs describing how the new CI/dev/release works.

cpcloud commented 2 years ago

@cpcloud left some comments, in general +1 to standardize/automatic things so thanks for this!

My main concern is that I haven't used many of the tools here (I think some other folks as well), so would be really helpful to write dev docs describing how the new CI/dev/release works.

Heck yeah! @jreback has requested more docs as well. I think I'll also add a screencast to really help clarify how these things are used.

cpcloud commented 2 years ago

@jreback @icexelloss I've added a bunch of docs on the dev and release process. Please review at your earliest convenience!

gerrymanoim commented 2 years ago

A nice addition here might be to also add a devcontainer.json file that bootstraps a dev env, both in vscode and github codespaces.

https://docs.github.com/en/codespaces/customizing-your-codespace/configuring-codespaces-for-your-project

cpcloud commented 2 years ago

A nice addition here might be to also add a devcontainer.json file that bootstraps a dev env, both in vscode and github codespaces.

https://docs.github.com/en/codespaces/customizing-your-codespace/configuring-codespaces-for-your-project

Interesting, let me look into that.

cpcloud commented 2 years ago

I was able to get an environment working in codespaces.

@gerrymanoim would you be ok with that in a follow up? The branch is up at devcontainer on my fork if you want to try it.

datapythonista commented 2 years ago

Do you think you can split this PR into multiple parts, and get this merged incrementally? Changes here are huge, and will be very inefficient to have to review everything at each iteration. And there are many things that are somehow independent to the final stage, like adding a pyproject.toml, formatting changes to CI files...

cpcloud commented 2 years ago

Do you think you can split this PR into multiple parts, and get this merged incrementally? Changes here are huge, and will be very inefficient to have to review everything at each iteration. And there are many things that are somehow independent to the final stage, like adding a pyproject.toml, formatting changes to CI files...

Some of it, yeah. I can revert the formatting changes and add those to a follow up. The refactoring to use nix and the refactoring of main.yml, and all the file deletions need to be done together though, to prevent breakage. Many of the files "changed" are files that are being removed because they are no longer necessary.

cpcloud commented 2 years ago

@datapythonista I've pared down the PR quite a bit, and have removed all of the formatting changes and a few workflows that I wlll add in separate PRs once this lands.

cpcloud commented 2 years ago

@datapythonista

For example, can commitlint be implemented earlier? Feels like we can add a job for it in the current CI, without the rest of stuff.

Yes, but I'm not how we'd bring the dependency in without nix. We probably can, but it doesn't seem worth the trouble just to delete the code later.

I guess you have it under control, but I'm wondering about how the commit linting works with GitHub commit squash. If it doesn't work as expected, having it merged in a HUGE pull request like this, with lots of other things possibly failing, can be very disruptive.

The Python implementation of semantic release does not support squash merging.

Honestly, squash versus rebase merging seems like bikeshedding if you have a commit structure being enforced by a tool. IMO the automation is totally worth not having a single commit for every PR (I do understand the value of this, but it's overly restrictive IMO if you lose the ability to remove toil from maintainers' lives). None of us really have to think too hard about releases anymore, and versioning becomes unsentimental and meaningful.

And I think there are several other cases where we can merge things early. Like, can we move the isort config from setup.cfg to pyproject.toml before moving to nix?

Sure, that's doable. I will see if there's anything else like this and open one or more PRs.

Can we start using poetry before nix too?

We can, but it's not clear to that that will reduce the number of changes by a significant amount. I will look into whether that can be split.

If I'm wrong, and we need to merge all this together, I'd prefer to merge this after the release of Ibis 2.0. I'm worried this can significantly delay the release, which would be good to have out asap.

That's fair. FWIW, after these changes are landed (in a sequence of PRs or otherwise), releases will happen automatically, so modulo the rare CI flakiness, we won't have to worry about release delays ever again.

kszucs commented 2 years ago

Splitting it into two PRs would make it easier for us to review. I definitely like the switch to poetry, especially with the conversion scripts to setup.py and conda recipe, though I'm not so certain about nix (but can more easily evaluate in a separate PR).

gerrymanoim commented 2 years ago

@gerrymanoim would you be ok with that in a follow up? The branch is up at devcontainer on my fork if you want to try it.

@cpcloud Definitely fine as a follow up, mostly a nice to have that I wanted to get in at some point.

github-actions[bot] commented 2 years ago

Unit Test Results

       38 files         38 suites   1h 17m 59s :stopwatch:   8 022 tests   6 009 :heavy_check_mark: 2 013 :zzz: 0 :x: 29 568 runs  22 469 :heavy_check_mark: 7 099 :zzz: 0 :x:

Results for commit aa311386.

:recycle: This comment has been updated with latest results.

icexelloss commented 2 years ago

@cpcloud Sorry for the delay. I am being terrible at keeping track these review requests. Will take a look now.

icexelloss commented 2 years ago

@cpcloud I am not knowledge enough to review all the config changes etc (5000 lines of code! lol). But I reviewed the doc and it looked OK and in general this seems to make things better even this will change everyone's workflow and CI, so don't block this because of me.

Looks like now the new CI is using nix now and perhaps everyone should just switch to that in dev?

cpcloud commented 2 years ago

We should merge the poetry PR first, which will greatly reduce the size of this PR.

icexelloss commented 2 years ago

@cpcloud Also if we merge this and someone has issue with the new dev workflow, what's the best to get help? (I think most contributors at the moment are not familiar with tools like nix poetry etc unfortunately)

cpcloud commented 2 years ago

/condalock

cpcloud commented 2 years ago

/condalock

cpcloud commented 2 years ago

/condalock

cpcloud commented 2 years ago

Another thing to remember here: developers are not required to use cz or any of the tooling here, conda, setuptools, and poetry dev workflows are all supported.

The new tooling will run in CI and validate that commit messages have the right structure for semantic release.

cpcloud commented 2 years ago

The pre-commit check will continue to fail until this is merged, after which I can turn it off since CI is checking pre-commit after this PR.

cpcloud commented 2 years ago

@jreback

umm why all the references to 'breaking changes' here?

It seems like you might not have had a chance to take a look at semantic-release? In particular this section answers this question.

why is this in any way controlled by the commits that someone pushes?

The answer to this question is also in the above links, but I'll give a brief summary.

The point of semantic release is to entirely automate the drudgery of release. This is done by giving semantics to commit messages, e.g., feat is a feature (causing a minor version bump), bug is a bug causing a patch version bump. The structure is enforced by cz (optional to use) and enforced by the commitlint check in CI (not optional).

I think you may be missing the fact that these version determinations are batched at release time: there won't be a new release per commit, that would be excessive.

Instead, the commits between the last release and the current commit are examined for feat, bug etc. If feat is found, it's a minor bump (from say 2.0.0 to 2.1.0), if bug is found it's a patch bump (from say 2.0.0 to 2.0.1). There are a few other rules (notably any message with BREAKING CHANGE: blah blah blah triggering a major version bump) but what I just wrote captures most of the important information.

All of this is covered in the semantic-release repo, please take a look at that.

Also note that this PR doesn't implement the release automation. I explicitly state that it'll be done in a follow up in the PR description. Did you read that? Please do so if you haven't.

how could they possibly know if something is a breaking change or not?

Not sure who they is here, but if they means "a developer", then those are the only people who can know what breaks and what doesn't before a release.

If by they you mean "commit messages", then again, that's covered in the semantic-release repo link above.

jreback commented 2 years ago

by they i mean - a person pushing commits

they cannot possibly know what is. breaking change or not

the maintainers could know but o am not sure how this helps

cpcloud commented 2 years ago

by they i mean - a person pushing commits

they cannot possibly know what is. breaking change or not

the maintainers could know but o am not sure how this helps

Not sure what the objection is here. We don't have random people making commits. Every change goes through review. If a change needs a commit to indicate a breaking change, then it's up to maintainer to make sure that the commit has it. Typically this would be done by referring someone to the docs.

jreback commented 2 years ago

If a change needs a commit to indicate a breaking change, then it's up to maintainer to make sure that the commit has it.

this seems like quite a lot of work - meaning the commits need to be exactly right

having release notes as part of the change is way easier

cpcloud commented 2 years ago

If a change needs a commit to indicate a breaking change, then it's up to maintainer to make sure that the commit has it.

this seems like quite a lot of work - meaning the commits need to be exactly right

having release notes as part of the change is way easier

It's not a lot of work, and cz handles it. I've used this procedure at both past companies and in my own projects and whatever small overhead there is in making your commits more structured is easily made up by the ability to release with confidence, whenever we want. cz makes this process even easier because it'll prompt you, and construct the commit message correctly. Check out the prompts I show above, which you explicitly requested.

It might be useful to try it out, so that if you have specific objections I can address those.

We don't have that many breaking changes.

We do however need to release faster, and it needs to be a super boring and routine procedure that anyone can do with a single click.

I would like to move forward with this unless there specific actionable objections, in which case I will address them.

jreback commented 2 years ago

ok will look again tomorrow

in any event would like to make the docs super clear about this

jreback commented 2 years ago

btw likely we are plaguaged by https://github.com/pypa/setuptools/issues/2941 (to mitigate can pin < 60)

cpcloud commented 2 years ago

btw likely we are plaguaged by pypa/setuptools#2941 (to mitigate can pin < 60)

Thanks. For now it looks like this PR isn't affected.

@jreback On a different note, I've reworked the docs and made it even easier to use cz (it's now a dev-dependency). Let me know if there's anything else you'd like to me to add to the contributing.md docs.

jreback commented 2 years ago

kk will look shortly

cpcloud commented 2 years ago

/condalock

cpcloud commented 2 years ago

/condalock

cpcloud commented 2 years ago

Thanks.

I'm going to turn off the pre-commit.ci integration and move us to rebase and merge for merging. commitlint will take care of notification of non-conforming commit messages

jreback commented 2 years ago

thanks @cpcloud great

ibis-project-bot[bot] commented 2 years ago

:tada: This PR is included in version 2.1.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: