jorisroovers commented 1 year ago

I’d like to get some input on best-practices around python build/test/project management/CI tooling. I want to streamline the gitlint developer and release experience to:

Reduce release overhead (eventually: more frequent and smaller releases)
Clean up some unused and outdated code (most notably in ./run_tests.sh)
Get gitlint up-to-date with recent python test, build and deploy best-practices (and personally learn a few things along the way)

Background

Test running

Today gitlint uses a custom shell script run_tests.sh which wraps most other test tools like pytest, pylint, black, coverage, and radon, as well as a build (python setup.py sdist bdist_wheel) test.
run_tests.sh also has some other utility functions like clean and some outdated python and docker environment management related code that is not used anymore.
I’d like to get rid of run_tests.sh entirely and move to a python tool or script.
I’d like this tool to do python environment management (like tox), so it becomes easier to run these tools against multiple python environments locally. Currently we use a test matrix in CI, perhaps this can be (partially) handled by the testing tool instead (so it’s independent of GHA).

Builds

gitlint consists of 2 python packages gitlint (root dir) and gitlint-core (gitlint-core dir), whatever tool/script we use should be able to support multiple python packages in a single repo / root dir.
I’d like to move away from setup.py towards declarative config and pyproject.toml specifically. The build system should be PEP517 compliant.
I’d like to remove the various requirements*.txt files in favor of using extra_requires for test, docs, etc.
A publish command for publishing to PyPI is a nice to have.
An immediate goal is to have a fully automated CI pipeline that publishes to Test PyPi on every commit. Other than writing the required pipeline that does this, this also requires generating a unique version for every commit - I assume that would probably be based on the git hash of main (seems to be the norm in my experience). Sample version string: 0.19.dev-abc123. I know some build systems or plugins help with this type of version management - that would be a nice to have. Currently we maintain gitlints version manually in 2 files: setup.py and gitlint-core/gitlint/__init__.py (also used to display the version in the UI).

Candidates

I really like the idea of having a single project management tool that handles/wraps both the build and script running parts. This keeps the learning curve low and the developer experience good.
I’ve looked at pyinvoke before, but that was 5y ago. I don’t think pyinvoke is still a candidate, too much manual work and too customized - but it might be.
I have experience using poetry and like it, but they’ve also done some weird things - I know opinions are divided on it. I think @asottile put it very eloquently in his video. So I think poetry is out as well.
I know there’s a ton of other tools, I’ve been looking at a few of them: https://packaging.python.org/en/latest/key_projects/
Specific call-outs to pip-tools, pre-commit and tox which I think are all in the running but none of them solve the problem entirely (so some “glue” would be needed).
I noticed that mkdocs recently adopted hatch and that got me very intrigued. Right now that’s what I’m mostly leaning towards (at least to give it a try).

I’m planning to work on this in the next few weeks and would appreciate input on it. 🙏

Final Note: There’s many roads that lead to Rome, I think it’s likely there won’t be any consensus on this, in which case I’ll make a judgement call and just try something :-)

CC: @sigmavirus24, @asottile, @webknjaz, @ssbarnea, @andersk

webknjaz commented 1 year ago

I’d like to get some input on best-practices around python build/test/project management/CI tooling. I want to streamline the gitlint developer and release experience to:
1. Reduce release overhead (eventually: more frequent and smaller releases)
...
3. Get gitlint up-to-date with recent python test, build and deploy best-practices (and personally learn a few things along the way)

I like having a separate job for building the artifacts (sdist and wheel) but I haven't gotten reusable bits for this :( Still managing such setups manually. The tests would depend on that job and test the actual artifacts (saved as GHA artifacts by the first job) instead of a Git checkout, and then, the publishing stage could consist of two jobs — one for PyPI and the other for TestPyPI. TestPyPI could conditionally auto-publish (the condition would be pushes to the default branch) and PyPI would be conditioned with the release trigger and an environment. The environments feature allows you to have a protection for the jobs with an environment set — you can set it up to require manual approvals + a cooldown timer before it can begin. You'll also be able to store secrets under environments, as opposed to having them repo-global. I also use environments for the TestPyPI job but w/o an approval required. I've recently started extending the simple “publish to PyPI” action with creating GitHub Releases coupled with GitHub Discussions after such publishing.

Note that publishing from the default branch will require version numbers that are unique for each commit push. This means that instead of hardcoding the version, you'd need a mechanism for generating them from Git. I normally use the setuptools-scm plugin for this but there's others.

Background

Test running

* Today gitlint uses a custom shell script [run_tests.sh](https://github.com/jorisroovers/gitlint/blob/5ffc227a7169602e8fd916097bd1572a29b54cc5/run_tests.sh) which wraps most other test tools like `pytest`, `pylint`, `black`, `coverage`, and `radon`, as well as a build (`python setup.py sdist bdist_wheel`) test.

* `run_tests.sh` also has some other utility functions like `clean` and some outdated python and docker environment management related code that is not used anymore.

* I’d like to get rid of `run_tests.sh` entirely and move to a python tool or script.

* I’d like this tool to do python environment management (like `tox`), so it becomes easier to run these tools against multiple python environments locally. Currently [we use a test matrix in CI](https://github.com/jorisroovers/gitlint/blob/5ffc227a7169602e8fd916097bd1572a29b54cc5/.github/workflows/checks.yml#L16-L19), perhaps this can be (partially) handled by the testing tool instead (so it’s independent of GHA).

It is common to use tox or nox for this. You can use them as regular command runners, not just for testing. In tox, you can list the configured environments via tox -av. People run pytest, python -m build and Sphinx docs builds like that. You can also configure some matrixes inside. When having a test matrix in GHA, you can invoke tox -e py which will run your default testenv with the Python that's currently in use, this is how you map tox to GHA, typically. But locally, you can have tox set up to run all of the matrix factors by just invoking tox. Some people do this, I often just make it run testing under the current Python interpreter and let people request running more via explicit args.

Builds

* gitlint consists of 2 python packages `gitlint` (root dir) and `gitlint-core` (gitlint-core dir), whatever tool/script we use should be able to support multiple python packages in a single repo / root dir.

You'll need to keep two package definitions and just invoke python -m build for both dirs separately.

* I’d like to move away from `setup.py` towards declarative config and `pyproject.toml` specifically. The build system should be PEP517 compliant.

Today, probably all the available tools are compatible with this, including setuptools.

* I’d like to remove the various `requirements*.txt` files in favor of using `extra_requires` for `test`, `docs`, etc.

-1 on this — such requirements don't belong in the distribution package metadata that is consumable by the end-users. Can you imagine your typical non-contributor user running pip install gitlint[docs]? Why? Go for specifics deps subsets for each of the envs, paired with corresponding constraints files, generated by pip-tools. I'm exploring even more granular techniques with per-tested-env constraint files but I won't be suggesting that at this time.

* A `publish` command for publishing to `PyPI` is a nice to have.

If you'll go GHA-way, you'll probably end up using my action https://github.com/marketplace/actions/pypi-publish, not really needing this. Otherwise, it's just a call of twine upload. But yeah, it's possible to wrap that with tox.

* An immediate goal is to have a fully automated CI pipeline that publishes to Test PyPi on every commit. Other than writing the required pipeline that does this, this also requires generating a unique version for every commit - I assume that would probably be based on the git hash of `main` (seems to be the norm in my experience). Sample version string: `0.19.dev-abc123`. I know some build systems or plugins help with this type of version management - that would be a nice to have. Currently we maintain gitlints version manually in 2 files: [setup.py](https://github.com/jorisroovers/gitlint/blob/5ffc227a7169602e8fd916097bd1572a29b54cc5/setup.py#L26) and [gitlint-core/gitlint/__init__.py](https://github.com/jorisroovers/gitlint/blob/5ffc227a7169602e8fd916097bd1572a29b54cc5/gitlint-core/gitlint/__init__.py#L1) (also used to display the version in the UI).

As I mentioned above, you can use setuptools-scm or dunamai (probably). Not sure how non-setuptools ecosytems do this, but there's probably plugins for this too. One thing to mention is that your suggested version would be invalid. Having it compatible with PEP 440 is a must, otherwise the uploads will be rejected. setuptools-scm does version guessing based on the previous tag version (so make sure to checkout the whole Git repo in CI for such runs), incrementing the last (usually PATCH) version segment and adding a .dev999 with the distance to that last tag. It also adds a git hash in the form of +gabd83e. It is called a local version part. From the perspective of PEP 440 it's fine but you cannot upload such dists to PyPI — the uploads will be rejected. So for whenever you're preparing a dist for releasing to the package indexes, you'll have to select a version guessing mode that doesn't append the local version part.

* I have experience using `poetry` and like it, but they’ve also done some weird things - I know opinions are divided on it. I think @asottile [put it very eloquently in his video](https://www.youtube.com/watch?v=Gr9o8MW_pb0). So I think poetry is out as well.

I haven't seen Anthony's video, but I also have bad experience with Poetry, including the contribution attempts. I feel like the rest of the PyPA has a similar opinion.

* Specific call-outs to pip-tools, pre-commit and tox which I think are all in the running but none of them solve the problem entirely (so some “glue” would be needed).

Yes, that's correct.

* **I noticed that mkdocs recently adopted [hatch](https://hatch.pypa.io/latest/) and that got me very intrigued. Right now that’s what I’m mostly leaning towards (at least to give it a try).**

Yes, I'm seeing a lot of projects migrating to Hatch recently. And it's now hosted under the PyPA umbrella. I haven't looked too deep into it, but since many people I respect like it, I'm inclined to blindly trust their judgement. I don't know the state of their SCM integration (which is important for the points above) but I saw the maintainer of setuptools-scm interacting with this project, so I assume that it's probably supported already. (I tried to contribute a similar feature to Poetry like half a decade ago, the PR never got accepted and the plugin system capable of facilitating this only appeared recently, for comparison).

setuptools is also much better nowadays, then it used to be. I still use it a lot, one benefit is huge adoption, familiarity and that the downstream packagers know how to work with it. Not that it matters much with PEP 517 being a common interface for calling any build backends.

ssbarnea commented 1 year ago

Nice to ask, but keep in mind that "single" is more of an ideal and that the more people you ask the more solutions you will get suggested. Still, much better than not asking.

After a decade of building python packages, my personal take is:

avoid fancy new tools, or you will pay the cost of being an early adopter
most popular are likely the most stable
setuptools-scm for versioning
pip (not poetry, hatch), optional pip-tools for more advanced stuff
PEP-517 is a must
tox
pre-commit for linting
GHA for CI, pick one or two projects that use it and that have very active maintenance, steal from them.
For docs I used to recommend sphinx but I have to confess that I am actively looking for a way to avoid in the future and go for something pure-md. On my to-review list I have mkdocs and docusaurus.

And last bit: avoid trying to change/fix too many things at one, progress here is done with small increments, or you endup moving too much at once and likely fail under the burden.

ofek commented 1 year ago

Hatch maintainer here! Let me know if I can help

ofek commented 1 year ago

Regarding setuptools, a few benefits of switching to Hatchling:

Supports static analysis tools for editable installations by default whereas the new setuptools does not i.e. it requires enabling an option for IDEs to work.
Uses Git-style glob patterns rather than stdlib globs to better match user expectations.
Allows easy and explicit control of what gets shipped in the sdist target whereas with setuptools it's an objectively confusing mix of conditional wheel options and a MANIFEST.in file.

ssbarnea commented 1 year ago

Yesterday I managed to migrate the first (minor) project to pure pyproject.toml via setuptools-scm. I will soon do it for the rest but I will need to allow few downstream packagers (rpm,deb,arch) time to catch-up, as I am almost sure that some of them will struggle to read the dependencies, this being the reason why my first attempt to do it 6mo ago with another project went on hold.

ofek commented 1 year ago

Why pip in build-system.requires?

ssbarnea commented 1 year ago

I think that that this sections needs cleaning, especially as newer versions of setuptools-scm no longer need archive. I just did not had time to clean it.

ofek commented 1 year ago

Have you tried hatch-vcs? Example: https://github.com/psf/black/blob/main/pyproject.toml

webknjaz commented 1 year ago

Why pip in build-system.requires?

The same question about wheel too.

ofek commented 1 year ago

The same question about wheel too.

Yeah it's automatic. That wheel is required for setuptools is another big issue:

jorisroovers commented 1 year ago

Thanks for all your inputs - definitely insightful. Really appreciated!

I’m going to give hatch a shot, I’ve implemented the basic setup in #384. I really like hatch 🫶, the end result is very elegant IMO. I’ll leave the PR open for a few days to gather input before merging it.

I haven’t implemented the CI deployment too Test PyPi on every commit yet, but I can see a few ways to do that (potentially using https://github.com/ofek/hatch-vcs) - I’ll do that next.

jorisroovers commented 1 year ago

Ok, I’ve finally come around to implementing hatch-vcs in #410, works well! I’ll merge that PR soon.

One thing I ran into is that when trying to publish is that PyPI does not accept PEP440 local versions

# This is not accepted, since it contains a local version (the + and everthing after)
0.19.0.dev52+ga8eb527

# This does work
0.19.0.dev52

In practice, I believe this means I’ll have to explicitly tag every commit I wish to publish to (test)pypi. This makes me reconsider the idea of publishing on every commit, since it would basically mean having to (auto)tag every commit on main. While possible, polluting our tag list this way doesn’t feel right.

Why I wanted to do this in the first place:

To lower the friction and effort of releases as much as possible because they’d be happening all the time. This includes automated post-release integration testing. The eventual goal would be to increase the user-facing release cadence as well (smaller, more frequent releases).
To allow users to more easily install pre-release versions of gitlint, particularly interesting for when a bug is fixed (or PR merged) for which they normally would need to wait until the next release. While technically already possible by adding a dependency pointing to git, it’s less user friction to be able to just specify gitlint==0.19.0.dev52.

Based on this, I’m now thinking to not auto-publish on every commit but to just create a pipeline that does tagging and publishing on demand. This could then still be put on a cron job that auto-publishes every so often (2 weeks?).

Appreciate if anyone has additional thoughts on this!

ssbarnea commented 1 year ago

GHA workflow scheduled release, not cron running under someone's desk. It even has cron syntax!

jorisroovers commented 1 year ago

GHA workflow scheduled release, not cron running under someone's desk. It even has cron syntax!

Yes! That's what I meant :)

ssbarnea commented 1 year ago

@jorisroovers Sorry, I had to ask,... you never know what people really do these days.

RonnyPfannschmidt commented 1 year ago

in iniconfig i use https://github.com/pytest-dev/iniconfig/blob/main/.github/workflows/deploy.yml#L43 to ensure test-pypi uploads

RonnyPfannschmidt commented 1 year ago

alternatively a version scheme that leaves out the local node on master/main could be added as well

jorisroovers commented 1 year ago

alternatively a version scheme that leaves out the local node on master/main could be added as well

Yes, I was thinking about this but doubted a bit whether that was a clean solution and whether any breaking scenarios exist. Also would need to have a look at how to override the hatch-vcs versioning scheme, not sure if that can be done without resorting to too much patching and band-aids.

Might explore this now though, thanks for the suggestion 🙏

RonnyPfannschmidt commented 1 year ago

If the preexisting version schemes don't fit a new one can be added upstream or as plugin

jorisroovers commented 1 year ago

in iniconfig i use https://github.com/pytest-dev/iniconfig/blob/main/.github/workflows/deploy.yml#L43 to ensure test-pypi uploads

This was a great tip, thanks @RonnyPfannschmidt!

I ended up using it to implement the publish-release workflow that now gets triggered automatically on every commit to main.

After the gitlint and gitlint-core are published to test.pypi.org, we also automatically run integration tests (test-release workflow) on the newly published packages.

Both the publish-release and test-release workflows can also be triggered manually.

After working out a few issues (e.g. PYPI mirror replication delay), this has been running smoothly now for the last few commits.

I think I’ll switch to publishing to the regular PyPI (i.e. pypi.org, not test.pypi.org) in the near future.

This outcome is exactly what I wanted, very happy with the end result here! Thanks everyone for your input (and keep any suggestions coming)! 🫶

jorisroovers commented 1 year ago

I think I’ll switch to publishing to the regular PyPI (i.e. pypi.org, not test.pypi.org) in the near future.

Ok, that’s in effect now 🙂 https://pypi.org/project/gitlint/#history

Topic switch: hatch environment dependencies and dependabot

@ofek, something I’d like to have is dependabot support for dependencies in hatch environments, e.g. hatch.envs.test.dependencies.

https://github.com/jorisroovers/gitlint/blob/7b0c255de3c3f51c62d8a182d7b1994450acf888/pyproject.toml#L83-L92

I’ve read https://github.com/dependabot/dependabot-core/issues/3290, but AFAICT, that only pertains to top-level dependencies and optional-dependencies (which works well for gitlint). I couldn’t find any existing issue for environment-level dependency support within the hatch or dependabot projects, although I’d be surprised if I’m the first one asking?

I also had a quick glance at hatch-requirements-txt with the idea of splitting the test dependencies back out in a test-requirements.txt file so dependabot can pick them up. However, this feels like a step backwards and I think hatch-requirements-txt actually only supports project-level dependencies too (so this wouldn’t work).

Is this something you think will eventually be supported, or are there workarounds available? Thanks!

jorisroovers commented 1 year ago

Ok, I've created https://github.com/pypa/hatch/issues/775 for my question on dependabot and hatch environment dependencies.

I'm closing this issue then.

For reference, the entire new development process is documented at https://jorisroovers.com/gitlint/contributing/

Thanks everyone for your input, was invaluable 🫶

webknjaz commented 1 year ago

in iniconfig i use pytest-dev/iniconfig@main/.github/workflows/deploy.yml#L43 to ensure test-pypi uploads

I like constructing an expected version variable at the beginning of my workflows. Merges to the main branch expect .dev versions, "publish requests" expect "public" versions and PRs+non-main branches allow the local bit. Then, when making the dists, I inject no-local into pyproject.toml and make Git pretend it hasn't changed (git update-index --assume-unchanged). To make sure various parts of the workflow are in sync, I assert for the version in the artifact names right away. As a bonus point, I use the sdist to run all (well, most of) the testing in CI using the tarball contents rather than a Git clone using https://github.com/re-actors/checkout-python-sdist. This lets me make sure that all the downstream testing bits are shipped in sdist.

@jorisroovers I've also made some comments in your older PRs since I've missed some of those.

jorisroovers commented 1 year ago

@webknjaz Thanks for all the comments and PR 🙏

Let me work through them over the next week or so!

jorisroovers commented 1 year ago

@webknjaz Thanks for all the comments and PR 🙏

Let me work through them over the next week or so!

Bunch of new issues were created from this:

464
465
466
467
468

Closing this particular issue again now, we can track progress in the others. Thanks!

ssbarnea commented 1 year ago

Thanks for all the work on this, it clearly looks far better now. While the lack of tox is a red flag for me, mainly because it makes harder to contribute to the project) I love the direction.

I am happy that my mk tool was able to stop the executable in the root and expose it as a test command. Still, even with this no venv management involved and no way to only run linting, no way to run tests with a specific version of python.

Extra kudos for using mkdocs!

jorisroovers commented 1 year ago

no way to run tests with a specific version of python.

Still on my wish list. I've spend a bit of time getting this to work with hatch, but I was using asdf for managing python installations which was causing some issues. I've switched to rtx more recently, would have to retry again with that.

Extra kudos for using mkdocs!

Have been for years 💙 - that far predates this issue :)

I'm planning to adopt Material for Mkdocs and modernize the docs as part of the current release cycle though!

jorisroovers / gitlint

Modernizing gitlint's build and test tooling: discussion #378

Background

Test running

Builds

Candidates

Background

Test running

Builds

464

465

466

467

468