pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

Clarify & document our CI strategy #7279

Open xavfernandez opened 5 years ago

xavfernandez commented 5 years ago

Of our combinations of supported interpreters, OSes, and architectures, we are currently only testing a few, without a clear strategy (cf. https://github.com/pypa/pip/pull/7247/files).

The goal would be to come up with a bunch of rules like:

chrahunt commented 5 years ago

We also need to consider:

brainwane commented 4 years ago

@ewdurbin and @pradyunsg are going to be talking about this issue this week as part of the donor-funded pip work that we want to complete by about the end of May.

pradyunsg commented 4 years ago

Based on a bunch of discussions lately, I think that we seem to have consensus to:

chrahunt commented 4 years ago

Concretely for the "make it faster" task, I can walk someone through getting performance data on Windows and analyzing it like I had to for #7263. If we refine the tools and process then we'll truly have a lot of visibility into where time is being spent.

pradyunsg commented 4 years ago

@chrahunt 🙋🏻‍♂️

This sounds like it would also be a good process to document in some form - either in pip's documentation if it is very-pip-specific or as a blog post somewhere, for broader visibility.

duckinator commented 4 years ago

Adjacent to making the entire test suite faster, there's also the concept of "failing fast." Basically, you want to run the tests which are more likely to fail first. These are usually the tests related to code that was changed since the last test run.

With regards to that, it may be worth looking into using something along the lines of testmon. I've not worked with testmon specifically, but I've heard good things about both the tool and the general approach it takes.

As a note, a prerequisite of this is to know that the tests aren't order-dependent. So it may be worth trying to use something like pytest-randomly (where it can be used, since it's only Python 3.5+) first, and resolve any problems that show up there.
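For example, a rough sketch of a one-off job for this (not pip's actual configuration; the "unit" marker, the tests/ path, and the dependency list are assumptions that would need to match pip's real test setup):

  # Hypothetical GitHub Actions job: shake out order-dependence with pytest-randomly.
  jobs:
    random-order:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v2
        - uses: actions/setup-python@v2
          with:
            python-version: "3.8"
        # pytest-randomly shuffles test order by default once installed;
        # it prints the seed it used, and --randomly-seed=<n> replays that order.
        - run: pip install -e . pytest pytest-randomly  # plus pip's real test requirements
        - run: python -m pytest -m unit tests/

If an order-dependent failure shows up, re-running with the printed seed reproduces it locally.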

pradyunsg commented 4 years ago

With regards to that, it may be worth looking into using something along the lines of testmon. I've not worked with testmon specifically, but I've heard good things about both the tool and the general approach it takes.

The tricky thing with this, is that our tests invoke a subprocess of pip, which isn't very friendly for tools like this.

As a note, a prerequisite of this is to know that the tests aren't order-dependent.

We parallelize and isolate our tests extensively, so they're definitely not order-dependent. :)

webknjaz commented 4 years ago

The tricky thing with this, is that our tests invoke a subprocess of pip, which isn't very friendly for tools like this.

Could ask @tarpas for help. I think the problem I faced is that when it's combined with the coverage collection tools (https://github.com/tarpas/pytest-testmon/issues/86#issuecomment-383275146), the results are totally broken (it may show 2-3 times lower coverage than it is for real).

pradyunsg commented 4 years ago

One of the ideas I had in the shower today was to have our tests split across CI providers by OS. We'll run linters + "latest stable" interpreters first; then, if they all pass, we'd run the remaining interpreters. One way we could split-by-OS would be something like:

Then, we can add the constraints of running the full test-suite at least once on:

I don't think it makes sense to make a distinction between unit and integration tests here, but we might get good fail-fast speedups, from needing the unit tests to pass prior to running the integration tests.
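As a sketch of that fail-fast idea (illustrative names only, assuming GitHub Actions and pip's tox-based test invocation with unit/integration markers):

  jobs:
    unit:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v2
        - uses: actions/setup-python@v2
          with:
            python-version: "3.8"
        - run: pip install tox
        - run: tox -e py -- -m unit

    integration:
      # "needs" gives the fail-fast behaviour: this job never starts if the unit tests fail
      needs: unit
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v2
        - uses: actions/setup-python@v2
          with:
            python-version: "3.8"
        - run: pip install tox
        - run: tox -e py -- -m integration

The trade-off is that the integration tests start later on every green run, so end-to-end time goes up a bit in exchange for cheaper failures.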

webknjaz commented 4 years ago

@pradyunsg I'd probably put the linters on GH Actions too. If you want them to cancel tests, you could think of something like hitting the API for that...

webknjaz commented 4 years ago

Also, have you considered using https://bors.tech?

duckinator commented 4 years ago

Since @webknjaz mentioned it: I highly recommend https://bors.tech. I use it for the vast majority of my own projects, and it's caught innumerable semantic merge conflicts before anything got merged.

If y'all decide to use it, I'd be more than happy to help you configure it. :slightly_smiling_face:

pradyunsg commented 4 years ago

I spent some time looking and it doesn't look like any of the CI providers we're using have any plans to drop Python 2 support anytime soon.

Most notably, GitHub announced a change for Python 2.7, but that decision got reversed (https://github.com/actions/setup-python/issues/63#issuecomment-596450646).

No new timeline for when Python 2.7 will be removed, but expect a blog post or something similar once it's decided that it's time to remove it

pradyunsg commented 4 years ago

I've been experimenting w/ pip's CI setup in pradyunsg/pip#4 and pradyunsg/pip#5. Based on a whole bunch of experimentation and trial & error, I think I have something nice that we can transition to right now. But first, some context which we should probably move into our docs as well eventually...


The number of parallel jobs we get per-CI-provider:

IMO, the best utilization would be to have:

Alas, we have failures on Windows + GitHub Actions, that I don't want to deal with immediately.


I think we should be running:

Other than that, I think we should have at least 1 CI job that runs unit and integration tests on:

I think it makes a lot of sense to group our "developer tooling" tasks as:


So, my proposal is:

GitHub Actions:
  MacOS / Packaging
  MacOS / Quality Check

  MacOS / Tests / 2.7
  MacOS / Tests / 3.5
  MacOS / Tests / 3.6
  MacOS / Tests / 3.7
  MacOS / Tests / 3.8

GitHub Actions:
  Ubuntu / Packaging
  Ubuntu / Quality Check

  Ubuntu / Tests / 2.7
  Ubuntu / Tests / 3.5
  Ubuntu / Tests / 3.6
  Ubuntu / Tests / 3.7
  Ubuntu / Tests / 3.8
  Ubuntu / Tests / 3.8-x86

Azure Pipelines:
  Windows (Packaging)
  Windows (Quality Check)

  # These tests would be split across 2 workers, to reduce end-to-end runtimes
  Windows (Tests / 2.7 / 1)
  Windows (Tests / 2.7 / 2)
  Windows (Tests / 3.8 / 1)
  Windows (Tests / 3.8 / 2)
  Windows (Tests / 3.8-x86 / 1)
  Windows (Tests / 3.8-x86 / 2)

  # These tests would be only unit tests
  Windows (Tests / 2.7-x86)  # you know... just in case.
  Windows (Tests / 3.5)
  Windows (Tests / 3.6)
  Windows (Tests / 3.7)

Travis CI:
  Ubuntu / Quality Check

  # These tests would be split across 2 workers, to reduce end-to-end runtimes
  Ubuntu / Tests / PyPy2 / 1
  Ubuntu / Tests / PyPy2 / 2
  Ubuntu / Tests / PyPy3 / 1
  Ubuntu / Tests / PyPy3 / 2

Based on my experiments, our "bottleneck" CI job would then be MacOS, for which we only have 5 workers (i.e. 1 PR at a time); but that doesn't look significantly different from our current state of affairs, where Azure Pipelines is a similar blocker, since we run tests on all platforms there.

We can mitigate this in the future by moving toward the "best utilization" situation I described above, by swapping the CI platforms we use to run Windows & MacOS.
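For illustration, the GitHub Actions portion of this layout could be expressed as a single matrix job, roughly like so (a sketch only; the job names, action versions, and tox invocation are assumptions, and the Packaging / Quality Check jobs are elided):

  jobs:
    tests:
      name: ${{ matrix.os }} / Tests / ${{ matrix.python }}
      runs-on: ${{ matrix.os }}
      strategy:
        fail-fast: false
        matrix:
          os: [macos-latest, ubuntu-latest]
          python: ["2.7", "3.5", "3.6", "3.7", "3.8"]
      steps:
        - uses: actions/checkout@v2
        - uses: actions/setup-python@v2
          with:
            python-version: ${{ matrix.python }}
        - run: pip install tox
        - run: tox -e py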

webknjaz commented 4 years ago

Alas, we have failures on Windows + GitHub Actions, that I don't want to deal with immediately.

But they are green in my old PR: https://github.com/pypa/pip/pull/6953

webknjaz commented 4 years ago

@pradyunsg did you think about wiring up Zuul resources in addition to this? Could test some less conventional distros this way (fedora, centos, debian etc.)

pradyunsg commented 4 years ago

But they are green in my old PR: #6953

It's just skipping those tests entirely. :)

https://github.com/pypa/pip/pull/6953/files#diff-2deae8ed35e0da386b702aa047e106cbR46-R47

did you think about wiring up Zuul resources in addition to this?

I did, yea. I didn't find most of their documentation approachable or any good "hey, here's how you can get started" document or article. Before someone asks, I also looked at bors; and while it would make sense for us to use it, I don't want to bundle that up with these changes as well.

Overall, I figured we should clean up what we have before adopting yet-another-CI-setup. :)

Could test some less conventional distros this way (fedora, centos, debian etc.)

I spent a whole bunch of time thinking about this, and... no, I don't think we should be testing against these platforms on our CI. Most linux distros are basically the same from vanilla pip's PoV, and... the responsibility for making sure pip works on a specific linux distro lands on the distro's maintainers; not pip's maintainers.

xavfernandez commented 4 years ago

I think we should be running:

* developer tooling (all our tox commands) on all platforms

Agreed (i.e. latest CPython x64 interpreter) :+1:

* unit tests on supported Python interpreter versions on all platforms

I read that as "unit tests on all supported Python interpreter versions (including PyPy) on x64 arch on all platforms" ?

* integration tests on supported Python interpreter versions on Linux+MacOS

I read that as "integration tests on all supported Python interpreter (including PyPy) on x64 arch on Linux+MacOS" ?

* integration tests on the latest CPython and PyPy (both 2 and 3) on Windows (coz slow)

"integration tests on the latest CPython 2 & 3 and latest PyPy 2 & 3 with x64 arch on Windows"

Other than that, I think we should have at least 1 CI job that runs unit and integration tests on:

* latest CPython 3, x86 (32-bit) on Linux

* latest CPython 3, x86 (32-bit) on Windows

:+1:

* PyPy 2

* PyPy 3

This is unclear, as it seems to be included in the previous item.

I think it makes a lot of sense to group our "developer tooling" tasks as:

* Quality Check = lint + docs

* Packaging = vendoring + dry-run-of-release-pipeline

:+1:

pradyunsg commented 4 years ago

@xavfernandez Yea, I intended to write CPython, and basically only have 1 PyPy job. I've gone ahead and edited the message to correct that error. :)

This stems from the fact that our tests on PyPy are really slow and I'd like to get away with not running tests on it in too many places. FWIW, I'm going to make these changes incrementally, so there's no reason I can't experiment w/ trying to get PyPy tests on Linux + MacOS working fast enough. :)

webknjaz commented 4 years ago

FTR, Travis CI now has support for a number of less common architectures. It could be a good idea to run some smoke tests there while the main computing power stays elsewhere...

pradyunsg commented 4 years ago

It looks like we can simplify our CI pipeline a lot.

Per https://github.com/pypa/pip/issues/9087, we're going to drop Travis CI.

GitHub Actions gives us 5 MacOS workers, and 20 Linux/Windows workers. Once we drop Python 2.7 and Python 3.5, that makes it very reasonable to plonk down our entire "regular" CI pipeline into GitHub Actions.

https://github.com/pradyunsg/pip/pull/8 has an implementation of this idea, with all tests passing after Python3.5 and Python2.7 are dropped.

pradyunsg commented 4 years ago

Oh, and that's on 1 CI provider: all the tests on all the platforms with all the CPythons + Ubuntu PyPy3, in about 30 minutes.

I quite like it and am very tempted to move fast on this once 20.3 is out. :)

xavfernandez commented 4 years ago

If we don't have issues with Azure Pipelines, and since it is already set up, why not use it? We should be able to dispatch the test suite across more than 25 workers.

pradyunsg commented 4 years ago

One good reason is release automation - if the entire pipeline is on one provider, we can use "depends on" and "if" based stuff, to dispatch a deployment that's only run if all the tests pass, and if we've got a tag pushed by the right people.

Kinda tricky to do cross-provider.
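To illustrate, on GitHub Actions a publish job can depend on the test jobs and be restricted to pushed tags, roughly like this (a hedged sketch: the job names, build commands, and the PYPI_API_TOKEN secret name are assumptions, and the "pushed by the right people" check is elided):

  jobs:
    tests:
      runs-on: ubuntu-latest
      steps:
        - run: echo "the full test matrix would live here"

    publish:
      # only runs if everything it depends on passed, and only for pushed tags
      needs: [tests]
      if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/')
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v2
        - uses: actions/setup-python@v2
          with:
            python-version: "3.8"
        - run: pip install build twine
        - run: python -m build
        - run: twine upload dist/*
          env:
            TWINE_USERNAME: __token__
            TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}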

webknjaz commented 4 years ago

A tool like Zuul can solve cross-CI dependencies FWIW

pradyunsg commented 4 years ago

There's a very simple problem with Zuul, as far as I'm concerned.

I didn't find most of their documentation approachable or any good "hey, here's how you can get started" document or article.

pfmoore commented 4 years ago

I just went and had a quick look at the Zuul docs, and I agree. I had no idea where to start.

I'd be pretty cautious about adopting extra tooling, we need to be sure we don't end up dependent on one person being the only one who can manage our infrastructure. (I was going to say "a limited number of people" but given the size of the pip team, that seems redundant 🙁) I'm not even sure how many people can manage our existing bots, for example.

Keeping things simple, and using well-known and commonly used tools, should be an explicit goal here, IMO.

webknjaz commented 4 years ago

Yes, the UX side of it is rather terrible. But folks who get used to it are happy. Basically, there are some providers that run it on their hardware, and for GitHub each of them has a GitHub App for integration. So you'd have some provider giving you the platform, and usually they also help with configs. That is because different platforms may have different envs. Also, Zuul uses Ansible for the configuration, meaning that it's extremely flexible within the limits of what types of OSes/hosts the provider has. Look at cryptography's configs:

Here's how the PR runs are reported: https://github.com/pyca/cryptography/pull/5548/checks?check_run_id=1345075288

I know @mnaser mentioned that Vexxhost would be open to providing a platform, and also there's OpenDev that cryptography uses. I think @ssbarnea may have more info about the opportunities.

pradyunsg commented 4 years ago

I'm not even sure how many people can manage our existing bots, for example.

Donald wrote them. Ernest is managing the PSF hosting. I got access for being a whiny kid.

Currently, the code at pypa/browntruck is auto-deployed at each commit.

pradyunsg commented 4 years ago

Yes, the UX side of it is rather terrible

Honestly, this is reason enough to avoid it then. 🤷🏽

Zuul can solve cross-CI dependencies FWIW

If there's an example of this, I'd like to see that.

bristea commented 4 years ago

the responsibility for making sure pip works on a specific linux distro lands on the distro's maintainers; not pip's maintainers.

^ Ouch, that sounds very bad. This means that if all core devs happen to like foo-distro, pip would likely work only on foo, even if a survey showed that the number of users using the tool on foo-distro is... statistically insignificant.

I was hoping that we are all here with a common goal of improving the Python packaging user experience, not finding reasons to avoid good test coverage. Good test coverage reflects where people are using the tool in the wild (as opposed to where a select group is using it). Testing less and claiming that everything is fine is not a good approach.

brainwane commented 4 years ago

Hi, @bristea! I think we are all indeed working toward the same goal, but have concerns about how to achieve it sustainably, especially given that the grant funding for pip work runs out at the end of this year. Will you be committing your own time to maintain any new parts of pip's test infrastructure, or donating money to fund others to do so?

ssbarnea commented 4 years ago

Yep, I do happen to use several Zuul servers daily, most notable ones being the OpenDev (OpenStack), Ansible and RDO ones.

I have to disagree with @webknjaz about the Zuul UI being terrible; I would describe it as just a little bit behind GH Actions or Travis, but improving each day. Let's be pragmatic and look at those already using it: https://github.com/pyca/cryptography/pull/5548 -- The downside that I see is that it reports as a single check, but other than that there is nothing that would make it impossible to identify what caused a potential job failure. In fact, it allows users to look at the history of each job and identify whether a failure is random or when it started to appear, which is much easier than on other systems.

It is quite common for sensitive projects to use multiple third-party CI systems, which are usually not managed by the same team that develops the project under test. Even though the jobs are defined externally, they all report in the same place. Most of the time the third-party integrations are non-voting (a job failure would not prevent a merge).

Being offered such help is like being handed money: running CI is costly (not only in compute), and if someone is offering to help you with hardware and also with maintenance of the job definitions, one should not say no.

pip is probably one of the first projects that you do want to cover with a big test matrix, one covering major linux distributions and multiple architectures.

In short, I am offering to keep the pip Zuul jobs running. Maintaining CI jobs for OpenStack is already my main job, so keeping pip's maintained would be very easy, especially as almost all projects under OpenDev directly depend on pip. Any pip regression hits us hard, so we have a very good incentive to ensure this does not happen.

brainwane commented 4 years ago

@ssbarnea that is wonderful -- thank you very much for the offer!

The rest of this somewhat lengthy comment is meant for people who perhaps aren't as familiar with pip's day-to-day context and history.

@bristea: In case you are unfamiliar with the current funding situation: the PSF was able to get some funding, $407,000 USD in total, from Mozilla Open Source Support and the Chan Zuckerberg Initiative to hire contractors to work on the pip resolver and related user experience issues in 2020. You can see our roadmap and blog and forum and mailing list posts and notes from recent meetings to keep apprised. We also post updates to distutils-sig and the Packaging forum on Python's Discourse instance.

Prior to late 2017, nearly no one was paid to work on any part of Python packaging -- PyPI, pip, or any other tool in the chain. Here's what it looked like in 2016. The Python Software Foundation was able to successfully apply for a few grants and similar funds over the past 3-4 years, which is why the new pypi.org is up, why it has two-factor auth and audit trails, and why pip has a new dependency resolver. Along the way we've been able to shore up some of our related infrastructure, and, for instance, pip's automated test suite is stronger than it was before our current resolver work started. And Bloomberg is paying Bernat Gabor to work on virtualenv, and Dustin Ingram gets to do a little packaging work while paid by Google, and Ernest W. Durbin III does sysadmin and some code review work on Warehouse (PyPI) as part of his work at PSF. But that's nearly it, I think. We are working assuming that, starting in January 2021, practically no one is being paid to contribute to pip. And so new problems that crop up with testrunners, CI configuration, etc. will have to wait till someone can fix them in their spare time, and will block the few volunteer hours that maintainers have available to do code review and merging, much less feature development. This is why Sorin's offer is so welcome!

@bristea, you were replying to what @pradyunsg said in this comment where Pradyun was specifically considering the question/problem of testing "some less conventional distros this way (fedora, centos, debian etc.)". You suggested this would be a problem if statistics showed that the main supported platform was statistically insignificant in terms of proportion of pip's user base. Yes, it would be a problem if, for instance, pip maintainers concentrated on Ubuntu support to the detriment of Fedora support, but then it turned out we had far more users on Fedora than on Ubuntu! I hope you will take the 2020 Python developers' survey and sign up for user studies on Python packaging and spread the word about these efforts, so we have a better assessment of what operating systems our users use. And then that data will help pip's maintainers decide how much of their scarce time can go into support work for various platforms.

webknjaz commented 4 years ago

Yes, the UX side of it is rather terrible

Honestly, this is reason enough to avoid it then. 🤷🏽

The word I used is probably too strong. The problem is that there's a huge openstack bubble that just got used to how things are. UI is usable but it is often a bit more sophisticated than what GH/Travis users are used to. Truth be told, it's possible to customize that UI too and this is what probably causes the perception that the UX is bad. I guess if folks from a different bubble set up things like they like, it doesn't mean that it's that bad. OTOH since we don't see any setups using something more familiar, it creates a wrong impression of how things work...

I don't have examples of Zuul dep configs myself, I just know that it's possible. Maybe Sorin has better demos.

One notable thing about Zuul is that you can declare cross project PR dependencies that are agnostic to where the projects are hosted. For example, if some project on OpenDev depends on a bugfix PR in pip, they can specify such a dependency and Zuul will make sure to trigger the build on that OpenDev project once pip's PR is green. But of course it can follow way more complex dependencies.

SeanMooney commented 4 years ago

For what it's worth, I think the UX is subjective. Having your pipeline defined declaratively in code, using Ansible for the job logic, provides a different configuration-as-code workflow that many coming from a Jenkins background find unintuitive.

I have worked on OpenStack for over 7 years now, and when I started we used Jenkins to execute the jobs. In comparison to the UX of that solution, Zuul is much, much better. With that said, you are comparing it to Travis, Azure, and GitHub Actions. I have not really worked with those, but from what little I have seen of Travis, I think the UX of Travis and Zuul is more or less the same.

The main delta with regard to execution, from a contributor's point of view, would be when the CI runs. With Zuul it would run when they open the pull request against the main repo and when they push updates, whereas with a Travis file, unless you limited it to PRs or were using a paid account restricted to the official repo, it would run when they push the change to their fork, before they create the pull request. I'm not sure whether it would run again for the PR in the case where it runs on the fork; I honestly have too limited experience with Travis to say.

It's something that would need to be discussed with the OpenDev team, but I know that they provide third-party CI for the Ansible project and the cryptography module today. pyca has its own tenant in the OpenDev Zuul today:

https://github.com/openstack/project-config/blob/master/zuul/main.yaml#L1645-L1667

To support pypa we would also need to create a tenant for pypa, or add it to the same one used for pyca or OpenDev.

A separate tenant is likely the best way to integrate, as the permissions could be scoped more cleanly, but in general it's not that difficult to do.

I don't work for the OpenDev foundation or on the infra team, so I can't really speak on their behalf regarding whether they are willing to provide the resources to run the CI, but to me as an OpenStack contributor it would make sense, given our dependency on pip working for OpenStack to install properly and for our own CI to function.

This is still a bit OpenStack-specific, but https://docs.opendev.org/opendev/infra-manual/latest/creators.html documents many of the steps required to add new projects. The Gerrit and PyPI sections won't apply, as you won't be using Gerrit for review or having Zuul publish packages to PyPI on your behalf (unless you want it to), but it has some good references.

https://docs.opendev.org/opendev/infra-manual/latest/testing.html describes the types of CI resources available.

Basically it boils down to VMs with 8 GB RAM, 80 GB disk and 8 vCPUs. There are a number of OS images available, and others could be added if needed. The statement that "CPUs are all running x86-64" is only mostly true: in general, yes, they will be x86 VMs, unless you use the special nodesets that have recently been added to run some limited ARM-based testing.

pyca is using OpenDev to build both x86 and ARM wheels, for example: https://zuul.opendev.org/t/pyca/build/5b6057b33cf74db7aa1e2b2a90ec033c https://zuul.opendev.org/t/pyca/build/fbbcefe49659465eb5b3dc48bf2f0ef9

You can see that on their builds page, https://zuul.opendev.org/t/pyca/builds. These are cross-linked to the GitHub PR, so if you click on the change link it brings you back to GitHub (https://github.com/pyca/cryptography/pull/5551), and if you check the Checks tab you can see the output reported: https://github.com/pyca/cryptography/pull/5551/checks?check_run_id=1352176738

If you go to the Artifacts tab you can download the build output: https://zuul.opendev.org/t/pyca/build/5b6057b33cf74db7aa1e2b2a90ec033c/artifacts

And if you wanted to, you could write a post-merge job to rebuild/upload those artifacts to an external site. Normally you only want to do that once you tag a release, but we do host all of the branch and release tarballs/wheels together for OpenStack, e.g. https://tarballs.opendev.org/openstack/os-vif/

Anyway, I'm not sure if this is helpful or not, but I just thought I would provide a little feedback as someone who has worked happily on projects using Zuul and who has deployed it before for third-party CI. I'm obviously biased in favor of it, but hopefully some of this is useful.

pfmoore commented 4 years ago

Anyway, I'm not sure if this is helpful or not, but I just thought I would provide a little feedback as someone who has worked happily on projects using Zuul and who has deployed it before for third-party CI.

Thanks, that's interesting. If I'm understanding what you're saying, it sounds like most of the setup could be done independently of the pip repository, at least in the first instance, maybe just running a daily CI run against master as a proof of concept. If it proved useful (and the people setting it up and running it were willing to take on the commitment) then we could link it to the pip repo so that it runs on PRs and merges, initially as an optional check, but if it turns out to work well, we could make it mandatory.

I'm very much in favour of something like that where we can try the approach incrementally, with the pip developers able to leave the specialists to get on with the basic infrastructure, and without having the work be bottlenecked on pip developer availability.

If I've misunderstood what you were saying, then I apologise (and you can ignore this comment).

SeanMooney commented 4 years ago

Yes, using Zuul is not an all-or-nothing proposition. pyca/cryptography is using it in addition to Travis and Azure for things they can't easily test in another way; in their case it was ARM builds, I believe.

Assuming OpenDev is happy to provide the resources, you could totally start with a simple nightly build. That said, OpenStack and pip are two very different things to test.

To test OpenStack properly, we need to deploy an entire cloud with software-defined storage and networking and boot nested VMs in our bigger integration jobs. We also have a lot of smaller jobs that just run tox to execute unit tests or Sphinx to generate docs.

So Zuul can handle both well, because it's just running Ansible playbooks against a set of resources provided by an abstraction called a nodeset.

If GitHub Actions or Travis do what you need, less complexity is always better than more. If there are things that are hard to do with Travis or GitHub Actions that Zuul could address, that is where it would provide value.

OpenStack has a lot of projects with a lot of interdependencies and a need for cross-gating at very large scale. Zuul was built for that, but that may or may not be what you need for pip.

ianw commented 4 years ago

I set up the Zuul integration with pyca, where our initial focus was to enable ARM64 testing and manylinux wheel building on the ARM64 resources provided by Linaro. OpenDev is not really looking to become a Travis CI replacement where we host 3rd-party CI for all comers. However, there are projects where there is obvious synergy with collaboration -- obviously OpenDev/OpenStack heavily depends on pip in CI, and I think it's fair to say that, as a project, we have historically found, and done our best to help with, issues in pip/setuptools/etc. pyca was our first integration, and I have an action item to write up much clearer documentation.

I'm very much in favour of something like that where we can try the approach incrementally, with the pip developers able to leave the specialists to get on with the basic infrastructure, and without having the work be bottlenecked on pip developer availability.

We don't really need to keep talking theoretically around this; we can get some jobs working in a pull request easily and the project can evaluate how it would like to continue based on actual CI results.

However, both sides need to agree to get things started:

Both projects taking these steps essentially formalises that pip is open to integration, and the OpenDev project is willing to provide the resources.

With this done, we can start a proof of concept running jobs in a pull request. It would be good to confirm @ssbarnea and @SeanMooney are willing to help set up some initial jobs; it's only going to be useful if the CI has something to do! pip can then see exactly what the configuration and results will look like on that pull request and make a decision about how to move forward.

I can give you a heads up of what it will all look like though. The job definitions live under a .zuul.d directory [1]. Zuul will report via the checks API so the results of the jobs just show up in the list like any other CI, e.g. see the checks results on a pull request like https://github.com/pyca/cryptography/pull/5533, where the run results are posted as https://github.com/pyca/cryptography/pull/5533/checks?check_run_id=1348557366. When you click on a job result, it will take you to the Zuul page where all logs, build artifacts, etc. are available, e.g. https://zuul.opendev.org/t/pyca/build/b7056847728149c18ca3a483d72c1a51. This played out in https://github.com/pyca/cryptography/pull/5386 for pyca where we refined things until it was ready to merge and run against all PRs.

[1] technically we do not need to have any job definitions or configuration in the pip repository; we could keep it all in OpenDev. However, this means if you want to modify the jobs you have to go searching in a separate repository but, more importantly, this would mean that pip developers can't modify the jobs without signing up for an OpenDev account and being given permissions to modify the jobs there. This is not usually the approach projects want to take; they want their CI configuration under project direct control.
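To give a sense of what that looks like, a minimal .zuul.d entry might be roughly the following (a sketch loosely modeled on the pyca setup; the job name, the tox parent job from zuul-jobs, and the check pipeline name are assumptions that depend on the tenant configuration):

  # .zuul.d/jobs.yaml (hypothetical)
  - job:
      name: pip-tox-py38
      parent: tox
      description: Run pip's test suite on an OpenDev-provided node.
      vars:
        tox_envlist: py38

  - project:
      check:
        jobs:
          - pip-tox-py38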

webknjaz commented 4 years ago

Folks, since we sorta hijacked the issue dedicated to the docs and it'd be great to keep it on topic, I've created #9103 to discuss Zuul effort there. Let's use it for that from now on.

pradyunsg commented 4 years ago

With all the Zuul stuff redirected over to #9103 (thanks @webknjaz!), I'd like to get the last bits of #2314 done.

Here's my plan: all of pip's current CI moves to GitHub Actions, we deploy to PyPI {when conditions are right -- #2314} and we add a GitHub Action triggered at-release over on get-pip.py, to get get-pip.py CI updates automated. :)

That should let us push a tag and have the release go out automagically.
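One way to wire up the get-pip.py part would be a small workflow in the pip repo that fires a repository_dispatch event at pypa/get-pip whenever a tag is pushed -- a rough sketch, where the GET_PIP_TOKEN secret and the event type name are assumptions:

  name: Notify get-pip.py
  on:
    push:
      tags:
        - "*"
  jobs:
    notify:
      runs-on: ubuntu-latest
      steps:
        - run: |
            curl -X POST \
              -H "Accept: application/vnd.github.v3+json" \
              -H "Authorization: token ${{ secrets.GET_PIP_TOKEN }}" \
              -d '{"event_type": "pip-released"}' \
              https://api.github.com/repos/pypa/get-pip/dispatches

The get-pip repository would then have a workflow listening for that repository_dispatch event to run its own update automation.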

pradyunsg commented 4 years ago

To address what @bristea said earlier:

^ Ouch, that sounds very bad. [later] pip is probably one of the first projects that you do want to cover with a big test matrix, one covering major linux distributions and multiple architectures.

I don't think it's bad.

As I said in the comment that was quoted -- for pip, most Linux distros look the same. We're not testing against the distro-provided Python (the distros perform their own testing on them, because they have their own policies + they're patching Python+pip anyway) and so there really isn't much to gain by adding additional distributions. Testing multiple architectures also isn't super impactful -- pip is a pure-Python program and so are all of our dependencies. The "main" bit that's architecture-dependent is packaging.tags and even that's not a part of pip's codebase. In other words, I think the benefits of adding additional architectures are limited, with sharply diminishing returns -- though I don't think we're even close to that point yet.

On the other hand, back when I wrote that, our CI used to take well over an hour to run tests for our existing test matrix per PR (if I remember correctly). That sort of feedback timeline basically KILLS productivity -- it'd suck in general for any software project but especially for a volunteer driven one like pip. I used to push a change and go do something else for an hour, simply because that's how long it took to get a CI result. As @brainwane noted, a fair amount of work has gone into getting those times down and bringing sufficient clarity to the CI (especially when there's failures). And even then, the CI times are in the push-and-go-have-a-meal territory, despite using RAMDisks on certain platforms and a multi-CI-provider approach to maximise the "workers" we use.

The trade-offs are CI times vs CI support matrix size. It'd be amazing to have short times and large support matrices but the reality is that we don't have buckets of money earmarked "pip stuff" going around. [1] Adding more stuff to the CI matrix would only make the time situation worse, unless we get additional CI resources -- and as the person who has to wait, I obviously don't like that. ;)

Which brings me to things like #9103 -- I'm very on board for so much more of this. External CI resources provided/donated by organizations with an interest in ensuring pip works well. As I've used a lot of words to say above -- right now, we're really hard-pressed on the CI resources situation and we could really use additional CI resources, to increase our CI matrix as well as to improve the developer experience.

[1]: If you know someone/some organisation that'd be willing to do so, please do let us know. I'm sure PSF's Packaging-WG will figure out some way to put it to good use. As an example related to this issue, ~2-3 weeks of an experienced Python dev's time spent improving our test suite would go a long way, allowing for faster feature development and better sustainability for the project. Also, we've got a list of even more impactful projects, if that's more interesting. :)

ssbarnea commented 4 years ago

Coming from a good number of years of contributing to OpenStack, I find it really funny that a 1h delay is considered long. On OpenStack we do have cases where it takes even more than 24h for a particular change to be checked or gated; it's not common, but we have jobs that run for 2-3h, with an empty queue.

I think it is a bad development practice to optimize for time-to-pass-CI by shrinking the test matrix. Developers should have patience with their patches and also perform a decent amount of local testing before they propose a change.

The bigger the risks, the bigger the test matrix should be, and I do find pip to be one of the most important projects in the Python ecosystem. If a bug slips in that affects even 0.1% of users, that is a serious issue.

So please do not advocate for quick merges. The reality is that having a patch reviewed by humans, preferably at least two, requires far more time than the CI, so the time to run CI is not the real bottleneck most of the time.

Also, I do find it quite dangerous to have merges happening too fast, as it does not allow others to review them. In fact, I would personally wish GitHub had a configurable cool-down time which projects could set, preventing merges until at least a certain number of hours have passed.

uranusjr commented 4 years ago

The main problem I have with long-running CI checks is that they cause wasted cycles. I work on a lot of projects concurrently, and tend to completely switch to another task after I finish working on pip, and don't check back for a long time. I believe most pip maintainers work the same way as well. This means a failing CI tends to make the code miss one precious opportunity to get reviewed, and it has to sit in the queue for a considerably longer time than needed. If the CI could report more quickly, I would be able to afford waiting some extra time before switching tasks, to avoid the PR dropping out of the cycle.

It may be counter-intuitive, but IMO long CI duration is a problem for pip exactly because pip PRs tend to need more time to get proper reviews, not the other way around. Projects with more review effort can afford longer CI durations, because missing some of the review opportunities is less problematic. I am not familiar with OpenStack and do not know how it compares to pip, but "long CI is not a problem since human reviews take longer" does not seem to be the correct conclusion to me.

pradyunsg commented 4 years ago

As a general note, I do feel like everything that has to be said here in terms of how everyone involved feels about $thing and $approach has been said.

Instead of an extended discussion about the trade offs at play here, I’m more interested in breaking this issue into a list of action items, making dedicated issues for those and closing this.

If someone else would like to get to making those action items before I do (at least 2 weeks from now), they’re welcome to!

pfmoore commented 4 years ago

Here's a starting list:

  1. Document the types of CI we want. Things like Primary (must pass before merge, tests functionality against supported platforms), Secondary (must pass, tests additional things like documentation, linting), Optional (informational, not required for merge - new in-progress feature tests, non-core platforms, maybe), Supplementary (additional platforms).
  2. Document the precise platform/interpreter matrix we want to test against. Categorise each combination as mandatory or optional. Link that to the supported platforms matrix.
  3. Document how (if at all) we expect to isolate ourselves from CI infrastructure details - the point was raised at some point that we run CI almost entirely on Ubuntu flavours of Linux, because that's what CI offers. This (hopefully!) does not mean that we risk missing bugs specific to other Linux distributions, but we should be clear on why it doesn't, and what we do to ensure that. The same point applies to Windows and MacOS, of course, in terms of OS versions.
  4. Document our policy on CI vendors (do we want just one, multiple, do we want redundancy or independent test sets).
  5. Document how CI is maintained - who will handle maintenance for each CI vendor, what the route is for raising issues, etc.
  6. Document our policy around the test suite - how do things like tests marked as "network" work, do we handle them the same way on every CI vendor, how do we handle issues like unreliable tests (which may be linked to CI infrastructure differences).
  7. Write up our priorities - fast test runs, completeness of testing matrix, ease of reading test output, ability to do adhoc tests on the CI environment... (At the moment, we mostly don't have such priorities - we just put up with whatever our chosen CI vendors provide. The discussion around zuul has exposed that fact, and we need to understand what we want in order to evaluate zuul).

That's a 5-minute brain dump of high-level, but hopefully small enough to be actionable, ideas.

I don't have the time to manage any of these items, so I'm just throwing them in here in the hope that someone has the bandwidth for this sort of meta-activity.

pradyunsg commented 3 years ago

GitHub Actions gives us 5 MacOS workers, and 20 Linux/Windows workers. Once we drop Python 2.7 and Python 3.5, that makes it very reasonable to plonk down our entire "regular" CI pipeline into GitHub Actions.

And, this is done now with #9759. You have been warned about lots of issue tracker churn this weekend. :)