dstufft opened 2 years ago
So my thoughts on trying to improve the above things:
I'm going to roughly distill things down into three general root causes:
The main problems caused by this from my list above are:
This basically comes down to trade-offs, and normally I'm on the side of multi/poly repos and think that the problems that come along with monorepos outweigh the benefits. However, I've been thinking about this a lot for PyPI, and I actually think that here we have a really strong case that things would be better if we switched to a monorepo.
Just to spitball things a bit, here's a rough sketch of an idea for that:
All of the services, libraries, and related repositories that go into producing the entirety of what we would consider "PyPI" that are under our control should all be folded into a singular repository. This would include services like Warehouse, Linehaul, Conveyor, Inspector, libraries like readme_renderer and trove-classifiers, and misc supporting things like the terraform in pypi-infra. It would not include anything that isn't roughly developed primarily for use in or with PyPI by the PyPI team, so things like packaging would not be expected to be included.
We would primarily use a code structure that looks like:

```
.
├── external
├── infra
├── libs
│   ├── internal
│   │   └── hypothetical-feature-flags-lib
│   └── shared
│       ├── readme-renderer
│       └── trove-classifiers
└── services
    ├── conveyor
    ├── linehaul
    └── warehouse
```
Within this code structure:
- The `external/` directory houses any external code (or any references to external code such as `requirements.txt` files) that we've imported or referenced into the repository.
- The `infra/` directory contains the "base" terraform infrastructure files (but not any modules or service specific stuff, other than what it imports from `services/*/infra`).
- The `libs/internal` directory contains any internal libraries that exist for us to share among our services, but that would not end up being published to PyPI.
- The `libs/shared` directory contains any libraries that we've developed for use in or with PyPI that we release to PyPI as well for others to use.
- The `services` directory contains any services that we've got, the primary one being Warehouse itself of course.

The general rules would be:
- Nothing can import from a service (other than that specific service itself)
- Nothing can import from the top level infra directory.
- Everything can import from externals.
- Everything can import from a shared lib.
- Services and other internal libs can import from internal libs.
- Infra can import from `services/*/infra`.

Inside each service's directory would essentially be a "project root", which would have things like (but not limited to):
- An `infra/` folder that contains any related terraform configuration or modules for this service (which the top level `infra/` imports)
- A `src/` directory containing the source code for that service.
- A `tests/` directory containing the tests for that service.
- Any deployment files that Cabotage needs to deploy this service (`Dockerfile` and `Procfile` still?).

The basic idea being that the `services/$NAME` directory is roughly equivalent to the current service specific repositories, other than things that we lift to the top level for uniformity (for example, I'd imagine black configuration to be top level).

The same sort of idea is true for the two libs directories, with `external/` being the odd one out in being more free form, since it will have a lot more one-off things in it, which will hopefully mostly be references to external things rather than fully imported external things.
We would then shut down all other repositories and all issues, pull requests, deployments, etc [^1] and funnel everything through this one repository.
This would have a lot of benefits for us:

- Everything develops and tests against the `HEAD` versions of any library that is developed as part of this monorepo, so you never have to jump through hoops to test changes for our own libraries inside one of our services.
- There is a single set of external dependencies in `external/` [^2], which means that we lash all of our related services together: upgrading Python for one will upgrade Python for all of them in one atomic unit, which will help prevent services slowly accumulating tons of out of date packages, possibly with security issues [^3].
- Internal, unpublished shared code finally gets a home in the `libs/internal` directory.

We could also add some other directories if we can think of good uses for them, like a `cmd/` directory that just exists for little one-off commands or something. Really any sort of categorization we find useful we can add here.
[^1]: Mostly? I think we'd still keep the pypi-support issue tracker.

[^2]: I'm sure we'd need the ability to have other versions sometimes, but those should be special cases.

[^3]: This "lashing together" can also be considered a con, since it means upgrading a dependency in Warehouse is blocked on also upgrading that dependency in Conveyor. Ultimately it's a trade-off: if it's all together it's typically not super hard for a small set of services to upgrade it all together, and it acts as a forcing function to make sure that we actually do it.
This is where the bulk of the issues ultimately fall, I think, with things like:

Ultimately all boiling down to the fact that our build system is currently a combination of docker, docker-compose, make, shell scripts, pip, npm, and gulp, so fixing them will require replacing some or all of those.
I don't have an exact idea of which system to pick, but I think ultimately the right thing to do here is switch to one of the more modern build tools like Bazel, Pants, Please, etc.
They all work somewhat differently and have their own pros/cons over each other and I don't want to get into trying to select a specific one right now, but in general these tools:
These systems often don't, by default, provide any better editor integration than our current system does. However, since things can run outside of a docker container while still pulling in requirements as part of the build system rather than through the host, it's a lot more tractable to write editor integration that just installs everything to some well known path that editors can be configured to look inside of as the environment.
The main downsides to switching to something like this are:
- `docker-compose` itself expects your build system for locally built images to use `Dockerfile` based build systems, but these systems typically want to work by letting the build tool produce the docker image without the use of a `Dockerfile`. You can work around this though:

  a. Just make a stub `Dockerfile` that invokes the new build tool to do the building.

  b. Use docker-compose just for the things that you aren't building locally (e.g. third party services) and run the services from the repository natively using the build tool.

  c. Use the build tool to produce images, and just have docker-compose use that image as if it were a third party image, which requires having the image built before starting docker-compose, but which you can write a wrapper around (see the sketch below) or provide some integration to make it more pleasant to deal with.
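For option (c), a rough sketch of what that wrapper could look like, using Bazel purely as a stand-in and entirely hypothetical target/tag names (none of this exists in Warehouse today):

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: have the build tool produce/load the service image,
# then start the compose stack that refers to that image by name instead of
# using a `build:` section. Target and tag names below are placeholders.
set -euo pipefail

# With Bazel's docker/OCI rules this is typically a `bazel run` on an image
# target, which loads the result into the local docker daemon; Pants/Please
# have their own equivalents.
bazel run //services/warehouse:image

# docker-compose.yml would then reference the pre-built image
# (e.g. `image: warehouse:dev`) as if it were any other third party image.
exec docker-compose up "$@"
```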
The main problem caused by this is:
For this I think the answer is that we just need to rethink how we manage our dependencies; what I would suggest is something like:

- We have multiple `requirements/*.in` files that specify our top level constraints for various files, and this is the only valid way to specify dependencies (e.g. no more bare `requirements/foo.txt`). We then use `pip-compile` and pass all of these `requirements/*.in` files into it as input in a single pass, producing a single `requirements/constraints.txt` (see the sketch after this list).
- This `requirements/constraints.txt` contains the entire set of possible dependencies along with pinned versions and hashes, but since we don't know which of them we actually want to install based on that file alone, we don't actually pip install from that file; instead we use it as a pip constraints file and use the original individual `requirements/*.in` files as our requirements files.
- We then disable Dependabot, and create our own cron job (Github Action?) that will, on some schedule, just recompile the whole suite of `requirements/*.in` files into the singular `requirements/constraints.txt` and submit that as a single Pull Request.
- Skip the dependency check unless `requirements/*` has changed, and remove the attempts to mitigate things by parsing the outcome and only partially diffing the data; just do a basic diff.
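A rough command-level sketch of that workflow, using hypothetical scoped files named `requirements/main.in` and `requirements/tests.in` (the real list would be whatever `.in` files we actually have):

```bash
# One resolution pass over all of the .in files, producing a single fully
# pinned, hashed constraints file. (File names here are placeholders.)
pip-compile --generate-hashes \
    --output-file=requirements/constraints.txt \
    requirements/main.in requirements/tests.in

# Installing a given scope then uses the .in file to say *what* we want and
# the constraints file to say *which exact versions/hashes* we get.
pip install -r requirements/main.in -c requirements/constraints.txt
```

(As noted further down in the thread, pip's handling of hashes in constraints files complicates the install step, so the exact invocation would need some adjusting.)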
This ends up solving a few of our biggest problems with dependency handling:

- The entire dependency set is resolved together in a single pass, so the same dependency can no longer end up pinned to different versions in different files, since we install by using the `.in` files as requirements files and using the constraints file.
- You can split or combine the various `pip install` commands and still get the same, valid, outcome.

It does have a few downsides that I can think of:

- Holding back or excluding a particular version of something now has to be expressed as a constraint in `requirements/*.in` (with `!=` or `<`).

Of the three ideas, the third one is wholly independent; we can implement it no matter what we do, since it's just related entirely to our own tooling and we'd be implementing it ourselves.
The first two ideas kind of go hand in hand.
You can of course do a monorepo without one of the more modern build tools and you can use one of the more modern build tools without bothering with a monorepo, but doing so is kind of going even further off the beaten path than doing them together and a lot of the benefits of the two things interplay with each other in positive ways.
That being said, we obviously could do a monorepo without changing our build tool stack, and use a similar directory layout and it would work, it's just likely to cause a lot of the problems that the build tool change tackles to get somewhat worse.
Likewise we could switch to one of the mentioned build tools without the monorepo, but most of the upfront investment that you're required to make either way exists to make the dependency tracking work correctly, which has a lot less utility (though not zero!) when you're still doing polyrepos, and what utility you do retain you can likely get using more off-the-shelf tools for Python/pytest/etc already.
Also, none of the above really goes too deep into specifics or figures out what a migration plan would look like, or even really exists as a proof of concept for people to try.
Mostly I'm trying to generate some discussion about these ideas and see how other people feel! If these sound like directions people would be interested in exploring, then I'm going to continue to work on fleshing out more solid proposals and a proof of concept of all of this. If people see these ideas and just immediately get hives, then maybe we can figure out a different way of fixing these issues!
FWIW, Dependabot does support pip-compile natively: https://github.com/dependabot/dependabot-core/blob/main/python/lib/dependabot/python/file_updater/pip_compile_file_updater.rb (https://github.blog/changelog/2021-01-19-dependabot-pip-compile-5-5-0-support/). However, it only supports going from a single `.in` to a single `.txt` (https://github.com/dependabot/dependabot-core/blob/70805187fb63ff1f012446bd32a3d22a6220cc43/python/lib/dependabot/python/file_updater/pip_compile_file_updater.rb#L458).
If I understand things correctly, having a single `requirements/constraints.in` that `-r ...`s the other "scoped" files, and compiling it into a `requirements/constraints.txt`, would satisfy the workflow you're seeking for dependency management, while still being compatible with Dependabot.
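Concretely, and again with hypothetical scoped file names, that might look like:

```bash
# Hypothetical single top-level .in that pulls in the scoped files via -r,
# keeping Dependabot's one .in -> one .txt model intact.
cat > requirements/constraints.in <<'EOF'
-r main.in
-r deploy.in
-r tests.in
EOF

pip-compile --generate-hashes \
    --output-file=requirements/constraints.txt \
    requirements/constraints.in
```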
Hmm, guess they just don't document that at all then
Although I did find out that the workflow I proposed doesn't actually work with pip, see https://discuss.python.org/t/constraints-files-with-hashes/18612/
Whoa, this is quite the brain dump! I read through it once, but I'm certain I missed some stuff, so apologies if I overlook some specifics.
As someone who self-onboarded last year to develop on this codebase, I've got some learnings to share! Sorry this isn't nearly as well-structured as @dstufft's.

"Getting started" was mostly correct for when I started out, but needed a fair amount of tweaks and twiddles to get things working. Some examples: #10393 #10816
I found that working with a container-based stack was quite comfortable to me, and I even went as far as to explore how to make the entire stack work on a cloud-based development environment in Gitpod, to help alleviate some of the bootstrapping/building one would have to undertake before making a change. I made some decent progress, but the true measure of this kind of effort is seen when the GitHub Org signs up for an (Open Source) account for GitPod and then makes use of the Prebuild functionality, so you essentially get a unique workspace that has already built any base images for you, and your time-to-write-code is greatly reduced.

On the topic of Docker layer caching - the folks over at https://depot.dev/ have come up with something that works pretty fast, and are also interested in supporting OSS, so that might be something to pursue as well, if GitPod Prebuilding isn't the direction to follow. (I didn't spend a ton of time on GitHub Codespaces, as they felt too new at the time and underpowered compared to the 16 CPU, 64GB RAM that I get for free by default from GitPod - far outweighing my own laptop's power.)
Providing a preset devcontainer.json file would likely also solve for "7. Poor integration with Editors" - since the referenced code is no longer expected on the "host"; instead the editor is directed to reach into the container that has it. For those of us that like PyCharm Professional, there's a similar approach.

Developing on GitPod (the remote scenario), even when using a local editor, solves a lot of the developing-on-Windows concerns. There are also no more host-level artifacts that can leak in.
On the poly/monorepo side - I think this does have some negative tradeoffs that haven't been expressed fully, notably:
- `warehouse` has a CI/CD approach to deploy on merge to `main` - with multiple libraries and merges to `main`, will this add more overhead and complexity to the CI/CD system?

I haven't given the hermetic builds a ton of thought yet, but agree that using Docker image hashes is far more "pure". Adding version specifiers to Dockerfile `apt` commands is another good idea, and will have to follow the upstream distro (buster?) versions available.

Anyhow, that's some of the brain items that came out of this so far.
The volume permissions with `make reformat` issue just bit me, with #15590: I'm on a Linux host for once, so running `make reformat` caused my editor to freak out and refuse to write to files :slightly_smiling_face:
(I assume this doesn't affect macOS hosts for development because the permissions are all YOLO'd through a VM.)
Something I've been thinking about lately is the developer experience of working on PyPI (not just Warehouse, but Warehouse and all the related services and libraries).
Background / Current State
One of the big things that marred people's ability to contribute to legacy PyPI was that getting a local setup was extremely finicky, including things like having to comment out certain pieces of code (or getting access to a running S3 bucket) and getting a lot of local stuff set up (specific Python versions, databases, etc).
Early on in Warehouse's history we identified this as a major problem and decided to solve it, and have largely settled on a broad assumption of `make`, `docker`, `docker-compose`, and the standard set of unix coreutils (`sort`, `awk`, etc) as the things we can depend on existing on the developer's computer, shunting everything else to happen inside of a docker container [^1].

We've also made the decision to split out certain parts of Warehouse's functionality into libraries that we distribute (readme-renderer, trove-classifiers, maybe pypitoken), additional services that we run alongside Warehouse (inspector, camo, linehaul, conveyor, pypi-docs-proxy, maybe forklift), and related supporting "misc" repositories (pypi-infra). Generally speaking, each of those repositories is operated as a largely independent project within its own repo (its own release cadence, issues, etc) [^2].
To manage dependencies, we've settled on using `pip-compile` to create lock files that have been generated with valid hashes, of which we have multiple to break our dependencies down into broad categories (so that we don't have to install test dependencies into production, for instance), each generated independently of the others [^3]. To manage dependency updates, we rely on Dependabot to create PRs for dependency bumps that we then run as a pull request.

To manage any sort of admin tooling or similar that we may want to write, we generally have a few options available to us:

- Bake it into Warehouse itself, expose it via `python -m warehouse`, and manually invoke it with direct access to our running kubernetes cluster.

That largely covers the bulk of Warehouse itself; the other missing piece for Warehouse is our front end JavaScript and CSS. That is managed entirely separately using Gulp, Webpack, and a bunch of JS libraries. This has more or less been untouched and has been effectively unmaintained, as changes to it have a tendency to break things often and our front end test coverage is poor, so it's largely just sitting there with minimal changes made to the build infrastructure.
For managing CI, we're currently using GitHub Actions, which have been carefully constructed in such a way as to directly run the underlying commands that, in local development, would normally be invoked inside of a docker container [^4].
The final piece of the puzzle is that deployments are managed by Cabotage, which orchestrates building the `Dockerfile` at the root of a repository, manages configuration for the deployed versions of an application, combines that into Kubernetes primitives to deploy the built images into Kubernetes using the processes defined in the repo's `Procfile`, and finally updates statuses in various locations (Github Deployments API, Slack, etc).

Current Problems
1. Speed
One of the biggest problems with the current state is that iterating on PyPI is very slow anytime you have to interact with the build tooling. Take something that should be relatively fast, like running black against our code base: running black natively on my desktop [^5] takes just under 0.3 seconds, however running `make reformat` takes over 5 seconds, assuming that I have the docker containers built and I have `make serve` running in another shell [^6].

If I don't have `make serve` running in another shell it takes somewhere around 45 seconds to run, and it leaves a number of docker containers running in the background after the command has finished.

If someone doesn't already have the docker containers built (or if they've for some reason caused their cache to be invalid, say by switching between branches with different dependencies) then you can add around 2m30s to build the local containers, and possibly another couple of minutes to download images on my Gigabit internet connection.
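For anyone who wants to reproduce that comparison, it's roughly the following (numbers will obviously vary by machine, and `pipx` here is just one convenient way to get a host-side black):

```bash
# Running the tool directly on the host against the checkout.
time pipx run black .

# The same operation routed through make + docker-compose.
time make reformat
```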
Now to some extent some of the "bootstrapping" speed problems are unavoidable, docker is still one of the best ways to get a bunch of related services like PostgreSQL, Redis, etc onto a random contributor's computer without having a large and complicated set of steps to set things up, and caching of build steps alleviates some of these problems.
However, the way docker fundamentally works makes this caching less useful than it otherwise could be for speeding up repeat runs, the biggest problem being that docker caching is fundamentally a linear chain of mutations that take a single input and produce a single output, so anytime some part of the chain is invalidated, everything coming after it has to be regenerated, whether it actually needed to be or not.
Most well-made modern build tools (and even a lot of older tools… like Make itself) support the idea of a dependency graph whose nodes take multiple inputs (and some of them can produce multiple outputs, though it's common to only support one output), and can then be more intelligent about (re)building only the parts of the dependency graph that actually need it [^7].
Taking all of the "bootstrapping" speed issues off the table for a minute, even in the very best case the nature of starting up a docker container for any command we run essentially means that nothing we run can return in <5s, which is just enough time to be frustrating, even when the underlying command itself is fast.
2. Mounted Volumes Issues
One of the ways we make developing Warehouse not intolerably slow is that once our application image has been built and is running, we run that container with the developer's checkout of Warehouse mounted over top of the path that the build process typically installs Warehouse into, then we run Warehouse such that it will monitor those files for changes and will automatically reload itself whenever those files change, creating a reasonably fast feedback loop on changes that can be handled this way.
However, mounting a host volume in this way brings with it some issues.
The largest issue being that in the typical setup the docker daemon runs as `root` but users typically do not, so any files created within the docker containers are owned by root and the user doesn't have permission on them. The most obvious place people run into this is with the generated static files, but it also affects things like auto generated migration files, any files that `make reformat` or `make translations` needs to format or generate, or the generated docs produced by `make docs`.
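On a Linux host the symptom and the usual ad-hoc fix look roughly like this (the static files path is just one example of where the root-owned files tend to show up):

```bash
# After the containers have generated files, they end up owned by root:
ls -l warehouse/static/dist

# The common ad-hoc fix is to reclaim ownership of whatever was written:
sudo chown -R "$(id -u):$(id -g)" .
```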
Other issues stem from things like the reloader not being well protected from invalid code causing it to fully exit, so if you save a file with syntax errors the web container will crash completely [^8], requiring you to Ctrl+C the running `make serve` and then restart it, which depending on what you've changed may or may not trigger a whole rebuild of the docker container, causing another multi-minute delay [^9].
node_modules
directory for their host system in their repository and that has been mounted into the docker container, then node/npm will blindly use whatever is in there, even if it contains binaries for a completely different system. To work around this, we don't actually mount the entire repository into the docker container, but instead mount a bunch of sub paths within the repository.This can cause confusion for people if they've changed anything [^10] in a location that hasn't been host mounted as the version inside the container might not match the version like settings inside a top level file or a new sub path that simply hasn't been added yet.
3. Dependency Management is Error Prone and Manual
The way `pip-compile` works (and really, all or most of Python packaging) is that dependency resolution is a "one shot" process, and invoking `pip` or `pip-compile` multiple times does not guarantee that the end state will be valid.

For the way that we use `pip-compile`, with multiple `requirements/*.in` files that each get compiled individually into separate `requirements/*.txt` files, the problem is that overlapping dependency constraints between different files are not taken into account, so you may end up with the same dependency being pinned to multiple different versions inside of different `requirements/*.txt` files, which fundamentally cannot work, and currently we have to manually resolve these cases when they come up.

In addition, while `pip` supports taking multiple requirements files as input, it doesn't support having some of the requirements pinned using hashes and some not, so attempting to install the requirements files that are not being generated using `pip-compile` has to happen in a separate step, which as mentioned above means the end state may not be a valid set of packages. We will "fail fast" in this case because after we've gotten done installing everything, we run `pip check`, which will iterate through all of the dependencies and verify that all of the dependency constraints are valid [^11], but again, actually resolving things to prevent this from happening is a manual process.

To ensure that our generated `requirements/*.txt` files have been regenerated after a dependency has been added or removed, we have a `make deps` job which essentially just runs `pip-compile` on all of the `requirements/*.in` files to a temporary location and then compares that to the current version to look for changes. However, because the output can change if someone uploads or deletes a file [^12] or publishes a new release [^13], this job parses the `requirements.txt` to extract the names and makes sure there are no extra or missing names.

Unfortunately, this check is particularly bad for a few reasons:

- Because it uses `pip-compile` to generate a new lock file, it will pull in newly released files or versions since the last time versions were bumped. It attempts to control against that, but if a new version of one of our dependencies adds or removes a whole new transitive dependency then that causes this check to fail due to no changes in the Warehouse repository, which means this check starts failing on every pull request or deploy until someone manually fixes it.
- The `make deps` job strips the `==VERSION` when parsing the `requirements.txt` file and only compares names. However, that means that the value of this job is limited, because it's not actually making sure that our `requirements.txt` files are producing what `pip-compile` thinks they should, even on pull requests that actually modify our dependencies in some way.

This means that this check ends up creating "emergency" dependency management tasks to unbreak CI at effectively random intervals, but the mitigations to reduce that end up drastically reducing the value of that check, allowing invalid things to pass (and thus requiring manual intervention by humans to fix it) [^14].

Likewise, Dependabot is also falling short here for us.

Dependabot itself does not understand `pip-compile` at all; it's just treating each individual `requirements/*.txt` file as an independent artifact and blindly bumping versions regardless of what is inside of our `requirements/*.in` files, or even, I believe, regardless of what the dependency metadata within the packages themselves says. On top of that, the same problem with `pip-compile` only resolving one set of files at a time applies here as well: it will gladly bump versions in different files to different versions, even though for our purposes they ultimately need to exist in the same installed environment and thus need to be the same [^15].

Dependabot also does not support grouping multiple dependency updates together, which means that every single dependency update has to be merged individually, or someone has to manually combine all of the pull requests (which maybe can be done automatically using merge, but not always if there are merge conflicts), which makes attempting to keep up on all of the dependabot PRs take a large amount of time [^16]. This can be particularly problematic for dependencies with tight integration with each other, like the various boto tools, which may not even be valid to update one at a time, but even if it is, every update requires updating multiple dependencies.
4. Multiple Repositories Increase Overhead and Cause Bit-rot
In general I'm a fan of multiple repositories, treating individual projects as their own unit, as I think that ends up providing an overall better experience for everyone and requires less "build infrastructure/tooling maintenance" for each of those individual repositories, since most OSS tooling assumes a 1:1 mapping of repository to project and the alternative, monorepos, generally become unwieldy and require more effort to keep performant.
However, that being said, the poly repo strategy has its own problems, particularly when viewed under the lens of "the primary purpose of this collection of repositories is to be combined into a singular thing", as is the case with Warehouse + related repositories coming together to form "PyPI".
Some of the forms of overhead include:
- `readme-renderer` uses `tox` to manage environments/tasks while `trove-classifiers` uses a basic hand made `Makefile`.

The other big problem comes in the form of bit rot: we have a service (pypi-docs-proxy) that we have deployed (I think) that hasn't had any commits since 2017, where I can't even remember where or how it's deployed, what version of Python it's deployed against (is it still even a supported version?), what versions it's using, etc. We have another service (conveyor) that hasn't had a commit since 2021, which appears to be deployed using Cabotage (yay, something matching Warehouse).
With all of these lesser deployed things, the projects themselves don't require a ton of changes, but things that aren't regularly built and maintained tend to bit rot, to the point where attempting to pick them up again in the future means first having to re-learn whatever the dev practice of that repository was, and almost certainly figuring out what things have broken in the meantime and need to be fixed, before you can even get into the position of adding new changes or updating dependencies or whatever.
5. Needlessly re-running tests
In development you're hopefully regularly running tests, however doing so by default will run the entire test suite, which takes several minutes [^18], and the bulk of that time is typically spent running tests where whatever thing you've changed couldn't possibly affect the outcome of the results.
You can filter this down by passing arguments into py.test to select only a subset of tests, however that is a particularly manual process and it requires either just intuitively knowing all of the tests that may be affected by your change OR accepting that you may be breaking random other tests so at some point you'll need to run the full test suite again [^19].
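Manually narrowing the run today looks something like the following (the paths and `-k` expressions are entirely up to you to get right, which is exactly the problem):

```bash
# Run only the tests you *think* are relevant to your change.
python -m pytest tests/unit/packaging/ -k "render or upload"

# ...and then eventually run everything anyway, because you can't be sure.
make tests
```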
Some more modern tooling that is able to actually model the dependency graph of your code as a whole across the entire project can actually figure this out for you, and only run the subset of tests that depend on the things that you've changed, automatically skipping tests that don't depend on changed code, and thus can't possibly be affected [^20].
6. Host system and outside state "leaks" into the build artifacts
We've done a pretty good job of isolating Warehouse from the host system it's running on and from random state changes coming from the outside world.
However we haven't completely eliminated them, for instance we are fetching docker images from docker hub by tag instead of by hash, which means that as the docker image gets updated on docker hub the output of the build system can change without any related change within the Warehouse repository.
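As a concrete illustration of the difference (the digest below is a placeholder, not a real one):

```bash
# By tag: what this resolves to can change whenever the tag is re-pushed
# upstream, so the same commit can build differently over time.
docker pull python:3.11-slim

# By digest: this resolves to exactly one image, so the build input only
# changes when this line changes.
docker pull python:3.11-slim@sha256:0000000000000000000000000000000000000000000000000000000000000000
```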
Even within our own repository, our `Dockerfile` does things like blindly `apt-get install` things, which will affect the outcome of building the project, and then within our CI we're running completely independently of docker, just running directly on the Github Action "host" and pulling in any random state that might come from that.

That might all sound somewhat pedantic, but it actually is an important consideration, which generally falls under the idea of "hermetic builds". A hermetic build being a build that has little to no dependency on the host itself, and instead pulls in everything itself, either by including it in the repository (e.g. what Google does) or by using some form of an external reference that is pinned to an exact, unchanging version (e.g. what most people do) [^21].
When you have a hermetic build, that enables some pretty compelling features like:
It also means that you're insulated from random changes in the world affecting your build, such that a broken package released to `apt` or whatever cannot possibly break you without a corresponding change to your repository.

7. Poor integration with Editors
Right now, attempting to open Warehouse in a modern editor that has integration with things like mypy produces a big blob of useless warnings that look something like this:
All of those warnings roughly boil down to `Import "suchandsuch" could not be resolved.`, which comes from the fact that we never actually install or expose the dependencies that we're working with on the host system in a way that an editor can introspect; everything is hidden inside of a docker container.

Unfortunately there's not really a good way around the fact that the only good way to handle this is to just install everything into an environment on the host system as well and point your editor to that, but doing that in our current setup means that we again start to depend on things like "having the right version of Python installed" on the host system, even if it's only for a little development shim to make editors happy.
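One pragmatic workaround, sketched under the assumption that the relevant `requirements/*.txt` files exist and a close-enough host Python is available, is a throwaway environment that exists purely for the editor:

```bash
# Host-side environment used only so the editor can resolve imports; the
# actual dev workflow doesn't depend on it. File names are illustrative.
python3 -m venv .venv-editor
.venv-editor/bin/pip install -r requirements/main.txt   # hashed, pinned
.venv-editor/bin/pip install -r requirements/dev.txt    # loosely pinned, separate step

# Then point the editor's interpreter at .venv-editor/bin/python.
```

But as noted, this reintroduces a dependency on having a suitable Python on the host.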
To try and accommodate that, you can see that we have a `.python-version` file at the root of the repository to let people sort of set it up themselves and hopefully let some editors decide what version of Python to use. However, that file itself depends on having something like pyenv or asdf to interpret it (outside of manually setting things up), which is yet more tooling that developers have to install.

8. Multiple "interfaces" for developers
In theory this one doesn't actually exist: everything we expect developers to have to invoke we wrap inside of a `make` target, which then means that `make <foo>` becomes our build infra/tooling interface, which is maybe not the best or most feature filled interface, but it is a consistent interface [^24].

In reality though, a number of the problems with the way that our tooling currently works leads people to want to work around having to invoke `make <something>` and instead call the underlying thing directly, something that we can see even within the project itself by the fact that all of our CI things don't invoke `make <something>`, but instead call some underlying command.

9. Lack of support for "Internal" Code
This partially goes to the admin interface question, but also just in general.
Right now there's not really a good way to have code in Warehouse that isn't part of Warehouse, which means that if we want any sort of internal utility we have to either incorporate it as part of Warehouse or spin it off into its own repository, which oftentimes doesn't make much sense if it's closely tied to Warehouse, and comes along with all of the aforementioned problems with polyrepos.

This makes things like one off commands a lot harder to handle: if we bake them into Warehouse itself they're forced to comply with things like our 100% test coverage and possibly other things in the future (MyPy coverage?) unless we carefully carve out a section of Warehouse itself, with this automated testing providing little to no long term value (and possibly even negative value).
Instead what often happens is scripts like this get developed externally from the code repo completely and just get manually copied around until they're eventually run manually (typically on a jumpbox inside of our VPC). A great example of this kind of utility would be code that lets an admin purge a URL from our CDN, where writing a web interface for it is way more effort; a small utility like that would let us restore something close to the simplicity we had with `curl -XPURGE ...` prior to when we added auth to purging.

This also can extend into code that needs to be shared between multiple services, but that we don't want to make generally available as a published library (though in that case we would want to keep the test coverage requirements), or even utilities used as part of the build tooling (see everything inside of `bin/*` for instance).

10. Lacking(ish) support for developing on Windows
We've made the decision that the things that we depend on from the developer machine are effectively:

- `make`
- `docker`
- `docker-compose`
- The standard unix coreutils (`rm`, `mv`, `awk`, etc).

This effectively locks development into something resembling a Unix like environment, though not entirely, because all of those tools do have versions that you can get for Windows, though it's unclear if things would actually work using them or not. Windows users do have another option in using WSL, which is essentially a way to run Linux inside of your Windows install.
I've verified that WSL2 does in fact generally work fine for developing Warehouse on Windows, however it's kind of like having a linux VM in that you're effectively in a Linux environment, so unless you're familiar with it and have it setup the experience is likely going to be kind of miserable.
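For what it's worth, the WSL2 path boils down to something like this (assuming Docker is available inside the distro, e.g. via Docker Desktop's WSL2 integration):

```bash
# Inside the WSL2 distro, the normal Linux flow applies unchanged.
git clone https://github.com/pypi/warehouse.git
cd warehouse
make serve
```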
11. Probably More Things
Who knows! I'd love to hear if there are problems with developing / maintaining Warehouse that fall under the build tooling / infrastructure that people feel like they've just been begrudgingly living with that aren't covered by the above.
Ideas for Fixing
We probably can't fix absolutely everything, but hey, maybe we can make some things better! I'll have a follow up post later with some ideas for how we could make some things better here, but I wanted to get this up because it's already very long and I wanted to see if people had their own things to add.
[^1]: Originally we also assumed Python was installed on the host system, and we could install local tools into a virtual environment managed by our `Makefile`, things like `black` etc, but we have since moved those so that they run inside of a docker container as well.

[^2]: This isn't 100% true, some of them don't really get meaningful activity and sometimes issues get opened up on Warehouse and we just fix them in the correct repo without worrying about it, but in the general abstract case, this is true.

[^3]: This also isn't 100% true, we have some dependencies, particularly dev dependencies, where we only have a `requirements/*.txt` file with loosely constrained dependencies and no hashes.

[^4]: My memory of this is that it was largely done for a combination of speed (not running things inside of docker is faster than spinning up a docker container and running things inside of it) and docker-in-docker problems back when our CI was still running on Travis CI.

[^5]: Where "natively" is actually running black installed via pipx running inside of WSL2 on my Windows machines.

[^6]: For reasons that I assume are oversight, even simple commands like `make reformat` spin up PostgreSQL, Redis, and ElasticSearch.

[^7]: Of course, the ability to do this is only as good as the information that the build tool has; the more fine grained the dependency graph is, the smaller the "blast radius" of a particular change is.

[^8]: To make matters worse, often the output indicating that the web container crashed quickly scrolls off screen, as some of the containers are more robust against crashing from syntax errors like this, so often you don't really see it until you try to load a URL from the web app and get an error.

[^9]: In theory this means that the docker container needed to be rebuilt and the one you had running previously was no longer valid, but in practice cache invalidation for these containers has a lot of noise, and often is a false positive.

[^10]: Explicitly, or implicitly through changing branches or something.

[^11]: I could be wrong, but I think that `pip` doesn't record what extras are supposed to be present, so I believe that `pip check` will ignore any constraints that are pulled in via an extra that we've selected.

[^12]: `pip-compile` lists all the files for the pinned version, even ones that we'll never use, because it doesn't know that we won't use them, which means that if the list of files changes because a new wheel was uploaded, or an old one was removed or something, a simple diff would fail.

[^13]: We're effectively only checking the names of the pins, not the versions they're pinned to.

[^14]: Invalid things should somewhat be checked by the `pip check` that we manually run... however that's only going to check against the dependency metadata that those packages provide, it won't check against any constraints that we've introduced inside of our `requirements/*.in` files.

[^15]: There are other options besides Dependabot, like Renovate, that actually do understand `pip-compile`, but they still have the same fundamental problem that we have with `pip-compile` in that they'll do the resolution for each pair of `.in -> .txt` files independently.

[^16]: We make this worse on ourselves by requiring that branches be up to date with the `HEAD` of `main` before they can be merged, which we do because we auto deploy from `main` and without a merge queue it's the only way to actually have tests run against what the actual merged state of that PR would be. This however means merging say 10 dependabot PRs involves merging 1, updating the next one with `main`, waiting for tests to pass, merging, then repeating, which can take an hour or longer of just toil work.

[^17]: This can oftentimes come across as pedantic or hostile, particularly when the same set of people are managing all of the involved projects, because it creates this appearance that you've just created make-work for that end user because they happened to approach you "wrong".

[^18]: The fact our test suite takes several minutes is its own problem, but this is about the build tooling and infrastructure!

[^19]: Never mind the fact that selecting tests like this breaks anything that looks at the status code of the results of `make tests`, because we require 100% test coverage, which can't exist when you're only running a subset of tests without any knowledge of what lines of code you expect those tests to cover.

[^20]: Unfortunately most of these tools don't handle dependency cycles at all, which the Warehouse code base has a non trivial amount of, so to actually gain benefit here in a way that doesn't treat the entirety of Warehouse as one dependency would require refactoring the code itself, which, if we go with one of those tools, we should probably do, but that would likely be a longer term effort.

[^21]: When dealing with hermetic builds, you can pretty easily get yourself into a rabbit hole where you're eventually down all the way to building your own C compiler from source, because different versions of a C compiler can produce different outcomes. In practice most people try to draw a line somewhere between "building a C compiler from source" and "YOLO fetch everything from `HEAD` on every build", based on how likely it is that different versions of that "thing" are likely to actually meaningfully affect the outcome of your build.

[^22]: This concept of "reproducible builds" is different from the common one talked about in packaging, which is that the build process will produce byte-for-byte identical output -- it's great if it does that, but what this actually cares about is that the semantics of the output are equivalent, e.g. if it produces two wheels that install identically but the wheel file itself has some form of randomness in it that doesn't affect anything, then that's fine.

[^23]: Whether this is a good or a bad thing depends on how fast downloading those cached objects is versus rebuilding the project from scratch, so this particular thing might not be super useful for developers, but it could be very useful for CI.

[^24]: Ideally this extends to multiple different languages too! Python is great, but one of its major features is that you can glue in C or Rust or whatever code, but we currently can't do that without spinning up a whole new project OR adding another build system OR fundamentally changing how we build Warehouse. This isn't a hypothetical problem either: we currently need Python AND JavaScript to develop Warehouse, and we end up having two entirely separate build systems for Python and JavaScript and just paper over it with `make` and `docker`.