uoft-networking / tools

A repository of tools used by networking teams across the University of Toronto
MIT License

[TODO] Find better way to manage pip dependencies in our monorepo #29

Open alextremblay opened 3 weeks ago

alextremblay commented 3 weeks ago

The tools repository is a monorepo containing a lot of uoft projects, all of which are pip-installable packages. Some are python libraries meant to be imported, some are python CLIs meant to be installed by end users and executed, and many are both. Many of them also form a dependency graph (ie nautobot depends on aruba+ssh+bluecat, scripts depends on bluecat+ssh+librenms, and all of them depend on core)

For the monorepo and our python packaging / dependency management tooling, we have the following goals:

  1. We should be able to create custom forks of external projects we depend on (to fix bugs or add features) and be able to reliably reference those forks until the changes we make get upstreamed. We should be able to track these custom forks in separate git repos (forks of the original) and add these forked repos as submodules of our monorepo

  2. any developer wanting to contribute should be able to pull down the repo and run a command to set up a venv and install all of our packages as well as all their dependencies, pinned to specific versions (through use of a lock file)

    • all packages in the monorepo should automatically be installed in editable mode in this context
    • all custom fork submodules should be initialized automatically and installed in editable mode in this context
  3. any end user who wants to install any of our projects should be able to pull down the repo and pipx install projects/<name> and have it just work

    • for any projects that depend on other projects in the monorepo, pip/pipx should be instructed to install those dependencies by relative path (ie if a user installs the "aruba" project, and aruba depends on uoft_core, pip/pipx should install uoft_core from path "../core" relative to the aruba project folder, instead of installing uoft_core from PyPI)
    • for any projects that depend on custom forks, pip/pipx should do one of the following (haven't yet decided which is better):
      1. automatically initialize submodules of the monorepo and install those dependencies by relative path
      2. install those dependencies from git repo URLs (ex: if we have a custom fork of nautobot at https://github.com/utsc-networking/nautobot, pip/pipx should, while installing any package that depends on nautobot, interpret that dependency as nautobot @ git+https://github.com/utsc-networking/nautobot)
  4. users should be able to install any of our projects from any branch of our git repo by calling pipx install git+https://github.com/uoft-networking/tools@branch_name#subdirectory=projects/<name> and have it behave the same way as #3

  5. we should be able to build python wheels suitable for publishing to pypi, which reference dependencies by bare name instead of by relative path, and which reference custom-fork dependencies by git URL (as described in #3.2 above)

  6. (optional bonus) I would love to be able to restructure our monorepo so that all code for all projects lives in a single source tree and gets automatically broken up into PEP 420-style namespace packages (ie instead of having a package called uoft_core whose code lives in projects/core/uoft_core and a package called uoft_aruba whose code lives in projects/aruba/uoft_aruba, I'd love to have a package called uoft.core whose code lives in src/uoft/core and a package called uoft.aruba whose code lives in src/uoft/aruba)
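For reference, the PEP 420 mechanism that goal #6 relies on can be demonstrated with nothing but the standard library. This is a minimal sketch, not a proposal for the repo layout: it builds two throwaway source roots in a temp directory (the `core-dist` / `aruba-dist` folder names and the `NAME` attribute are made up for illustration), each contributing one subpackage to a shared `uoft` namespace that has no `__init__.py` of its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, subpackage: str) -> Path:
    """Create <root>/<subpackage>-dist/uoft/<subpackage>/__init__.py."""
    dist_root = root / f"{subpackage}-dist"
    pkg_dir = dist_root / "uoft" / subpackage
    pkg_dir.mkdir(parents=True)
    # Only the leaf package gets an __init__.py. The "uoft" directory has none,
    # which is what makes it a PEP 420 namespace package.
    (pkg_dir / "__init__.py").write_text(f"NAME = 'uoft.{subpackage}'\n")
    return dist_root


def load_namespace_demo() -> tuple:
    """Import uoft.core and uoft.aruba from two separate source roots."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = [make_distribution(Path(tmp), name) for name in ("core", "aruba")]
        sys.path[:0] = [str(r) for r in roots]
        try:
            importlib.invalidate_caches()
            core = importlib.import_module("uoft.core")
            aruba = importlib.import_module("uoft.aruba")
            return core.NAME, aruba.NAME
        finally:
            # Undo the sys.path and sys.modules changes made by the demo.
            for r in roots:
                sys.path.remove(str(r))
            for mod in [m for m in sys.modules if m == "uoft" or m.startswith("uoft.")]:
                del sys.modules[mod]


if __name__ == "__main__":
    print(load_namespace_demo())
```

Both imports succeed even though the two subpackages live under different roots, because Python merges every `uoft/` directory it finds on `sys.path` into one namespace. Splitting a single `src/uoft` tree into separately installable distributions is a build-backend question that this sketch does not address.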

Each of these is easy to accomplish on its own, but accomplishing ALL of them together is extremely hard.

To accomplish #1 and #2, we use rye, which automatically installs all projects in projects/* and custom-forks/* in editable mode. The downside to this is that all developers on our monorepo must install every project and every dependency in their venv, even if they only want to work on one small project

To accomplish #3 and #4, we've structured all of our projects as PEP 517-compliant python packages, with project metadata defined in a pyproject.toml that lives alongside each project. Each of these packages uses a PEP 517 build backend called hatchling to tell pip/pipx how to install the package, and each project has a pyproject.py hooks file that gets called by hatchling, allowing us to automatically convert bare dependencies on monorepo projects into relative-path dependencies. For example, the aruba project's pyproject.toml declares that uoft_aruba depends on uoft_core. When pip/pipx installs uoft_aruba, projects/aruba/pyproject.py is triggered, and it automatically rewrites that uoft_core dependency into uoft_core @ ../core
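The rewriting step can be sketched roughly as follows. This is not the contents of the real pyproject.py hook files, only a standalone illustration of the idea: the `MONOREPO_PROJECTS` table and the function names are hypothetical, and the sketch emits an absolute `file://` URI (the form PEP 508 direct references are defined for) rather than the `../core` shorthand used above.

```python
import re
from pathlib import Path

# Hypothetical map of monorepo package names to their folder under projects/.
MONOREPO_PROJECTS = {"uoft_core": "core", "uoft_aruba": "aruba"}

# Matches the leading project name of a requirement string.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Make uoft-core, uoft_core and Uoft.Core compare equal."""
    return re.sub(r"[-_.]+", "_", name).lower()


def rewrite_dependencies(dependencies, project_dir, projects=MONOREPO_PROJECTS):
    """Turn bare monorepo dependencies into 'name @ file://...' references.

    project_dir is the folder of the project being installed, e.g.
    <repo>/projects/aruba; sibling projects are looked up next to it.
    """
    projects_root = Path(project_dir).resolve().parent
    lookup = {normalize(k): v for k, v in projects.items()}
    rewritten = []
    for dep in dependencies:
        match = _NAME.match(dep)
        # Requirements that already carry a URL, extras, or an environment
        # marker are left untouched in this sketch.
        is_plain = not any(ch in dep for ch in "@[;")
        if match and is_plain and normalize(match.group(1)) in lookup:
            target = projects_root / lookup[normalize(match.group(1))]
            # A direct reference cannot carry a version specifier, so any
            # specifier on the original requirement is dropped here.
            rewritten.append(f"{match.group(1)} @ {target.as_uri()}")
        else:
            rewritten.append(dep)
    return rewritten


if __name__ == "__main__":
    print(rewrite_dependencies(["uoft_core", "requests>=2.0"], "/repo/projects/aruba"))
```

Third-party requirements such as `requests>=2.0` pass through unchanged; only names found in the lookup table are rewritten.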

To accomplish #5, I've tried to add logic to the pyproject.py hook files to not rewrite dependencies when building wheels, but it does not work and is difficult to debug, so more work is needed there
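One possible shape for that logic, sketched under loud assumptions: the build is told explicitly whether it is a publish build through an environment variable (`UOFT_PUBLISH` is a made-up name, not something the repo or hatchling defines), and custom forks are always referenced by git URL (option 2 under goal #3). The lookup tables are hypothetical apart from the nautobot fork URL mentioned above.

```python
import os
from pathlib import Path

# Hypothetical lookup tables for illustration only.
MONOREPO_PROJECTS = {"uoft_core": "projects/core"}
CUSTOM_FORKS = {"nautobot": "git+https://github.com/utsc-networking/nautobot"}


def publishing_requested(environ=os.environ) -> bool:
    """Assumption: a publish build is signalled by UOFT_PUBLISH=1."""
    return environ.get("UOFT_PUBLISH") == "1"


def resolve_dependency(name: str, repo_root: str, publishing: bool) -> str:
    """Return the requirement string to record for one dependency name."""
    if name in CUSTOM_FORKS:
        # Forks are referenced by git URL in both modes.
        return f"{name} @ {CUSTOM_FORKS[name]}"
    if name in MONOREPO_PROJECTS:
        if publishing:
            # Wheels meant for PyPI keep the bare name (goal #5).
            return name
        # Local installs point at the sibling project on disk (goals #3, #4).
        target = Path(repo_root).resolve() / MONOREPO_PROJECTS[name]
        return f"{name} @ {target.as_uri()}"
    # Anything else is an ordinary third-party dependency.
    return name


if __name__ == "__main__":
    mode = publishing_requested()
    for dep in ("uoft_core", "nautobot", "requests"):
        print(resolve_dependency(dep, "/repo", publishing=mode))
```

Making the mode an explicit input rather than something the hook infers from how it was invoked at least makes the two code paths testable in isolation, which may help with the debugging problem described above.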

#6 WOULD be possible, but not in a way that's compatible with #1, #2, #3, or #4. The only way I can think of to accomplish it in a compatible way would be to replace hatchling, the pyproject.py files, and the per-project pyproject.toml files with a custom in-repo PEP 517 build backend. The idea is complicated and still not fully formed in my mind, but it's there

alextremblay commented 3 weeks ago

You may ask yourself: how do other projects accomplish these things? answer: they don't. python monorepos are quite rare simply because it's very difficult to accomplish even half of all these requirements in a single repo, let alone all of them.

As far as I am aware, we are breaking new ground here. I'm not aware of any existing python monorepo that accomplishes as many of these goals as we do. we may very well be on the cutting edge here, for better or for worse 🫠

alextremblay commented 3 weeks ago

I think we can get a better outcome for #1-4 by switching from rye to uv

We need to make the switch anyway, as rye is being deprecated and rye's developer is recommending that users move to uv. As of the latest release, uv has support for workspaces, which fits our monorepo multi-package requirements quite well

Also, uv has a dependency override mechanism which would be a huge help to us. Given the massive size of our monorepo's dependency graph, it's not uncommon for two of our packages to pull in transitive dependencies with conflicting version constraints. When that happens, rye lock completely crashes, and we're forced to deep dive and figure out how to untangle the mess, including sometimes forking a sub-sub-sub-dependency just to update its version constraints on the transitive dependency which caused the problem. It's a mess, and uv's dependency override mechanism may be the solution we've been looking for
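To make the conflict concrete, here is a toy model of it in plain Python. It is not how uv's resolver works internally, and the package names (`pkg_a`, `pkg_b`, `shared_lib`) are invented; it only shows why two incompatible constraints on one transitive dependency leave nothing to pick, and how an override sidesteps the declared constraints. In uv itself the override is configured through the `override-dependencies` setting under `[tool.uv]` in pyproject.toml.

```python
class ResolutionConflict(Exception):
    """Raised when the declared constraints leave no acceptable version."""


def pick_versions(requirements, overrides=None):
    """Pick one version per dependency.

    requirements: {requirer: {dependency: set of acceptable version tuples}}
    overrides:    {dependency: version tuple} that replaces every declared
                  constraint on that dependency.
    """
    overrides = overrides or {}
    allowed = {}
    for requirer, deps in requirements.items():
        for dep, versions in deps.items():
            if dep in overrides:
                # Declared constraints are ignored for overridden dependencies.
                continue
            allowed[dep] = allowed[dep] & versions if dep in allowed else set(versions)
            if not allowed[dep]:
                raise ResolutionConflict(
                    f"{requirer} leaves no acceptable version of {dep}"
                )
    for dep, version in overrides.items():
        allowed[dep] = {version}
    # Prefer the highest version that every requirer accepts.
    return {dep: max(versions) for dep, versions in allowed.items()}


if __name__ == "__main__":
    requirements = {
        "pkg_a": {"shared_lib": {(1, 0), (1, 1)}},
        "pkg_b": {"shared_lib": {(2, 0)}},
    }
    try:
        pick_versions(requirements)
    except ResolutionConflict as exc:
        print("conflict:", exc)
    print(pick_versions(requirements, overrides={"shared_lib": (2, 0)}))
```

The trade-off is the same as in the real tool: an override forces a version that one of the requirers never declared support for, so whether that package actually works with it still needs checking.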