We need environments to be shareable, reproducible, and upgradeable for at least a 2-month window (ideally 6-12 months). This is deceptively non-trivial.
Problems / what we want
We want participants to be able to reliably and quickly install and load up a working environment. Ideally, we would do this with a lock file that bypasses dependency resolution.
Environments need to be incrementally upgradable. When we add a package, or when a package upgrades, we want to be able to update the environment easily, avoiding the headache of a full environment resolve.
Building environments from lock files has to respect platform and architecture differences, so we need a separate lock file for each platform+architecture.
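For the record, the conda-lock tool is one way to get per-platform lock files. A sketch (the exact output filenames depend on the conda-lock version, and "myenv" is a placeholder):

```shell
# Generate one explicit lockfile per platform from the spec file.
# Each solve happens once, up front; later installs skip resolution entirely.
conda-lock --file environment.yml \
    --platform linux-64 --platform osx-64 --platform win-64

# This produces files like conda-linux-64.lock, conda-osx-64.lock,
# conda-win-64.lock. Creating an environment from an explicit lockfile
# bypasses the solver completely:
conda create --name myenv --file conda-linux-64.lock
```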
Conda lock files that properly include the pip section don't really exist.
Pip installation needs to be as first-class as conda. We can't have the same package installed by both.
Upgrades and resolving can be a giant headache (for all the reasons we've been dealing with the past couple of weeks). These issues and more are alluded to here: http://iscinumpy.dev/post/bound-version-constraints/
To avoid this headache, we'd like to test the full solve on clean environments on multiple platforms, catching issues before we break the environment build. That way changes stay small and easily debugged, rather than piling up into a giant snotball of changes that is hard to untangle.
We need to be able to hand-pin versions of packages to avoid bugs when they come up, but keep track of when we can unpin again. It would be great to easily automate testing the unpin without breaking the environment build.
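As a sketch of the "keep track of pins" half: simply grepping the spec file for version pins yields a checklist of unpin candidates. The environment.yml below is illustrative, not our real spec:

```shell
# List the hand-pinned packages in an environment.yml so there is a single
# place to check when deciding what to try unpinning next.
cat > /tmp/environment.yml <<'EOF'
name: example_env
dependencies:
  - python=3.8
  - igraph=0.9.1        # example hand-pin; record reason and date here
  - scikit-learn
  - pip:
    - some-package==1.2.3   # example pip-section pin
EOF

# Any "- name=version" or "- name==version" line is an unpin candidate.
PINNED=$(grep -E '^[[:space:]]*-[[:space:]]*[A-Za-z0-9_.-]+==?[0-9]' /tmp/environment.yml \
  | sed -E 's/^[[:space:]]*-[[:space:]]*([A-Za-z0-9_.-]+)==?.*/\1/')
echo "$PINNED"
```

Feeding each candidate to a scripted "remove the pin, rebuild, run the tests" loop would be the automation step on top of this.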
Python environments become huge and at some point can no longer be resolved. Move to one env per repo, and then to more than one env per repo.
What's cached locally affects the build. Environment specification should always be CLEAN-ROOM, i.e., reproducible from scratch, independent of local caches.
We want to be platform agnostic, so a Docker container isn't the answer for this.
Related Problems, but not mainline at the moment
If I'm maintaining a library and associated notebooks as documentation, I'd like to be able to provide an environment (and even datasets) that just works for running the notebooks, so I don't have to debug individual environment issues for users.
If I'm maintaining a project, I'd like to know when my dependencies are shifting in a way that is incompatible with my project. I'd like to run CI on --dev so I can see what's coming down the pipe: any breaking changes, and anything that breaks my tests. The tricky part is when a change breaks my environment before it breaks my tests. It would be nice to have a "helping hand" on that step.
What have we tried and things we've looked at
make + conda env --export
conda lock
conda lock + Poetry
mamba solver vs. conda solver
What don't we know
Does anyone else care? How do people try to work around this already? This is a maintainer problem, not a user problem.
For web-based applications, we've heard of pip lock files, and of GitHub Actions that resolve dependencies and propose security patches as they become available.
What's the easiest/hackiest way to hand-build an MVP that addresses the core issues? We need something that works for us for the next 2 months. We're willing to try something messy, as long as it works.
Running Comments
Lockfiles, Environment Generation, and Windows
In working with our Windows users to determine the cause of their environment-creation woes, it turns out Windows isn't at fault here. There were issues around the version pin for igraph. Removing that pin allows the environment to be created successfully, but it still takes a REALLY long time to generate.
There's a hack. You can create the environment without igraph (and the other two troublesome packages), and then add the three offending packages with 'make update_environment', and it goes much more quickly. Presumably this is because it cuts down (or changes the order of) the dependency-resolution search. Still, there's no easy way to make this work in CI, or for end users without manual intervention, so we need another way.
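Spelled out as shell, the hack looks like this (a sketch: "myenv" is a placeholder, and igraph stands in for the full list of troublesome packages):

```shell
# Step 1: create the environment from an environment.yml with the
# troublesome pins (igraph et al.) removed; this solve completes quickly.
conda env create --name myenv --file environment.yml

# Step 2: add the offenders afterwards. Resolving against an
# already-fixed environment cuts the search space down dramatically.
conda install --name myenv igraph
# (equivalently: add them back to environment.yml and run 'make update_environment')
```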
In the end, Amy and I concluded that we should generate and check in lockfiles for the major platforms we are using, and ensure conda environment generation uses those lockfiles (vs. environment.yml) if present. This got us thinking about what CI for environments would look like. We wrote up a strawman, and we have a plan to implement it with Azure Pipelines.
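The "use lockfiles if present" logic could be sketched as follows (the conda-&lt;platform&gt;.lock naming and "myenv" are assumptions, and the sketch only prints the command rather than running it):

```shell
# Prefer a checked-in explicit lockfile over environment.yml when one
# exists for this platform; fall back to a full solve otherwise.
PLATFORM="${PLATFORM:-linux-64}"
LOCKFILE="conda-${PLATFORM}.lock"

if [ -f "$LOCKFILE" ]; then
    # Explicit lockfile: conda installs exactly these packages, no solver run.
    CREATE_CMD="conda create --name myenv --file $LOCKFILE"
else
    # No lockfile checked in for this platform: do the full resolve.
    CREATE_CMD="conda env create --name myenv --file environment.yml"
fi
echo "$CREATE_CMD"    # sketch prints the command; eval "$CREATE_CMD" to run it
```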
What I did was mostly in response to blockers/what didn't go well that came up last week… so I'll leave it for the next section. From what I said I'd do last week, I sort of fixed the environment.yml, and I have a potential fix for CI. I have a whiteboard sketch of what DONE looks like for the preparation, but I still need to transcribe it to the wiki.
We keep butting up against an issue with environment creation and maintenance that is annoying at best, and a total blocker for some participants at worst. I was already wrestling with this when our Windows users came up against the exact same thing. It all boils down to the fact that environment creation is fragile (especially if you want to maintain the flexibility to upgrade):
Order matters, even though it's not something you can predict or really control. For example, if I have an environment.yml file, create an environment from it, incrementally add packages via make update_environment, and then try to recreate the environment from scratch from the latest environment.yml, the solver might never figure out how to resolve it (even though a solution obviously exists).
By doing environment creation from environment.yml (what you want now) instead of from a lock file (what you needed at the point in time it was locked), the creation process is unstable. We've done this to work around the "no round trip" issue with conda, but the massive number of underlying package and version updates that hit an environment week-to-week, let alone month-to-month, makes it like building a house on sand. Not what you want for reproducibility and stability between participants, that's for sure.
As @hackalog mentioned in his post, we've been working through the nuances of this, and what CI/CD for environment creation and maintenance might look like. Especially since environment creation is probably the biggest blocker to our <15 minutes from fork+clone to loading up a notebook that runs successfully.
Note: on the above issues, either the conda solver was buggy or slow, or mamba was ignoring the strict channel order, or both; we thought we had solved the slow/crashing build problem by switching to mamba.
UPDATE: We didn't. Mamba wasn't building the environment correctly. In particular, mamba env does not appear trustworthy. Sigh.