pyiron / actions

A centralized location for our GitHub actions
BSD 3-Clause "New" or "Revised" License
3 stars 0 forks source link

[patch] Debug py version #124

Closed liamhuber closed 4 months ago

liamhuber commented 4 months ago

Per #123, non-default (3.12) python version unit tests are not being run. Let's fix this.

liamhuber commented 4 months ago

Ok, so conda-incubator/setup-miniconda@v3 has no problem giving me the requested python version, and when I mamba update with an env file containing python_abi I get the right version of that too, and ultimately python -c "import sys; print(sys.version)" is coming out exactly as expected.

My guess is that one of the dependencies is not available at 3.9 (or not 3.12) and python gets updated alongside.

liamhuber commented 4 months ago

Manually adding the unittest and coveralls-codacy dependencies still gives the correct python and python_abi versions. Time to kick it up a notch at try invoking pyiron/actions/cached-miniforge instead of the incubator directly.

liamhuber commented 4 months ago

Ok, the problem occurs when we mamba env update while the env file explicitly has - python >=3.9,<3.13; if the python version not mentioned in the env file, it seems that we avoid upgrading the requested python version. Since we want to specify the python bounds in the env file, the remaining solution is to pass the env file directly to the conda incubator, which is working fine even with the python versions present.

What remains is to update cached-mamba to get the caching functionality playing nicely with the env being created sans update.

liamhuber commented 4 months ago

The cache needs to know the path for the conda env it's looking for. When I run conda-incubator/setup-miniconda, this is available under env.CONDA. Otherwise I can find a path under $CONDA (/usr/share/miniconda on ubuntu-latest) but I need to verify if it's the same as I get post-incubator.

liamhuber commented 4 months ago

Ok, so here we hit the first and only real problem; $CONDA changes when we run the incubator, so it's not available a-priori. After running the incubator, both $CONDA and env.CONDA point to /home/runner/miniconda3. This is probably because I insist on using Mambaforge. Let's switch off all the flags and see if it finds /usr/share/miniconda instead.

liamhuber commented 4 months ago

It does use /usr/share/miniconda when no flags are specified (although the actual conda executable from which conda changes a bit).

The minimal change would be if there were some way to know what the conda env path would be a-priori, then simply cache/restore and (if the restore hit failed, run the incubator then) cache/save. Of course, I could get brute-force access by running the incubator (at most) twice -- once to get the path and check the cache, and (possibly) a second time to actually build an environment. The trouble there is that the incubator takes ~25s even with no env file, and I don't really want to burn that much time.

The incubator docs themselves for caching envs actually suggest to do exactly what we currently do, so that's no help.

pyiron_base uses Jan's suggestion of passing the env file directly, but doesn't exploit any caching, so that's not super helpful either.

The cache key doesn't depend on this, so my gut reaction is to break the cache into two pieces: first storing the env path at some fixed, known location, and then second storing the actual env. We can try restoring the location cache and only if the cache gets hit do we proceed to restore the actual cached env using the result of location cache for the path cache arg. This feels ugly though, so let's have a think if there's a better solution before pushing on.

liamhuber commented 4 months ago

I couldn't think of anything more elegant, so let's go with the double-cache attack.

liamhuber commented 4 months ago

Aha, it's not finding the cached conda executable because /home/runner/miniconda3/condabin doesn't exist.

And of course it doesn't, because I only cache /home/runner/miniconda3/envs. Let's try additionally caching the executable. Even if this works, I should be sensitive to the cache size, which is 55MB with just the env. Otherwise I guess I can make a raw environment, then reload the env, then activate the env instead.

liamhuber commented 4 months ago

I should be able to activate a conda env from anywhere, so let's try that with the built-in miniforge installation and see if it works. It would be simpler than caching/reinstalling mambaforge.

liamhuber commented 4 months ago

Well, that's infuriating. I got the built-in conda executable to activate the setup-miniconda cached environment, and the config exposing that environment is even persistent to the outer job, but I don't currently have a way of making that env the default loaded env outside of the initial step where we activate it.

liamhuber commented 4 months ago

GitHub doesn't do anything to manage conda env persistence between steps, much less jobs. The modifications to the conda config seem persistent enough for my purposes, but then one still needs to jump through hoops to activate the correct env in each step. Since the shell profile is not sourced between each step, there is no easy way to automate this (one could pass the buck to the shell profile, but then the user needs to remember to source that each step!)

As far as I understand, conda-incubator/setup-miniconda gets around this by really modifying PATH and creating new conda executables, etc., which get referenced and do maintain env persistence.

Sadly, I think the most practical thing to do is give up on the speed of leveraging the built-in conda binary and just always run conda-incubator/setup-miniconda -- either with an environment file specified (if not caching), or a clean install followed by unpacking the cached env and activating it (when available and requested).

liamhuber commented 4 months ago

Uncaching, installing a blank miniconda, and then activating the uncached env works fine, but the activated env is not persistent across steps. Uncaching, and installing a miniconda that explicitly activates the existing env as an arg is persistent but the env is somehow completely empty -- not a single package!

Unrelated but tested in parallel, stopping a clean miniconda from activating the base env brings the time down from 27s to 20s (N=1 test), so not mind blowing but worth doing if the opportunity presents itself.

liamhuber commented 4 months ago

Super, creating an empty env then unpacking the cache into the same location worked perfectly! I get the right packages, it's all persistent to outside the called action, and it's faster than setting up the env from scratch (for this particular env (pyiron_snippets test env) it's 14s having a blank env and uncaching vs ~35s solving and installing from the env file).

With a better grasp of how this works, I think I can even streamline the process and eliminate the two-stage caching.

liamhuber commented 4 months ago

Setting up the cache took 45s, exploiting it takes 35s, and running without any caching at all takes 30s. Again, all N=1 tests with the super simple env from pyiron_snippets.

Changing the env to be a bit more complex (pyiron_base and dask with no version pins), these become 1m10s, 27s, and 1m8s.

This is hardly an exhaustive or scientific test, but we reassuringly see that creating complete+caching takes longer than creating, and that creating empty+uncaching is faster than creating complete+caching. Whether creating complete is faster or slower than creating empty+uncaching depends on the complexity and size of the environment. Since the losses with caching are small for simple environments but the gains are significant for more complex environments, my qualitative conclusion is that caching should be the default behaviour.

liamhuber commented 4 months ago

I want to put this branch through its paces exploiting it in a real test environment (especially across all the host OSs), then if that goes well I'll come back and clean this up and release it.

liamhuber commented 4 months ago

Taking the new cached-miniforge for a spin over on pyiron_snippets (a) the env name was not appearing in the cache label -- an easy fix -- and (b) which conda fails on windows... I looked into that here and the conda commands (conda list...) still work as expected, it's just which that borks. So let's simply ditch the which conda step of the readout!

liamhuber commented 4 months ago

Looks OK on snippets, let's clean it up.