holoviz-dev / pyctdev

Python packaging Common Tasks for Developers
BSD 3-Clause "New" or "Revised" License

Improving installing packages with conda #97

Open maximlt opened 1 year ago

maximlt commented 1 year ago

I'm opening this issue to discuss how we could improve the installation of packages with conda, with the goals of making it faster, more flexible, and more predictable.

This is motivated by the recent work done by @philippjfr to improve the speed of the test workflows (starting from Panel) and by various difficulties experienced with using pyctdev for almost a year now.

How it works

Installing packages with pyctdev first requires creating an environment. This is usually done by first installing pyctdev and then running:

doit env_create --name envname --python=3.x -c channel1 -c channel2

which:

  1. Creates a new environment with the name, python version and channels provided
  2. Installs pyctdev in that new environment:
    • If pyctdev (the one installed originally and running these steps) is in a pre-release version, install from the pyviz/label/dev channel
    • If the env var PYCTDEV_SELF_CHANNEL is provided, install from the channel provided as value of the env var
    • If none of the above is true, install from the pyviz channel

Note, as this may be important, that the conda install command in step 2 does not include all the channels listed in the doit env_create call.
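The channel selection in step 2 can be sketched roughly as follows. This is a minimal illustration, not pyctdev's actual code: the function name `select_self_channel` and the regular expression used to detect a pre-release are hypothetical, and the issue does not say which rule wins when both conditions apply, so the sketch assumes the environment variable takes precedence.

```python
import os
import re

# Simplified pre-release check (assumption): a/b/rc/.dev markers mean pre-release.
_PRERELEASE = re.compile(r"\d(a|b|rc)\d+|\.dev\d*")


def select_self_channel(pyctdev_version, environ=None):
    """Pick the channel pyctdev installs itself from (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Assumption: an explicit PYCTDEV_SELF_CHANNEL overrides everything else.
    override = environ.get("PYCTDEV_SELF_CHANNEL")
    if override:
        return override
    # A pre-release of the running pyctdev selects the dev channel.
    if _PRERELEASE.search(pyctdev_version):
        return "pyviz/label/dev"
    # Otherwise fall back to the release channel.
    return "pyviz"
```

Note that none of the channels passed with -c to doit env_create appear in this logic, which is the point made in the note above.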

The environment then has to be activated; there is no pyctdev command for that.

The main installation can now take place. This is done by running:

doit develop_install -c channel1 -c channel2 -o options1 -o options2 -o options3

which:

  1. Finds the list of build dependencies and installs them with conda (respecting the channels provided with -c)
  2. Finds and concatenates the options dependencies, and installs them with conda (respecting the channels provided with -c)
  3. Installs the package in editable mode with python -m pip install --no-deps --no-build-isolation -e . (--no-deps as all the dependencies should already be there, --no-build-isolation to avoid creating a virtual environment, all the build dependencies are already installed anyway)

--conda-mode=mamba can be set to use mamba instead of conda in steps 1 and 2.
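To make the sequence concrete, here is a small sketch that only builds the three commands as argument lists, without running anything. The function name `develop_install_commands` and the use of `--yes` are assumptions for illustration; how pyctdev actually assembles its commands is not shown in this issue.

```python
def develop_install_commands(channels, build_deps, option_deps, conda_mode="conda"):
    """Return the three develop_install commands as argument lists (dry run)."""
    channel_args = []
    for channel in channels:
        channel_args += ["-c", channel]  # same channels for both conda steps
    install = [conda_mode, "install", "--yes"]  # conda_mode is "conda" or "mamba"
    return [
        # Step 1: build dependencies
        install + channel_args + list(build_deps),
        # Step 2: concatenated options dependencies
        install + channel_args + list(option_deps),
        # Step 3: editable install, no dependency resolution, no build isolation
        ["python", "-m", "pip", "install", "--no-deps", "--no-build-isolation", "-e", "."],
    ]


for command in develop_install_commands(
    ["channel1", "channel2"], ["setuptools", "param"], ["pytest", "flake8"], "mamba"
):
    print(" ".join(command))
```

The two separate solver invocations in steps 1 and 2 are what the next section proposes to reduce.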

Making it faster

There are I believe two main avenues to make this faster.

The first one would be to use a faster solver by default, either mamba or the libmamba solver. The --conda-mode option already makes it possible to run the slowest install steps with mamba. Ideally though we wouldn't have to use mamba, and we would rely on the libmamba solver implemented in conda, which will hopefully one day become the default, or at least be available without an experimental flag.

The second one would consist in reducing the number of conda install steps. There are currently 4 conda install steps: the first one when the environment is created, and three more to install pyctdev, the build dependencies, and then all the other required dependencies. Installing multiple times in a conda environment is known to lead to long solving times.

We could go even further and have a single command to install all the dependencies, adding to doit env_create some of the features of doit develop_install, which would then be called like this: doit env_create --name my-env --python=3.x -c channel1 -c channel2 -o options1 -o options2 -o options3. As this might end up installing a version of pyctdev that is not the latest, another command line parameter could be added to specify a version constraint, e.g. --pyctdev-install=">=1.1".

Making it more flexible

Some packages that are needed to run the test suite or the docs build are not available on PyPI. Because of that they are not listed anywhere in the setup.py file; instead, they are installed directly in the GitHub workflow files with conda install. pyctdev should offer a way to install these packages without having to resort to using conda directly.

Some packages are not available on Anaconda.org (e.g. a recent example is pytest-playwright). These packages are usually installed manually with pip after running doit develop_install. There should be a way to declare a list of packages that pyctdev should install with pip.

Regarding these two points, one could think that the packages to install only with conda and only with pip could be declared in a config file (e.g. in setup.cfg). However, I believe that for maximum flexibility it would actually be better to add command line parameters to pyctdev instead, as sometimes the packages to install depend on the operating system or on some other conditions. I would suggest something like doit develop_install --conda-install "nodejs>15" --conda-install mesalib --pip-install pytest-playwright --pip-install ....
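As an illustration of the repeatable flags suggested above, here is a sketch using argparse from the standard library. The flags `--conda-install` and `--pip-install` are the proposed ones and do not exist in pyctdev today; pyctdev is built on doit, which declares options differently, so this shows only the intended command line behaviour.

```python
import argparse


def build_parser():
    """Parser for the two proposed, repeatable install flags (hypothetical)."""
    parser = argparse.ArgumentParser(prog="develop_install")
    # action="append" collects every occurrence of the flag into a list.
    parser.add_argument("--conda-install", action="append", default=[], metavar="SPEC")
    parser.add_argument("--pip-install", action="append", default=[], metavar="SPEC")
    return parser


args = build_parser().parse_args(
    ["--conda-install", "nodejs>15", "--conda-install", "mesalib",
     "--pip-install", "pytest-playwright"]
)
print(args.conda_install)  # conda-only packages
print(args.pip_install)    # pip-only packages
```

Because the lists come from the command line, a workflow file can add or drop a flag depending on the operating system, which is the flexibility a static config file would not give.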

Making it more predictable

What sometimes makes the installation process difficult to predict, and even not so robust, is the "channel dance" 💃, whereby some packages get re-installed from another channel because different channels are specified in the different install steps. This was the source of a bad bug, by which Python itself was being re-installed during a doit develop_install call, leading to a cryptic doit/pyctdev error. It took months to find in the HoloViews test suite, as it happened on only one platform, and it still happens from time to time in the ecosystem.

One step that I think is a source of this problem is the second step of doit env_create, the one that installs pyctdev, because it doesn't re-use the channels passed to doit env_create and because it chooses the channel to install pyctdev from based on some rather implicit conditions. I would suggest that this step install pyctdev with the channels provided to doit env_create, and that a command line parameter be added to env_create to override the channel it should be installed from, e.g. doit env_create ... --pyctdev-channel "pyviz/label/dev". Note that in most HoloViz cases you wouldn't use that new parameter, as either pyviz or pyviz/label/dev is specified in the channels list.

An approach that I have recently tried and that I find very appealing is to:

  1. create an empty environment: conda create -n my-env
  2. activate that environment: conda activate my-env
  3. configure the environment channels: conda config --env --append channels channel1 --append channels channel2
  4. configure the channel priority to strict: conda config --env --set channel_priority strict

This creates a local .condarc file associated with that environment. The benefit of this approach is that all the channels are declared before anything is installed, in the order they are supposed to be used. Setting the channel priority to strict makes the environment solving even more predictable (and faster, I believe). Once the environment is set up, the later conda install calls don't have to specify any channel at all. I think this approach also better isolates the installation from the user's conda configuration: their system .condarc is less likely to leak its configuration into the installation procedure. Another situation that benefits from this approach is a local setup in which you want to install a new package or update an existing one. You would do that with conda install directly, and you would have to remember which channels to use, and in what order, to avoid the channel dance. With the suggested approach you don't have to remember anything about the channels.

Suggestion

If I combined all the suggestions I've made into a rather ambitious proposal, it would be to extend doit env_create so that the following is allowed:

# using mamba, but would prefer libmamba
doit env_create \
--conda-mode=mamba \
--name=my-env \
--pyctdev-install ">=1.1" \
--python=3.x \
-c pyviz/label/dev -c conda-forge -c nodefaults \
--channel-priority strict \
-o tests -o examples \
--conda-install "nodejs>15" \
--pip-install pytest-playwright

which would do:

  1. conda create -n my-env
  2. conda activate my-env
  3. conda config --env --append channels pyviz/label/dev --append channels conda-forge --append channels nodefaults
  4. conda config --env --set channel_priority strict
  5. mamba install python=3.x "pyctdev>=1.1" <all the other tests and examples dependencies> "nodejs>15"
  6. python -m pip install pytest-playwright
  7. python -m pip install --no-deps --no-build-isolation -e .
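The seven steps above can be sketched as a dry-run plan. This is only an illustration of the proposal: `env_create_plan` and its parameters are hypothetical, nothing is executed, and the tests and examples dependencies are assumed to have been resolved already and passed in through `conda_specs`.

```python
import shlex


def env_create_plan(name, python, channels, conda_specs, pip_specs,
                    pyctdev_spec="pyctdev", solver="mamba"):
    """Return the proposed env_create steps as argument lists (dry run)."""
    append = []
    for channel in channels:
        append += ["--append", "channels", channel]  # order defines priority
    return [
        ["conda", "create", "--yes", "-n", name],
        # Shown for completeness: `conda activate` is a shell function, so a real
        # implementation would more likely pass `-n name` to the later commands.
        ["conda", "activate", name],
        ["conda", "config", "--env"] + append,
        # --env keeps the setting in the environment's own .condarc rather than
        # in the user-level one.
        ["conda", "config", "--env", "--set", "channel_priority", "strict"],
        [solver, "install", "--yes", f"python={python}", pyctdev_spec] + list(conda_specs),
        ["python", "-m", "pip", "install"] + list(pip_specs),
        ["python", "-m", "pip", "install", "--no-deps", "--no-build-isolation", "-e", "."],
    ]


plan = env_create_plan(
    "my-env", "3.9", ["pyviz/label/dev", "conda-forge", "nodefaults"],
    conda_specs=["nodejs>15"], pip_specs=["pytest-playwright"],
    pyctdev_spec="pyctdev>=1.1",
)
for step in plan:
    print(shlex.join(step))  # shlex.join quotes specs such as nodejs>15
```

The key property is that only one solver call (the fifth step) installs conda packages, and it carries no -c arguments because the channels were configured beforehand.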

I would appreciate any feedback on this issue. If the last suggestion is too ambitious, implementing some of the first suggestions separately should already be an improvement. Note that I have not given any thought to the pip version of doit develop_install, which I think doesn't suffer from the issues reported here, at least not from the performance-related problems.

maximlt commented 1 year ago

An issue with having pyctdev run two conda install commands in doit develop_install is that sometimes Python itself gets up/down-graded, which results in an error. This is actually reproducible by executing this:

conda create -n testenv python=3.8.10 doit
conda activate testenv
doit reproduce

From a folder containing this dodo.py file:

# dodo.py

def task_reproduce():
    return {'actions': [
        'conda install --yes "python=3.8.8"',
        'echo "I will be reported as failing :("'
    ]}

Leads to this error:

TaskFailed - taskid:reproduce
Command failed: 'echo "I will be reported as failing :("' returned -9

Note that the action reported as failing is the second one, which has nothing to do with Python. The error may occur in between the two actions, but it isn't well reported.

This is actually a pretty important point as this error occurs quite often and the suggestion I made in my previous post doesn't take that into account.

maximlt commented 1 year ago

Ideally you would not install pyctdev (or any similar tool) in the environment you're testing or building, as its own dependencies may affect those of the software to test/build. It's even more true for pyctdev because of the bug reported above.

One may think that pyctdev could be installed in the base environment and as such be made accessible to all the other environments, but there are two problems:

  1. having users install tools in their base environment isn't necessarily a very good idea
  2. pyctdev currently needs to be able to import setup.py, and to do that the build dependencies (e.g. param, bokeh, setuptools, etc.) need to be installed in its environment

While we could live with 1), 2) is a blocker. To unlock 2), we could convert all the projects to using pyproject.toml instead of setup.py.

analog-cbarber commented 1 year ago

Currently the pyctdev conda package depends on conda itself, which means that when you install pyctdev in any environment other than base, it will install conda there as well, which causes all sorts of confusion and problems.

Really, any tool that depends on having conda in the same environment should not be installed anywhere except base.