Open jherland opened 2 months ago
conda-lock can be used to create a fully-resolved lockfile from an environment.yml that only declares the direct dependencies, much like pip-tools' pip-compile workflow. Here's a good article about it: https://pythonspeed.com/articles/conda-dependency-management/
Thanks! It's important for us to learn what tools are available and in use in the conda community. Also, it's good to see that there are even more tools to help make reproducible conda environments.
As far as FawltyDeps is concerned, the lockfiles produced by tools like conda-lock or pip-compile are not very interesting, as they are designed to capture the full closure of transitive dependencies, and FawltyDeps is only interested in your declaration of direct dependencies. (Passing such a lockfile to FawltyDeps will typically only generate a large list of unused - i.e. transitive - deps.)
When environment.yml is generated by conda env export (without --from-history), I would consider it more of a lockfile than a manually curated declaration of direct dependencies (which really is what FawltyDeps is designed to work with).
Hence, for FawltyDeps to be useful in a conda project with environment.yml, we want this file to only declare the direct dependencies, and not to be the product of conda env export. Do you have a sense as to what is the common practice in conda projects here?
To be clear, this situation is somewhat similar to the situation with requirements.txt files in many other Python projects:
Some projects manually curate their direct dependencies in a requirements.txt file, and it is thus a valid input for FawltyDeps. Other projects will use a different file and run e.g. pip-compile to generate a lockfile named requirements.txt. We currently do not differentiate between these two cases, and we instead rely on the user pointing us to the declaration of direct dependencies with --deps.
> FawltyDeps is only interested in your declaration of direct dependencies
Right, I only mentioned conda-lock specifically to point out that, because conda-lock has been the standard tool for producing lockfiles from environment.yml files, it allows environment.yml files to declare only direct dependencies. So I think you can simplify this task by proceeding under the assumption that the user has an environment.yml where they intend to only declare direct dependencies, and is using a proper tool like conda-lock to create lockfiles from that (rather than the other ways you mentioned in which users might be (mis-)managing dependencies, e.g. conda env export --from-history, which is error-prone).
(It's too bad the conda docs you found don't mention conda-lock. That's either a significant oversight, or they're very out-of-date.)
(found while exploring potential Conda support for FawltyDeps, see e.g. #447 for more context)
I'm following the documentation at https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html to see what file formats conda uses to encode dependency declarations.
Specifically, the following sequence of commands:
yields the following environment.yml on my machine:
As with #450, each dependency listed here does not follow the same format as used by pip's requirements.txt files; rather, they seem to use their own Conda-specific format.
The Conda documentation states that this environment.yml can be used to reproduce the Conda environment with this command:
conda env create -f environment.yml
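To make the Conda-specific dependency format more concrete, here is a minimal sketch of how such entries might be split into a package name and a version constraint. It assumes entries of the form name=version=build (as written by conda env export) or name followed by a looser constraint, optionally prefixed with channel::. The names CondaDep and parse_conda_dep are hypothetical and not part of FawltyDeps, and the version and build strings in the usage note are made up for illustration.

```python
import re
from typing import NamedTuple


class CondaDep(NamedTuple):
    """Hypothetical container for one parsed environment.yml entry."""

    name: str
    constraint: str  # raw remainder, e.g. '=2.31.0=pyhd8ed1ab_0' or '>=1.20'


def parse_conda_dep(spec: str) -> CondaDep:
    """Split a Conda dependency string into package name and constraint.

    Assumption: the package name consists of letters, digits, '_', '.'
    and '-', and everything after it is the version/build constraint.
    """
    # Drop an optional 'channel::' prefix such as 'conda-forge::'.
    _, _, rest = spec.strip().rpartition("::")
    match = re.match(r"[A-Za-z0-9_.\-]+", rest)
    if match is None:
        raise ValueError(f"Cannot parse Conda dependency: {spec!r}")
    return CondaDep(match.group(0), rest[match.end():].strip())
```

For example, parse_conda_dep("requests=2.31.0=pyhd8ed1ab_0") yields the name requests with the constraint =2.31.0=pyhd8ed1ab_0, while parse_conda_dep("python=3.8") yields the name python with the constraint =3.8.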
Given that I have only stated requests (and python=3.8) as my real dependencies in this environment, the above file does not reflect these direct/intentional dependencies, but instead appears to pin all (transitive) dependencies to specific versions + hashes. As such, the above file is closer in essence to a poetry.lock file than a pyproject.toml file.
That said, the Conda documentation has this to say about creating an environment file that is portable across platforms:
Applied to my toy example above, this yields the following environment.yml file:
This is clearly much closer to declaring the direct/intentional dependencies that we want to use as input to FawltyDeps.
The Conda documentation goes on to describe how to create an environment file manually, and this also yields a more minimal/appropriate file for FawltyDeps to use. In my toy example, it would look something like this:
I don't know how prevalent the environment.yml file is compared to the weird requirements.txt files described in #450, but I suspect we should consider supporting both if we want to support Conda fully.
Complications
Non-Python packages
Conda project dependencies will often include Python itself, along with non-Python dependencies. These must be properly ignored by FawltyDeps, but doing so correctly may require us to parse all dependencies, and then somehow consult a real Conda environment to deduce which of the dependencies actually provide Python import names or not.
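One way of consulting an existing environment could look like the following rough sketch. It assumes that each installed package has a JSON record in the environment's conda-meta/ directory containing a "name" and a "files" list of relative paths, and that Python code lives under lib/pythonX.Y/site-packages/ (the Unix layout; Windows environments differ). The function name import_names_from_conda_meta is hypothetical, and compiled extension modules are deliberately ignored to keep the sketch short.

```python
import json
import re
from pathlib import Path
from typing import Dict, Set

# Captures the first path component below site-packages (Unix layout only).
_SITE_PACKAGES = re.compile(r"^lib/python\d+\.\d+/site-packages/([^/]+)")


def import_names_from_conda_meta(env_dir: Path) -> Dict[str, Set[str]]:
    """Map each Conda package in env_dir to the import names it provides.

    Packages that install nothing under site-packages (e.g. non-Python
    libraries) map to an empty set.
    """
    result: Dict[str, Set[str]] = {}
    for meta_file in sorted((env_dir / "conda-meta").glob("*.json")):
        meta = json.loads(meta_file.read_text())
        names: Set[str] = set()
        for path in meta.get("files", []):
            match = _SITE_PACKAGES.match(path)
            if match is None:
                continue
            top = match.group(1)
            if top.endswith(".py"):
                names.add(top[:-3])  # single-file module, e.g. six.py
            elif "." not in top and top != "__pycache__":
                names.add(top)  # package directory, e.g. requests/
            # Anything else (*.dist-info, *.so, ...) is skipped here.
        result[meta.get("name", meta_file.stem)] = names
    return result
```

Under these assumptions, a package whose set comes back empty would be a candidate for being ignored as a non-Python dependency.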
To that end, there appear to be .json files in the conda-meta/ subdir of the Conda environment that list the files provided by a package, and from here we might be able to deduce which Conda packages correspond to Python packages (e.g. by looking for lib/pythonX.Y/site-packages/... paths), which can then be further mapped into import names.
Custom package sources
Unsurprisingly, Conda does not use PyPI to find Python packages, but rather has its own system of channels, including default channels and prioritization between channels in order to resolve conflicts.
When resolving Conda package names (and especially in conjunction with --install-deps) we would have to use/understand the same channel system to correctly map Conda dependencies into Conda packages (and from there -> Python packages -> import names).
Possibly the only sensible choice here is to use/run Conda itself to either find an existing local environment - or establish one based on the environment.yml file - and then consult this environment to build our mapping.
Non-obvious interactions with pip?
It appears that Conda projects sometimes also use pip to manage some packages, and according to https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#using-pip-in-an-environment there are some things to be aware of when these tools are combined (see also https://conda.io/projects/conda/en/latest/user-guide/configuration/pip-interoperability.html). It remains to be seen how a combination of conda-installed and pip-installed dependencies can be best navigated and handled by FawltyDeps.
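As a starting point for handling this mix, the sketch below separates the two kinds of entries. It assumes that environment.yml has already been loaded by a YAML parser, so that its dependencies key is a list in which Conda packages are plain strings and pip-managed packages sit in a nested mapping under the key pip. The function name split_dependencies is hypothetical and not part of FawltyDeps.

```python
from typing import Any, List, Tuple


def split_dependencies(dependencies: List[Any]) -> Tuple[List[str], List[str]]:
    """Separate Conda entries from pip entries in a parsed dependency list.

    Returns (conda_deps, pip_deps). The pip entries use pip's own
    requirement syntax and would need a different parser than the
    Conda entries.
    """
    conda_deps: List[str] = []
    pip_deps: List[str] = []
    for item in dependencies:
        if isinstance(item, str):
            conda_deps.append(item)
        elif isinstance(item, dict):
            # Nested section of the form {'pip': ['requests', ...]}.
            pip_deps.extend(item.get("pip") or [])
    return conda_deps, pip_deps
```

With the input ["python=3.8", "pip", {"pip": ["requests"]}], this returns ["python=3.8", "pip"] as the Conda entries and ["requests"] as the pip entries.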