
pixi as a main development driver and `pixi env -e` behavior #2266

Open adrinjalali opened 1 month ago

adrinjalali commented 1 month ago

Problem description

Right now, pixi as it is works great for reproducing work and environments and for use in CI. However, it's not well suited for everyday development, and part of that is the lack of an ability to manipulate and persist an environment.

`pixi env -e` as it is activates an environment whose path includes a hash, which is quite confusing and not memorable, and therefore not useful for remembering which env one is using. Also, the behavior of installing a package in that env and then deactivating and reactivating it is somewhat unclear. There's `--freeze`, which also doesn't make clear what happens the next time the user forgets to pass that flag when activating an env.

It would be nice to have a way of using pixi that lets the developer work with environments somewhat similarly to how one uses an environment without pixi.

cc @glemaitre

ruben-arts commented 1 month ago

Hi @adrinjalali,

I'm not sure what you are requesting. Do you mean `pixi shell --environment <environment_name>` when you write `pixi env`?

Also, I'm not aware of a `--freeze` flag in pixi.

I believe you want to be able to run pixi install in a pixi shell which would also re-activate the pixi shell, right?

glemaitre commented 1 month ago

I'll try to describe some potential use cases we encountered while developing that require a bit more flexibility than the CI case.

One use case is related to being able to install a library without necessarily impacting the lock file.

Installing a dependency tracked in the lock file

When a bug report comes in, one might need to switch the version of a dependency, so the required version is not necessarily tied to a specific predefined environment. One potential flow would be:

pixi shell -e dev
conda install scikit-learn~=1.4.0  # we might install via `pip` as well
pytest my_package

When deactivating and re-activating the environment, I would expect either (or both) of the following behaviors (sketched in commands just below the list):

  1. Reinitialize the environment with package versions according to the lock file.
  2. Reactivate the previous environment as it was (from reading the docstring, I would have expected --no-lockfile-update to do something like that).
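
To make these two expectations concrete, here is a minimal sketch; the flag semantics are my reading of the docs rather than verified behavior:

```sh
# expectation 1: re-entering the shell re-syncs the env to pixi.lock,
# discarding the manually installed scikit-learn version
pixi shell -e dev

# expectation 2: what I (perhaps wrongly) understood --no-lockfile-update
# to mean, i.e. come back to the environment exactly as I left it
pixi shell -e dev --no-lockfile-update
```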

Current state:

Currently, it looks like you can have conda or pip as dependencies and thus trigger an install, though I'm not really sure this is intended. While reactivating the environment does seem to trigger case 1 above, it looks like we break some metadata, because triggering any new install also re-triggers the previous package changes. For instance, if we execute a new install in a new shell such as:

pixi shell -e dev
conda install scipy~=1.13.1

We will also get a downgrade of:

The following packages will be DOWNGRADED:

  scikit-learn                        1.5.2-py311h9e23f0f_1 --> 1.4.2-py311hbfb48bc_1
  scipy                              1.14.1-py311h2929bc6_0 --> 1.13.1-py311hceeca8c_0

Installing a new dependency without tracking it in the lock file

Sometimes, one would like to install a dependency that should not be tracked with the lock file, e.g., ipython.

Current state:

Similarly to the above, you can install an additional package if pip or conda is a dependency.

pixi shell -e dev
conda install ipython

The interesting part here is that, although the package was installed with conda, it is still present after reactivating the environment, but it is recorded as a PyPI package.
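
For reference, this is roughly how we checked the state; the `-e` flag and name filter on `pixi list` are the only assumptions here:

```sh
# after reactivating the environment, see how pixi recorded the package;
# in our case ipython was still present but listed as a pypi package
pixi list -e dev ipython
```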

Installing several packages from source

Sometimes, we end up installing several libraries from source and switching branches locally. While pixi looks well suited to handling per-project dependencies, I'm wondering if you have any vision for how to handle ecosystem-wide dependencies.

In my mind, it could look something like what is proposed here: https://github.com/rgommers/pixi-dev-scipystack, but with a top-level pixi.toml in which I could easily switch versions. I have not yet had time to play with such a setup, though.
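
Purely as a sketch of the idea (paths and package names are made up; the editable path installs use pixi's `pypi-dependencies` syntax):

```sh
# hypothetical top-level project pulling in two in-development
# libraries as editable from-source installs
cat >> pixi.toml <<'EOF'
[pypi-dependencies]
scikit-learn = { path = "./scikit-learn", editable = true }
scipy = { path = "./scipy", editable = true }
EOF
pixi install  # solve the env and build both projects into it
```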

Conclusions

I completely agree that all of the use cases above stretch the scope of what pixi currently is, but we thought we would bring back some feedback to hear what you think about it.

ruben-arts commented 1 month ago

Thank you for the great write-up @glemaitre. This is really good food for thought! I'll definitely pass it on to the team!

We've been pretty conservative with features that allow you to create "broken" environments, but "pixi is built for development", so we need to find a way to serve as many users as possible.

I would like to distill some user stories from your input. Would these be correct?

If these stories fit the need, we can start brainstorming implementation ideas.

adrinjalali commented 1 month ago

Those user stories sound good to me. What I would add is:

ruben-arts commented 1 month ago

@adrinjalali, just checking, but doesn't your command prompt tell you the environment you are in? E.g. it looks like this for me: [screenshot of a prompt showing the active pixi environment]

glemaitre commented 1 month ago

I have to double-check my zsh config, but I don't get the right info:

[screenshot of a prompt without the pixi environment indicator]

@adrinjalali it might actually be a bug :).

Edit: it seems there is some side effect when the miniforge base env is activated and we then activate a shell with pixi. If I completely deactivate miniforge, then I get the proper information.
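
For anyone hitting the same thing, the workaround amounts to:

```sh
# fully leave the miniforge base env before starting the pixi shell;
# the prompt then shows the pixi environment name as expected
conda deactivate
pixi shell -e dev
```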


glemaitre commented 1 month ago

> As a pixi user I want to install a single package into a given environment without modifying the lockfile, and pixi should leave that environment modification alone until I tell it to change, so that I can test with that specific package.

I would slightly amend this user story to mention that I have the option to either "leave the environment alone or completely reset it".

Otherwise, both user stories look good.

@adrinjalali it might actually be a bug :). I'll investigate more.

rgommers commented 1 month ago

I've run into these workflow issues as well, but have come to different conclusions. To start with, an observation: if you're trying to use a conda-like workflow with imperative, temporary modifications to an environment, you're really trying to fight one of the most fundamental design decisions in Pixi. From https://pixi.sh/latest/features/environment/#structure:

> Pixi will always make sure the environment is in sync with the pixi.lock file. If this is not the case then all the commands that use the environment will automatically update the environment, e.g. pixi run, pixi shell.

I'd say that the real user story for the first two items is actually not "let me do the conda-like thing" but "How can I change a version of a dependency or temporarily add a new dependency easily, in order to effectively debug an issue reported against my library for that specific version of the dependency?"

I've so far found it not too onerous to just modify pixi.toml to the exact dependency version needed, and then simply not commit that modification. Doesn't that seem better than trying something like pixi shell -e dev && conda install scikit-learn~=1.4.0?
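
As a sketch, the declarative counterpart of that imperative flow would be something like this (spec syntax and environment name assumed):

```sh
# pin the dependency in the manifest and lock file in one step
pixi add "scikit-learn=1.4.*"
# run the reproducer against the re-solved environment
pixi run -e dev pytest my_package
```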


Taking a step back, there are two types of debugging exercises here I think:

  1. It's possible to (easily) declaratively describe the desired state of an environment
  2. One really needs to do something hacky and non-reproducible (e.g., apply a patch to some installed version of a dependency).

The first two examples in @glemaitre's first post are of type (1). I'd say that Pixi currently supports this fine; working declaratively isn't really harder than working imperatively in a throw-away conda env with conda/mamba/micromamba.

The third example (installing multiple packages from source) is harder, as is my "apply a patch" example. For the former, perhaps the workspace concept can help once that lands. For the inherently type (2) cases, though, I've concluded that that is where I still need to keep one of conda/mamba/micromamba around. That isn't even a problem, except for one thing: pixi tasks can no longer be used, because they will activate another environment. I can use spin just fine; if the logic you need is actually in a pixi task, though, it could be quite handy to have a pixi run foo --no-env type flag.

glemaitre commented 1 month ago

Thanks @rgommers for the feedback. I second the reformulation :)

> I'd say that Pixi currently supports this fine; working declaratively isn't really harder than working imperatively in a throw-away conda env with conda/mamba/micromamba

I think I slightly disagree, with an "it depends". If I'm the target user, then I completely agree, because I know how to use pixi (or enough to get around) and can find my way around the gigantic pyproject.toml file. If the target user is someone new to open source or to the project, however, the entry barrier is still high, because we require knowing git plus some tooling. The imperative way would be more familiar, given how one typically gets introduced to the Python stack.

Apart from this point, I'm really aligned with @rgommers' thoughts, and I also think the "workspace" concept might be the most natural way to deal with multiple from-source installs.

rgommers commented 1 month ago

> If the target user is someone new to open source or to the project, [..] The imperative way would be more familiar, given how one typically gets introduced to the Python stack.

Yeah, fair enough, newer devs will struggle. I'd say both are probably fairly hard for newcomers, though. The imperative way is to do conda install scikit-learn=1.4.0 and then try to go back to the old state with a next conda install scikit-learn=1.x.y. That doesn't work though, and environments used that way degrade over time. There's actually no good way to prevent that, because the imperative way is non-reversible, unlike the pixi way. So with the old way you just end up YOLO'ing until you have to delete and recreate the env. I'd argue it's cleaner/better to do it the pixi way, and to address the lack of familiarity through good docs that address the user story explicitly.

> and find my way around the gigantic pyproject.toml file.

That's hard indeed. I'd document it only via pixi add though, not via manual edits. You can update an existing dependency fairly easily that way. The most error-prone part is probably the "don't commit these changes and know how to undo them" git fluency issue.
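
A sketch of the undo side, which is where the git fluency comes in (file names assume a pixi.toml manifest; a project using pyproject.toml would restore that file instead):

```sh
# after a temporary `pixi add` pin, undoing it is two commands
git restore pixi.toml pixi.lock  # drop the uncommitted manifest/lock changes
pixi install                     # re-sync the environment to the restored lock
```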

adrinjalali commented 1 month ago

Something that makes me quite uneasy about this is that doing a `pixi add` kind of thing results in changes that might end up in the repo / PR. A workflow for editing environments shouldn't modify things in the repo.

There's this coupling of files on the repo and changes to the local workspace env which I'd like not to have.

ruben-arts commented 1 month ago

> There's this coupling of files on the repo and changes to the local workspace env which I'd like not to have.

This is kind of the main idea of the pixi design: unless you know exactly what you're doing, it's often easier to ctrl+z a manifest/lock file than it is to revert installed environments. I understand the feeling and have had similar feedback before (when we built a similar tool). So although I believe we should look into making pixi as comfortable as possible, I want to really guard against implementing "foot guns" that can create broken environments without a clear input.

adrinjalali commented 1 month ago

People working on the same project have vastly different environments, and that's a good thing. For instance, I might want to have mkl, and another person might not even have the hardware to support it, or simply doesn't want to bother with downloading that huge package.

Or I might not want to bother with a lot of optional dependencies and have those tests skipped, while others might be working on them.

Having different environments on different contributor machines is a good thing, since it uses the diversity of the contributor base to test the code base against a larger pool of the environments actually used by our end users.

This means the environments shouldn't be strictly dictated by the repo itself, and contributors should have flexibility on what they do with their envs.

This means we'd need to either not have a pixi config in the repo and have it git-ignored, or have a way to ignore the repo's pixi config, with each user keeping a per-project pixi config somewhere in their home folder and telling pixi to use that instead. Neither sounds ideal to me.

rgommers commented 1 month ago

> People working on the same project have vastly different environments, and that's a good thing

It's not unambiguously a good thing; it has important pros and cons. A pro is what you say: more testing in varied envs. Cons are: it's much harder to set up such varied envs or write good contributor docs for them, and for newer contributors there's more breakage when all they want to do is work on a pure-Python PR, etc.

I think Pixi fixing the cons above is a big gain. There will still be users wanting to use venvs, or conda/mamba, or spack, or docker, or whatever, so the pro you describe won't disappear completely just by adopting Pixi.

My recommendation would be to set things up so that using Pixi is a good default for newcomers to your project, but optional. Making Pixi work just like conda is kind of what you're asking for, but that can't really be the right answer.

I actually quite like the pixi and spin layering, it seems to work well for working on NumPy and SciPy. I can default to pixi, and when I hit something that requires a conda-style activated environment, I activate a conda env and simply invoke task names with spin (e.g., I then do spin build rather than pixi run build). What it requires is keeping the bulk of the task logic in spin, and make the pixi tasks a thin one-liner that calls the corresponding spin command.
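
A minimal sketch of that layering, with illustrative task names (the `[tasks]` table is standard pixi syntax):

```sh
# keep the real logic in spin; pixi tasks are one-line pass-throughs
cat >> pixi.toml <<'EOF'
[tasks]
build = "spin build"
test = "spin test"
EOF
pixi run build  # runs `spin build` inside the pixi-managed environment
```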

adrinjalali commented 1 month ago

I guess a difference between numpy and scikit-learn, for instance, is that in numpy you don't really "install" the package in an environment; you always have to do something (setting some paths) to use it, which spin does for you. In scikit-learn, we have a normal editable installation of the package, and we run all commands like pytest, sphinx, etc. as usual. So we actually use the environment itself for development.

When I work on numpy, I always have to run spin, and once that becomes your workflow, changing spin to pixi and having it handle your environment as well doesn't change much, I guess. For me, I do a lot of quick manual testing in python and ipython REPLs, and always having to do spin python or pixi run python is an inconvenience.

That is not to say that I think pixi has to support my workflow. It's only to say that if pixi wants to be a default tool for developers like me, then it needs to support what I need.

bollwyvl commented 1 month ago

> thin one-liner that calls the corresponding spin command

Yeah, this is a really key observation, and presently the only feasible way for large monorepos that want pixi's environment/task ordering guarantees and work saving, but also want isolated/reusable/verifiable task definitions, without 10k lines of TOML with embedded, difficult-to-lint/verify bash-like stuff (no offense to the task shell, of course).

> different environments on different contributor machines

I think pixi's huge boon is making replicable (let's not throw "reproducible" around just yet) environments a matter of course. Having a declarative way to encapsulate an "off-the-checked-in-path" record of what was done to a local environment would be useful.

An approach I have seen (but not necessarily liked) in other tools is to have a well-known, never-to-be-checked-in file that gets deeply merged with the source of truth, which results in a parallel structure for "dirty" envs.

Running commands with that file in place (or with it explicitly named via a CLI arg or environment variable) would result in a {whatever}.pixi.lock used for all commands, build up .pixi/{whatever}/envs and the task cache to actually run things, and emit lots of warnings, e.g. ✨ DIRTY Pixi task.
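
To make that concrete, an entirely hypothetical sketch; none of these file names or behaviors exist in pixi today:

```sh
# a well-known, git-ignored overlay that pixi would deep-merge over pixi.toml
cat > local.pixi.toml <<'EOF'
[dependencies]
line-profiler = "*"  # a local-only debugging extra
EOF
# pixi would then maintain parallel "dirty" artifacts next to the real ones:
#   local.pixi.lock       the merged lock file, used for all commands
#   .pixi/local/envs/     the merged environments and task cache
# and warn loudly on every use, e.g. `✨ DIRTY Pixi task`
```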

As to where to put it:

Having these result in real, neighboring files makes it relatively easy to see what the difference would be via simple diff commands, but pixi could also summarize them in a helpful way.