adrinjalali opened this issue 1 month ago
Hi @adrinjalali,

I'm not sure what you are requesting. Do you mean `pixi shell --environment <environment_name>` when you write `pixi env`? Also, I'm not aware of a `--freeze` flag in pixi.

I believe you want to be able to run `pixi install` in a pixi shell, which would also re-activate the pixi shell, right?
I'll try to describe some potential use cases that we encountered during development and that require a bit more flexibility than the CI case.
One use case is related to being able to install a library without necessarily impacting the lock file.
When receiving a bug report, one might need to switch the version of a dependency. The version is therefore not necessarily linked to a specific environment. One potential flow would be:
```sh
pixi shell -e dev
conda install scikit-learn~=1.4.0  # we might install via `pip` as well
pytest my_package
```
While deactivating/re-activating the environment, I would expect either (or both) of the following behaviors:

1. the environment is reset to the state described by the lock file;
2. the manual modification is kept (I would expect `--no-lockfile-update` to do something like that, reading the docstring).

**Current state:**
Currently, it looks like you can have `conda` or `pip` as dependencies and thus trigger an install. However, I'm not really sure that this is intended. Indeed, while it looks like reactivating the environment triggers case 1 above, it seems we break some metadata, because triggering any new install will also re-apply previous package changes. For instance, if we execute a new install in a new shell such as:

```sh
pixi shell -e dev
conda install scipy~=1.13.1
```

we will also get a downgrade of:

```
The following packages will be DOWNGRADED:
  scikit-learn  1.5.2-py311h9e23f0f_1  --> 1.4.2-py311hbfb48bc_1
  scipy         1.14.1-py311h2929bc6_0 --> 1.13.1-py311hceeca8c_0
```
Sometimes, one would like to install a dependency that should not be tracked in the lock file, e.g., `ipython`.
**Current state:**
Similarly to the above, you can install an additional package if `pip` or `conda` is a dependency:

```sh
pixi shell -e dev
conda install ipython
```
The interesting part here is that, while installed with `conda`, after reactivating the environment the package is still present but noted as a `pypi` package.
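To see how pixi ended up classifying the package, something like `pixi list` can help; a minimal sketch, assuming the `dev` environment from the example above:

```sh
# Inspect the manually installed package; `pixi list` reports each
# package's kind (conda vs pypi) for the given environment.
pixi list -e dev ipython
```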
Sometimes, we end up installing several libraries from source and switching branches locally. While `pixi` looks more adequate for handling per-project dependencies, I'm wondering if you have any vision on how to handle ecosystem dependencies.
From my perspective, it could look like what is proposed in https://github.com/rgommers/pixi-dev-scipystack, but I would expect a top-level `pixi.toml` where I could easily switch versions. I did not yet have time to play with such a setup, though.
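For illustration, a minimal sketch of what such a top-level `pixi.toml` could look like, assuming sibling source checkouts and pixi's editable `pypi-dependencies`; the project name, paths, and Python pin are hypothetical:

```toml
# Hypothetical top-level manifest for an "ecosystem" checkout, in the
# spirit of https://github.com/rgommers/pixi-dev-scipystack.
[project]
name = "stack-dev"              # hypothetical
channels = ["conda-forge"]
platforms = ["linux-64", "osx-arm64"]

[dependencies]
python = "3.11.*"

[pypi-dependencies]
# Editable from-source install of a sibling checkout; switching a git
# branch inside this directory switches what gets imported.
scikit-learn = { path = "./scikit-learn", editable = true }
```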
I completely agree that the use cases above stretch the scope of what `pixi` currently is, but we thought we could bring back some feedback and hear what you think about it.
Thank you for the great write-up @glemaitre. This is really good food for thought! I'll definitely pass it to the team!
We've been pretty conservative with features that allow you to create "broken" environments, but "Pixi is built for development", so we need to find a way to serve as many users as possible.
I would like to distill some user stories from your input. Would these be correct:
If these stories fit the need, we can start brainstorming implementation ideas.
Those user stories sound good to me. What I would add is:
@adrinjalali, just checking, but doesn't your command prompt tell you the environment you are in? E.g. it looks like this for me:
I have to double-check my zsh config, but I don't get the right info:
@adrinjalali might actually be a bug :). I'll investigate more.
Edit: It seems there is some side effect when `base` from miniforge is activated and we then activate a shell with `pixi`. If I deactivate miniforge completely, then I get the proper information.
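If that's indeed the interaction, a workaround sketch until it's fixed could be to fully drop out of the miniforge `base` env before entering the pixi shell:

```sh
# Deactivate the miniforge base environment first, then enter the
# pixi shell; the prompt should then show the pixi environment info.
conda deactivate
pixi shell -e dev
```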
> As a pixi user I want to install a single package into a given environment without modifying the lockfile, and pixi should leave that environment modification alone until I tell it to change, so that I can test with that specific package.

I would slightly amend this user story, mentioning that I have the option to "leave the environment as-is or to completely reset it".
Otherwise, both user stories look good to me.
I've run into these workflow issues as well, but have come to different conclusions. To start with, an observation: if you're trying to use a conda-like workflow with imperative temporary modifications to an environment, you're really fighting one of the most fundamental design decisions in Pixi. From https://pixi.sh/latest/features/environment/#structure:

> Pixi will always make sure the environment is in sync with the `pixi.lock` file. If this is not the case then all the commands that use the environment will automatically update the environment, e.g. `pixi run`, `pixi shell`.
I'd say that the real user story for the first two items is actually not "let me do the conda-like thing" but "How can I change a version of a dependency or temporarily add a new dependency easily, in order to effectively debug an issue reported against my library for that specific version of the dependency?"
I've so far found it not too onerous to just modify `pixi.toml` to the exact dependency version needed, and then simply not commit that modification. Doesn't that seem better than trying something like `pixi shell -e dev && conda install scikit-learn~=1.4.0`?
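As a sketch of that declarative flow (the constraints shown are illustrative, not the project's actual specs), the temporary, uncommitted edit could look like:

```toml
# In pixi.toml: temporarily pin the dependency under investigation,
# run the tests, and simply never commit this change.
[dependencies]
scikit-learn = "1.4.*"   # was e.g. ">=1.4" (illustrative)
```

Then run the tests with `pixi run pytest` (or however the project runs its test suite) and revert the manifest when done.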
Taking a step back, there are two types of debugging exercises here I think:

1. those that only require changing the version of a dependency, or adding one;
2. those that require modifying the environment itself in ways a manifest cannot express (e.g., applying a patch to an installed package).

The first two examples in @glemaitre's first post are of type (1). I'd say that Pixi currently supports this fine; working declaratively isn't really harder than working imperatively in a throw-away conda env with `conda`/`mamba`/`micromamba`.
The third example (installing multiple packages from source) is harder, as is my "apply a patch" example. For the former, perhaps the workspace concept can help once that lands. For the inherently type (2) cases though, I've concluded that that is where I still need to keep one of `conda`/`mamba`/`micromamba` around. That isn't even a problem, except for one thing: pixi tasks can no longer be used, because they will activate another environment. I can use `spin` just fine; if the logic you need is actually in a pixi task though, it could be quite handy to have a `pixi run foo --no-env` type flag.
Thanks @rgommers for the feedback. I second the reformulation :)
> I'd say that Pixi currently supports this fine; working declaratively isn't really harder than working imperatively in a throw-away conda env with `conda`/`mamba`/`micromamba`
I slightly disagree here, with an "it depends". If I'm the target user, then I completely agree, because I know how to use `pixi` (or enough to get around) and can find my way around the gigantic `pyproject.toml` file. If the target user is someone new to open source or to the project, however, the entry barrier is still high, because we require knowing `git` plus some tooling. The imperative way would be more familiar because of how one gets introduced to the Python stack.
Apart from this point, I'm really aligned with @rgommers' thoughts, and I think the "workspace" concept might be the most natural way to deal with multiple from-source installs.
> If the target user is someone new to open source or to the project, however, [..] the imperative way would be more familiar because of how one gets introduced to the Python stack.
Yeah, fair enough, newer devs will struggle. I'd say both are probably fairly hard for newcomers though. The imperative way is to do `conda install scikit-learn=1.4.0` and then try to go back to the old state with a next `conda install scikit-learn=1.x.y`. That doesn't work though, and environments used that way degrade over time. There's actually no good way to prevent that, because the imperative way is non-reversible, unlike the pixi way. So the old way, you just end up YOLO'ing until you have to delete and recreate the env. I'd argue it's cleaner/better to do it the `pixi` way, and address the lack of familiarity through good docs that address the user story explicitly.
> and can find my way around the gigantic `pyproject.toml` file.
That's hard indeed. I'd document it only via `pixi add` though, not via manual edits. You can update an existing dependency fairly easily that way; see the sketch below. The most error-prone part is probably the "don't commit these changes and know how to undo them" `git` fluency issue.
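A minimal sketch of that flow, assuming the manifest is `pixi.toml` (the version spec is illustrative):

```sh
# Point the manifest at the version under investigation; pixi add
# updates the existing spec for the package and re-solves.
pixi add "scikit-learn=1.4"

# ... reproduce / debug ...

# Undo the local, never-to-be-committed change and re-sync the env.
git restore pixi.toml pixi.lock
pixi install
```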
Something that makes me quite uneasy about this is that doing a `pixi add` kind of thing would result in something which might end up in the repo / PR. A workflow for editing environments shouldn't modify things in the repo.
There's this coupling between files in the repo and changes to the local workspace env which I'd like not to have.
> There's this coupling between files in the repo and changes to the local workspace env which I'd like not to have.
This is kind of the main idea of the pixi design: unless you know exactly what you're doing, it's often easier to `ctrl+z` a manifest/lock-file than it is to revert installed environments. I understand the feeling and have had similar feedback before (when we built a similar tool). So although I believe we should look into making pixi as comfortable as possible, I want to really guard against implementing "foot guns" that can create broken environments without a clear input.
People working on the same project have vastly different environments, and that's a good thing. For instance, I might want to have `mkl`, while another person might not even have the hardware to support it, or simply doesn't want to bother downloading that huge package.
Or I might not want to bother with a lot of optional dependencies and have those tests skipped, while others might be working on them.
Having different environments on different contributor machines is a good thing, since it uses the diversity of the contributor base to test the code base against a larger pool of the potential environments used by our end users.
This means the environments shouldn't be strictly dictated by the repo itself, and contributors should have flexibility in what they do with their envs.
This means we'd either need to not have a pixi config in the repo and have it git-ignored, or have a way to ignore the repo's pixi config, with each user keeping a per-project pixi config somewhere in their home folder and telling pixi to use that instead. Neither sounds ideal to me.
> People working on the same project have vastly different environments, and that's a good thing
It's not unambiguously a good thing; it has important pros and cons. A pro is what you say: more testing in varied envs. Cons are: it's much harder to set up such varied envs or write good contributor docs for them, for newer contributors there's more breakage when all they want to do is work on a pure Python PR, etc.
I think Pixi fixing the cons above is a big gain. There will still be users wanting to use venvs, or conda/mamba, or spack, or docker, or whatever, so the pro you describe won't disappear completely just by adopting Pixi.
My recommendation would be to set things up so that using Pixi is a good default for newcomers to your project, but optional. Turning Pixi into something that works just like conda is kind of what you're asking for, but that can't really be the right answer.
I actually quite like the `pixi` and `spin` layering; it seems to work well for working on NumPy and SciPy. I can default to `pixi`, and when I hit something that requires a `conda`-style activated environment, I activate a conda env and simply invoke task names with `spin` (e.g., I then do `spin build` rather than `pixi run build`). What it requires is keeping the bulk of the task logic in `spin`, and making the `pixi` tasks thin one-liners that call the corresponding `spin` command.
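A sketch of such thin wrapper tasks in `pixi.toml` (the task names are illustrative; the actual logic lives in `spin`):

```toml
# pixi tasks as thin one-liners delegating to spin, so the same task
# logic works both inside and outside pixi-managed environments.
[tasks]
build = "spin build"
test = "spin test"
docs = "spin docs"
```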
I guess a difference between numpy and scikit-learn, for instance, is that in numpy you don't really "install" the package in an environment, and you always have to do something (setting some paths) to use it, which `spin` does for you. In scikit-learn, we have a normal editable installation of the package, and we easily run all commands like `pytest`, `sphinx`, etc. as usual. So we actually use the environment itself for development.
When I work on numpy, I always have to run `spin`, and once that becomes your workflow, changing `spin` to `pixi` and having it handle your environment as well doesn't change much, I guess. For me, I do a lot of simple `python` and `ipython` manual testing in the REPL, and having to always do `spin python` or `pixi python` is an inconvenience.
That is not to say that I think `pixi` has to support my workflow. It's only to say that if `pixi` wants to be a default tool for developers like me, then it needs to support what I need.
> thin one-liners that call the corresponding `spin` command
Yeah, this is a really key observation, and presently the only feasible way for large monorepos that want `pixi`'s environment/task ordering guarantees and work saving, but also want isolated/reusable/verifiable task definitions, without 10k lines of toml with embedded, difficult-to-lint/verify bash-like stuff (no offense to the task shell, of course).
> different environments on different contributor machines
I think `pixi`'s huge boon is making replicable (let's not throw Reproducible around just yet) environments a matter of course. Having a declarative way to encapsulate an "off-the-checked-in-path" description of what was done to a local environment would be useful.
An approach I have seen (but not necessarily liked) in other tools is to have a well-known, never-to-be-checked-in file that gets deeply merged with the source of truth, which results in a parallel structure for "dirty" envs.
Running commands with that file in place (or explicitly named via CLI args or an environment variable) would result in a `{whatever}.pixi.lock` used for all commands, build up `.pixi/{whatever}/envs` and `tasks-cache` to actually run things, and emit lots of warnings, e.g. `✨ DIRTY Pixi task`.
As to where to put it:

- `pixi` already builds a `.pixi` folder; putting it there (with the same name) might make sense, e.g. `.pixi/pixi.toml`
- a name starting with `.pixi` allows a concise `.gitignore` of `.pixi*`, e.g.
  - `.pixi.toml` (simplest, but maybe too simple)
  - `.pixi.tmp.toml` (kinda looks like an editor-generated file)
  - `.pixi.local.toml`
- `[project.{whatever}]` (now with the same naming problem)
- `--{whatever}-manifest-path`, probably needed anyway for `pixi add`, etc.

Having these result in real, neighbor files makes it relatively easy to see what the difference would be via simple `diff` commands, but `pixi` could also summarize them in a helpful way.
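To make the idea concrete, a hypothetical `.pixi.local.toml` overlay might look like the sketch below; none of this is an existing pixi feature, and the file name is just one of the options floated above:

```toml
# .pixi.local.toml -- hypothetical, git-ignored overlay that would be
# deep-merged over the checked-in pixi.toml to produce a "dirty" env.
[dependencies]
scikit-learn = "1.4.*"   # local pin for debugging, never committed

[pypi-dependencies]
ipython = "*"            # personal convenience tool, untracked upstream

[tasks]
repl = "ipython"         # personal task, not part of the project
```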
Problem description

Right now `pixi`, as is, is great for reproducing work and environments and for use in CI. However, it's not suitable for everyday development, and a part of that is the lack of ability to manipulate and persist an environment.

`pixi env -e`, as is, activates an environment whose path includes a hash, which is quite confusing and not memorable, and therefore not useful for remembering which env one is using. Also, the behavior of installing a package in that env and then deactivating and reactivating it is somewhat unclear. There's `--freeze`, which also doesn't make it clear what happens the next time if the user forgets to pass that flag when activating an env.

It would be nice to have a way of using `pixi` that enables the developer to work with environments, somewhat similar to how one uses an environment w/o `pixi`.

cc @glemaitre