knthmn closed this issue 1 year ago
Please tell me your opinion; if you like the idea, I can work on it.
Hi @knthmn. Thanks for the proposal. I've long bemoaned the sluggish performance, but put up with it for the simplicity of the implementation. In particular, one of the big advantages of pip-run is that it's stateless, so it leaves little behind to be cleaned up.
I've similarly thought about ways pip-run could somehow optimize the performance.
The design you describe aligns closely with what I would expect. I'd tweak it slightly thus:

- I'd honor the XDG cache dir (`$XDG_CACHE_HOME/pip-run/XXXX`), rather than relying on temp.
- Instead of `--cache`, I'd suggest `--reuse` or `--persist`, which more closely aligns with the `pip-run` usage and is more distinct from `pip install` parameters. Although, maybe a separate command such as `pip-rerun` or `pip-sprint`.

I do think producing a hash of the dependency list may prove more difficult than it sounds on two dimensions:

- The resolved environment drifts over time: `pip-run requests` today may pull one version, whereas `pip-run requests` tomorrow could pull a newer (or even older, in case of a yank/delete) version. One of the advantages of `pip-run` is that its invocation is fairly independent of the environment. Adding this state could make it more difficult to anticipate what the behavior of `pip-run` would be for a given user's environment.
- `pip-run` doesn't resolve dependencies, but only passes them to `pip install`. So `pip-run -r requirements.txt` is very different depending on the context. I wouldn't expect `pip-run` to have the same behavior if the contents of that file changed or if the current directory changed.

I'd like some thoughts on how you'd propose to address those concerns.
If we can come up with some reasonable behaviors for these concerns and come up with an implementation that is fairly clean (doesn't introduce too many touch points), I'd be inclined to move forward with it.
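To make the "hash of the dependency list" idea concrete, here is a minimal sketch of how stated requirements might map to a directory under `$XDG_CACHE_HOME/pip-run/`. The helper name and key format are hypothetical, not pip-run's API; the point is that the key is derived only from what the user *stated*, which is exactly why resolution drift is a concern.

```python
import hashlib
import os
from pathlib import Path

def cache_dir_for(deps: list) -> Path:
    """Map a user-stated dependency list to a directory under
    $XDG_CACHE_HOME/pip-run/ (hypothetical helper, not pip-run's API)."""
    # Normalize so ordering, case, and stray whitespace don't
    # produce distinct environments for the same stated deps.
    normalized = sorted(d.strip().lower() for d in deps)
    key = hashlib.sha256("\n".join(normalized).encode()).hexdigest()[:16]
    xdg = os.environ.get("XDG_CACHE_HOME") or os.path.expanduser("~/.cache")
    return Path(xdg) / "pip-run" / key

# The same stated deps always map to the same directory...
assert cache_dir_for(["requests"]) == cache_dir_for([" Requests"])
# ...even though what pip resolves `requests` to can change between
# today and tomorrow -- exactly the drift concern described above.
```

A cached environment keyed this way never notices a new release or a yank; only a changed *stated* list would produce a new directory.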
Thank you for the response.

- Using `$XDG_CACHE_HOME` and `--persist` sounds good to me.
- I'd rather not add separate commands (in the style of `useradd` and `usermod`), since I think they pollute the command namespace and hurt discoverability.
- As for the identity of an environment, I would identify it by the requirements as stated by the user. If a user specifies `requests`, then it would keep using the same environment whether `requests` is updated or not. This roughly corresponds to `pip install -r requirements.txt` without specifying the exact version. Users can specify `requests==<some_version>` to prevent their scripts from breaking. This would create another environment from `requests` even if they happen to resolve to the same version. I wanted to have this because if each script has its own environment, their size can bubble up fairly quickly. I have also thought of linking the dependencies, but it sounds too complicated and might not even work.
I wonder: does `pipx run` achieve what you're seeking here? I wonder if it's better to let `pipx` handle more permanent environments and `pip-run` always handle ephemeral ones.
I would also like this feature. For my use cases, I would be fine with identifying environments by the requirements as stated by the user. I agree that managing environments is hard, but to be honest, that's why I want the tool to handle it for me. I'd be perfectly OK with a simple initial implementation, with improvements being added based on real-world experience. (I can list off things I'd like to have, but I don't know in advance how necessary they are, and if the list is too long I imagine it would simply make the feature look too complex to accept).
For my own use cases (typically `pip-run SCRIPT` with the dependencies defined in the script) I don't see how `pipx run` would help, as it needs the script to be packaged into a standard distribution. Although, thinking a bit "outside the box", if there were a tool that built a simple wheel from a script (with embedded dependencies), without needing a full "project directory", then that might make "packaging the script" simple enough to work smoothly with `pipx run`. I'll be back, I'm going to see if I can make something like that 🙂
@pfmoore that actually isn't a terrible idea. What if we could extend `pipx` with a plugin system that lets the user define different kinds of spec (not sure if this should be `--spec`)? E.g.:
```python
# TOML below
# dependencies: ["numpy", ...]
import numpy
```
The `pipx` part is that these plugins would just take a spec and build a wheel, which is then given to `pipx`, meaning that the syntax is a per-plugin thing.
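As a rough sketch of what such a plugin's first step could look like, the snippet below pulls a dependency list out of a leading comment block like the example above. The function name and the `# dependencies: [...]` comment syntax are assumptions of this sketch (per-plugin, as noted), not an established standard.

```python
import ast
from pathlib import Path

def read_embedded_deps(script: Path) -> list:
    """Extract a dependency list from a leading comment block such as
    `# dependencies: ["numpy"]`.  The comment syntax is a per-plugin
    assumption, as suggested above -- not an established standard."""
    for line in script.read_text().splitlines():
        if not line.startswith("#"):
            break  # only the leading comment block is scanned
        body = line.lstrip("#").strip()
        if body.startswith("dependencies:"):
            # The value is a list literal valid in both TOML and Python.
            return ast.literal_eval(body[len("dependencies:"):].strip())
    return []
```

The remaining (and larger) plugin step would be wrapping the script plus this list into a throwaway wheel to hand to `pipx`.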
@knthmn I updated your comment to replace `-q requests` with simply `requests`. The `-q` is a separate, unrelated parameter that just means "be quiet when installing" (and in pip-run 9 is unnecessary).
I still don't think there's a good answer to the concern about how to cache an environment if requirements are passed as a requirements file. Let me illustrate:
```
draft $ cat > requirements.txt
tempora
draft $ pip-run -q -r requirements.txt -- -c pass
draft $ cat > requirements.txt
numpy
draft $ pip-run -q -r requirements.txt -- -c pass
draft $
```
In those two invocations, the set of dependencies available in each environment is very different, but the parameters to pip-run are identical. Moreover, the behavior of pip-run is identical. The fact that `requirements.txt` changed between invocations only affects the underlying `pip install` call.
So the question is: if pip-run is to cache environments based on the inputs from the user, how does it distinguish the first invocation from the second? I see a few options:

- Key the cache on the literal parameters and ignore the file contents (meaning `numpy` would be missing in the second invocation).
- Detect requirement files (`-r/--requirement`) and integrate those requirements with other specified requirements. As pip-run doesn't currently parse requirement files, it will need to implement that behavior and the integration logic, duplicating as closely as possible the behavior found (but not exposed) in pip.

I'm leaning toward the last option.
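The "integrate requirement files" option could be sketched roughly as below: inline the contents of each `-r FILE` argument before computing a cache key, so the key reflects what the file *says* rather than what it is *named*. This is a simplified sketch under stated assumptions; real requirements files also allow nested `-r`, pip options, and environment markers, which this deliberately ignores and pip itself handles.

```python
from pathlib import Path

def expand_requirements(args: list) -> list:
    """Replace `-r FILE` / `--requirement FILE` arguments with the
    requirements the file contains.  Simplified sketch: nested `-r`,
    pip options, and environment markers are not handled here."""
    out = []
    it = iter(args)
    for arg in it:
        if arg in ("-r", "--requirement"):
            for line in Path(next(it)).read_text().splitlines():
                line = line.split("#", 1)[0].strip()  # drop comments
                if line:
                    out.append(line)
        else:
            out.append(arg)
    return out
```

With this expansion, the two transcript invocations above would produce the distinct inputs `["tempora"]` and `["numpy"]`, so a content-derived cache key could tell them apart.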
I have a number of scripts for my personal use and I found `pip-run` very useful for managing my dependencies. However, it still takes several seconds for `pip` to install the packages, even if it already uses the download cache of `pip`.

I propose adding an option (e.g. `--cache`) where the directory containing the packages is not deleted after the program exits, and it can be reused as long as the dependency list does not change. Similar tools (e.g. `kotlin-main-kts`) also have caching that makes rerunning a script practically free.

There are several things that need to be considered:

- Disk usage can grow quickly (an environment with `numpy` takes up around 70 MiB). A simple solution can be to keep using `/tmp` and let the system manage it.

I think it is best to keep the functionality simple, and in the worst case just destroy and recreate the environment. But I do expect it to be able to avoid recreating the environment if I run the same script twice.
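The "reuse as long as the dependency list does not change" behavior could be as small as the sketch below: build a virtualenv keyed by the stated dependencies, and skip the build entirely on a rerun. The function name, key format, and marker file are all hypothetical illustrations of the proposal, not pip-run's implementation.

```python
import hashlib
import json
import subprocess
import sys
from pathlib import Path

def ensure_env(deps: list, cache_root: Path) -> Path:
    """Create, or reuse, a virtualenv keyed by the stated dependency
    list.  Illustrative sketch of the proposed --cache behavior."""
    key = hashlib.sha256(json.dumps(sorted(deps)).encode()).hexdigest()[:16]
    env = cache_root / key
    marker = env / "deps.json"
    if not marker.exists():
        subprocess.run([sys.executable, "-m", "venv", str(env)], check=True)
        if deps:
            bin_dir = "Scripts" if sys.platform == "win32" else "bin"
            pip = env / bin_dir / "pip"
            subprocess.run([str(pip), "install", "-q", *deps], check=True)
        # Written last, so a crashed install is rebuilt on the next run.
        marker.write_text(json.dumps(sorted(deps)))
    return env  # rerunning with the same deps skips the install entirely
```

The second run with an unchanged dependency list only hashes the list and checks one file, which is what makes rerunning a script practically free.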