jaraco / pip-run

pip-run - dynamic dependency loader for Python
MIT License

Environment caching #52

Closed (knthmn closed 1 year ago)

knthmn commented 3 years ago

I have a number of scripts for my personal use, and I have found pip-run very useful for managing their dependencies. However, it still takes several seconds for pip to install the packages, even though it already uses pip's download cache.

I propose adding an option (e.g. --cache) where the directory containing the packages is not deleted after the program exits, and it can be reused as long as the dependencies list does not change. Similar tools (e.g. kotlin-main-kts) also have caching that makes rerunning a script practically free.

There are several things that need to be considered

I think it is best to keep the functionality simple, and in the worst case just destroy and recreate the environment. But I do expect it to be able to avoid recreation of the environment if I run the same script twice.
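
The reuse rule proposed above (keep the environment and reuse it as long as the dependency list does not change) could be sketched roughly as follows. This is only an illustration, not pip-run's implementation: the function name `cache_dir_for`, the cache root, and the choice of SHA-256 over the sorted requirement strings are all assumptions made for the example.

```python
import hashlib
import pathlib

# Hypothetical cache location, chosen only for this sketch.
DEFAULT_ROOT = pathlib.Path.home() / ".cache" / "pip-run-sketch"


def cache_dir_for(requirements, root=DEFAULT_ROOT):
    """Map a list of requirement strings to a stable directory path.

    Assumption: two invocations should share an environment when their
    requirement strings are the same after trimming whitespace and
    ignoring order.
    """
    normalized = sorted(r.strip() for r in requirements)
    digest = hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()
    # A shortened digest keeps the directory name readable.
    return pathlib.Path(root) / digest[:16]


def needs_install(requirements, root=DEFAULT_ROOT):
    """Return True when no cached directory exists yet for these requirements."""
    return not cache_dir_for(requirements, root).is_dir()
```

In the worst case described above, the directory returned by `cache_dir_for` would simply be deleted and recreated; the key only decides whether a second run of the same script can skip the install step.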

knthmn commented 3 years ago

Please tell me your opinion; if you like the idea, I can work on it.

jaraco commented 3 years ago

Hi @knthmn. Thanks for the proposal. I've long bemoaned the sluggish performance, but put up with it for the simplicity of the implementation. In particular, one of the big advantages of pip-run is that it's stateless, so it leaves little behind to be cleaned up.

I've similarly thought about ways pip-run could somehow optimize the performance.

The design you describe aligns closely with what I would expect. I'd tweak it slightly thus:

I do think producing a hash of the dependency list may prove more difficult than it sounds on two dimensions:

I'd like some thoughts on how you'd propose to address those concerns.

If we can come up with some reasonable behaviors for these concerns and come up with an implementation that is fairly clean (doesn't introduce too many touch points), I'd be inclined to move forward with it.

knthmn commented 3 years ago

Thank you for the response.

As for the identity of an environment

I wanted to have this because if each script has its own environment, the total size can balloon fairly quickly. I have also thought of linking the dependencies, but it sounds too complicated and might not even work.

jaraco commented 2 years ago

I wonder - does pipx run achieve what you're seeking here? I wonder if it's better to let pipx handle more permanent environments and pip-run to always handle ephemeral ones.

pfmoore commented 2 years ago

I would also like this feature. For my use cases, I would be fine with identifying environments by the requirements as stated by the user. I agree that managing environments is hard, but to be honest, that's why I want the tool to handle it for me. I'd be perfectly OK with a simple initial implementation, with improvements being added based on real-world experience. (I can list off things I'd like to have, but I don't know in advance how necessary they are, and if the list is too long I imagine it would simply make the feature look too complex to accept).

For my own use cases (typically pip-run SCRIPT with the dependencies defined in the script) I don't see how pipx run would help, as it needs the script to be packaged into a standard distribution. Although thinking a bit "outside the box", if there was a tool that built a simple wheel from a script (with embedded dependencies), without needing a full "project directory" then that might make "packaging the script" simple enough to work smoothly with pipx run. I'll be back, I'm going to see if I can make something like that 🙂

agoose77 commented 2 years ago

@pfmoore that actually isn't a terrible idea - what if we could extend pipx with a plugin system that lets the user define different kinds of spec (not sure if this should be --spec). E.g.

# TOML below
# dependencies: ["numpy", ...]

import numpy

The pipx part is that these plugins would just take a spec and build a wheel, which is then given to pipx, meaning that the syntax is a per-plugin thing.

jaraco commented 1 year ago

@knthmn I updated your comment to replace -q requests with simply requests. The -q is a separate, unrelated parameter that just means "be quiet when installing" (and in pip-run 9, is unnecessary).

I still don't think there's a good answer to the concern about how to cache an environment if requirements are passed as a requirements file. Let me illustrate:

 draft $ cat > requirements.txt
tempora
 draft $ pip-run -q -r requirements.txt -- -c pass
 draft $ cat > requirements.txt
numpy
 draft $ pip-run -q -r requirements.txt -- -c pass
 draft $

In those two invocations, the set of dependencies available in each environment is very different, but the parameters to pip-run are identical. Moreover, the behavior of pip-run is identical. The fact that requirements.txt changed between invocations only affects the underlying pip install call.
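
To make the collision concrete, here is a small sketch, under stated assumptions, of what happens if a cache key is derived only from the command-line parameters. The helper names `key_from_args` and `key_with_file_contents` are hypothetical and are not part of pip-run. The second helper shows one conceivable way to tell the two invocations apart, by also hashing the contents of any file passed with `-r`; it is offered as an illustration only and ignores nested requirements files, constraints files, and other pip inputs.

```python
import hashlib
import pathlib


def key_from_args(args):
    """Key derived only from the literal parameters (hypothetical scheme)."""
    return hashlib.sha256("\0".join(args).encode("utf-8")).hexdigest()


def key_with_file_contents(args):
    """Same key, but also folding in the bytes of each `-r FILE` argument."""
    h = hashlib.sha256("\0".join(args).encode("utf-8"))
    # Only the parameters before `--` are meant for pip.
    pip_args = args[: args.index("--")] if "--" in args else args
    for flag, value in zip(pip_args, pip_args[1:]):
        if flag == "-r":
            h.update(pathlib.Path(value).read_bytes())
    return h.hexdigest()
```

With `args = ["-q", "-r", "requirements.txt", "--", "-c", "pass"]`, `key_from_args` returns the same value before and after requirements.txt is rewritten, whereas `key_with_file_contents` changes whenever the file's bytes change.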

So the question is - if pip-run is to cache environments based on the inputs from the user, how does it distinguish the first invocation from the second? I see a few options:

I'm leaning toward the last option.