paylogic / pip-accel

pip-accel: Accelerator for pip, the Python package manager
https://pypi.python.org/pypi/pip-accel
MIT License

Why does `pip-accel` make an sdist (?) archive when I `pip-accel install -e .`? #72

Open mrmachine opened 8 years ago

mrmachine commented 8 years ago

I want to use pip-accel to install and cache binary archives for the dependencies listed in ./setup.py, but I don't want or need to build or cache an archive for . itself -- I'm only installing it to get the dependencies and to add . to the Python path. There are a lot of files in . (e.g. 1GB temporary SQL files in my working copy), and pip-accel seems to want to include these in the archive, too. It sits on Obtaining file://... for a very long time (longer than most people are willing to wait, so they assume it has hung completely).

xolox commented 8 years ago

Hi Tai Lee, thanks for the feedback and sorry to hear that pip-accel is giving you trouble!

Disclaimer: Let me preface all of this by explaining that when I created pip-accel I didn't intend for it to support editable installations at all (it simply wasn't on my radar). Users requested proper support for this feature and adding it was rather trivial based on integration with older versions of pip. That situation has since changed unfortunately, but I've tried to preserve compatibility with older versions of pip-accel which is why there is now "suboptimal" support for editable installations (I say "suboptimal" because it works fine for some use cases but rather badly for other use cases, as you found out in #71).

Analysis: I'm aware of this behavior and the problems it causes (as you describe) and would love to change it but don't really know how to, given the way that pip and pip-accel integrate. I actually (fairly frequently) hit the same annoyance that you describe in your last sentence when I install an editable checkout of one of my projects and pip-accel causes pip to process a multi-hundred-megabyte .tox directory that's located inside the checkout. In other words, I feel your frustration and have previously investigated whether it's possible to fix this rather annoying consequence of how pip and pip-accel integrate (unfortunately I didn't come very far, or I would have fixed it already :-)).

Your previous bug report (#71) and this one share the same root cause; however, I understand why you reported this as a separate issue: users of pip-accel shouldn't need to know about the internals of the tool at all. I intend to investigate whether the behavior described in #71 can be improved. That work will likely touch the same root cause and so possibly fix this issue as well.

Short term workarounds: In the short term all I can do is suggest a couple of temporary workarounds that may or may not be satisfactory for you, depending on your use case (which I am not familiar with):

Assuming you have access to a requirements file that lists the dependencies of the package/project that you're working on, you can run `pip-accel install --requirement $REQ_FILE` and then `pip install --editable $CHECKOUT` to cache your dependencies while sidestepping the issues described here and in #71. Of course this requires that 1) the project provides a requirements file and 2) you have access to the requirements file.

If the project doesn't provide a requirements file but you have write access to or even control over the project, you may consider adding a requirements file and modifying the setup.py script to fill in the install_requires argument based on the requirements file (to avoid duplication).
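To illustrate the second workaround, here is a minimal sketch of a helper that a setup.py script could use to fill in install_requires from a requirements file. The function name `read_requirements` and the file name `requirements.txt` are assumptions made for this example, and the parsing is deliberately simplistic: it skips comments, blank lines and option lines such as `--editable ...`, and it does not follow nested `-r` includes.

```python
import re


def read_requirements(path):
    """Return the plain requirement specifiers listed in a requirements file.

    Hypothetical helper: handles comments, blank lines and option lines only.
    """
    requirements = []
    with open(path) as handle:
        for raw_line in handle:
            # Drop comments: a '#' at the start of the line or after whitespace.
            line = re.sub(r"(^|\s+)#.*$", "", raw_line).strip()
            # Skip blank lines and pip options such as '--editable' or '-r'.
            if not line or line.startswith("-"):
                continue
            requirements.append(line)
    return requirements


# In setup.py this could then be used along these lines:
#
#     from setuptools import setup
#     setup(name="example", install_requires=read_requirements("requirements.txt"))
```

With this in place the requirements file stays the single source of truth, so `pip-accel install --requirement requirements.txt` followed by `pip install --editable .` should find nothing left to download in the second step.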

Of course I completely understand if these workarounds are going rather far and you may as well consider abandoning your efforts to utilize pip-accel; I'm just trying to think of short term workarounds, I'm not trying to convince you to apply them.

mrmachine commented 8 years ago

I am actually already installing a requirements.txt file first with pip-accel, then doing pip install -e ., just as you describe. However, the problem is that the requirements.txt file is not always complete (sometimes even empty) or up to date, and in those cases (e.g. when starting a new project from a template), all dependencies or all outdated dependencies will be installed by pip, slowly.

Usually, this won't be a big deal. One person will create a project and install unpinned dependencies once, then pin them. But it means we have to make sure that everyone on the team who ever makes a project from the template remembers to pin dependencies sooner rather than later, and it otherwise wouldn't normally happen until the project was closer to an initial release.

But pip-accel is still definitely useful when the requirements.txt file is up to date.

xolox commented 8 years ago

Thanks for the reply and good to hear about your use case. Your use of pip-accel can still be beneficial if the majority of the requirements are already part of the requirements file, but I can see how this is not exactly ideal during development :-).


The root cause of all this (how pip and pip-accel integrate)

I was writing a follow up with more details when I received your reply, so I'll just post it here now:

In issue #71 and this issue I keep mentioning "the way that pip and pip-accel integrate" and how this makes it hard to fix these issues related to editable installations. I thought it might be useful for you and others reading along to get an impression of what it is that actually causes these issues:

  1. pip-accel uses pip to transform the command line arguments into a requirement set of unpacked source and/or wheel distributions, preferably without connecting to the internet (using the --no-index option) but if needed (when some dependencies need to be downloaded) falling back to a rerun without the --no-index option. I don't see how I could move away from this approach without (badly) re-implementing features like the parsing of requirements files, fetching of distribution archives from PyPI, resolving conflicts in the requirement set, etc. That way lies madness :-).
  2. pip-accel utilizes the pip install --download (in pip 7.x) and pip download (in pip 8.x, refer to the not yet released pip-8.1-upgrade branch) commands to implement both aspects of step one (unpacking downloaded distributions and downloading missing distributions).
  3. When you think about it, the command pip install --download=$LOCAL_INDEX --editable $CHECKOUT doesn't really make sense at all. Nevertheless pip attempts to do something useful, which in this case means to copy $CHECKOUT to a temporary directory (this is part of what's so slow, it's copying your SQL files and my .tox directory trees) and pack that into a ZIP archive.
  4. Once pip-accel receives the requirement set with unpacked distribution archives from pip, it realizes that one of the requirements concerns an editable installation and so it sidesteps the binary distribution caching mechanism for that requirement and uses pip to perform an editable install of that distribution.

At step four pip has already cached a source distribution archive for the project that was to be installed in editable mode so even though pip-accel doesn't use that cached source distribution, the time to create it was still wasted :-(.
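The "try without the index first, then rerun with it" behavior from step one can be sketched roughly as follows. This is only an illustration under stated assumptions: it invokes `pip download` as a subprocess, which is not necessarily how pip-accel talks to pip, and the function names `build_download_command` and `download_with_fallback` are made up for this example.

```python
import subprocess
import sys


def build_download_command(arguments, use_index):
    """Build a `pip download` command line, optionally restricted to local sources."""
    command = [sys.executable, "-m", "pip", "download"]
    if not use_index:
        # Avoid connecting to the package index on the first attempt.
        command.append("--no-index")
    return command + list(arguments)


def download_with_fallback(arguments, run=subprocess.call):
    """Try offline first; return True if the package index was needed."""
    for use_index in (False, True):
        if run(build_download_command(arguments, use_index)) == 0:
            return use_index
    raise RuntimeError("pip failed both without and with the package index")
```

Note that if `arguments` contains `--editable $CHECKOUT`, both attempts hand the checkout to pip, which is exactly where the time described above gets wasted.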


Brainstorming about possible solutions

I'm tempted to document my brainstorming about how to improve this situation :-). The command line arguments are passed to pip by pip-accel which means pip-accel can know whether an editable installation was requested before that request is passed to pip. If the editable installation comes from a local directory (e.g. an existing checkout) then I could modify pip-accel to do something like this:

This approach would work when the $ARGUMENT in pip-accel install --editable $ARGUMENT refers to a local directory, but of course none of this is going to work if $ARGUMENT refers to e.g. a URL on GitHub. What's worse is that requirement files can also contain lines of the form --editable ... and pip-accel will never see these until pip has already processed the request to "download" these editable requirements and it's too late to avoid the "bad interactions". This is what happens every time I try to come up with workarounds for this: I undermine my conceptual workarounds by finding new corner cases and realizing I might never be able to make this Just Work (TM) for users of pip-accel...
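To make the distinction concrete, here is a rough sketch of how editable requests could be spotted on the command line before anything is passed to pip. The function names are hypothetical and this is not code from pip-accel. It also shares the limitation described above: `--editable` lines inside requirement files are invisible to it.

```python
import os


def find_editable_arguments(arguments):
    """Return the values passed via -e / --editable on a pip-style command line."""
    arguments = list(arguments)
    editables = []
    index = 0
    while index < len(arguments):
        argument = arguments[index]
        if argument in ("-e", "--editable"):
            # The value is the next argument (if there is one).
            if index + 1 < len(arguments):
                editables.append(arguments[index + 1])
            index += 2
            continue
        if argument.startswith("--editable="):
            editables.append(argument[len("--editable="):])
        elif argument.startswith("-e") and not argument.startswith("--"):
            # Short option with the value attached, e.g. '-e.'
            editables.append(argument[2:])
        index += 1
    return editables


def is_local_checkout(value):
    """True when an editable value points at an existing local directory."""
    return os.path.isdir(value)
```

A value such as `.` would be classified as a local checkout, while something like a `git+https://` URL would not, and would still have to go through pip's normal "download" step.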