paylogic / pip-accel

pip-accel: Accelerator for pip, the Python package manager
https://pypi.python.org/pypi/pip-accel
MIT License

Binary cache doesn't work with scipy #49

Closed Suor closed 8 years ago

Suor commented 9 years ago

Steps to reproduce:

virtualenv ve
source ve/bin/activate
pip install pip-accel
pip-accel install scipy

Then destroy the virtualenv and repeat the steps: scipy still compiles.

xolox commented 9 years ago

Hi Alexander and thanks for the feedback!

Short answer without details: The scipy package doesn't actually compile the second time. It is successfully cached, and on the second run the cached binary package is correctly re-used. If you are hesitant to accept this explanation, I created a gist that demonstrates it. As shown in the linked gist, the first run takes 14 minutes and the second run takes 2 minutes (the 12 minute difference is scipy not having to be compiled the second time :-). However, subsequent runs are slower than they could be, and I can explain why.

Explanation of cause of slow down: The scipy package depends on the numpy package and includes the numpy package in its setup.py script as a setup_requires dependency. What this comes down to is that every time the setup.py script of scipy is run, it will download and build the numpy binary (this is simplifying a bit, the next section contains more details).

General rant about setup_requires "feature" (with more details): To be very blunt the setup_requires feature of setuptools is a very tricky feature that is hard to use "correctly" and consequently most people indeed don't use it "correctly", so the effect is quite annoying and easy to demonstrate:

  1. Create an empty, clean Python virtual environment and activate it.
  2. Manually download the scipy source distribution archive (the *.tar.gz file) and unpack it.
  3. Navigate into the unpacked scipy source distribution and run an "innocent looking" command like python setup.py --version and be amazed while the answer to this very simple question takes a couple of minutes because numpy is being downloaded and compiled before the version number of the scipy source distribution is printed ಠ_ಠ
  4. The next time you run that command it will be fast because the setup_requires dependency is already satisfied (you can check with ls -ld numpy-*.egg), but if you unpack the source distribution to a clean directory and run step 3 again it will again be slow.

How this problem can be resolved: I've actually fixed this exact issue before by sending pull requests to Python projects, see e.g. pyca/cryptography#1257. So to bring this issue to some sort of conclusion: It is possible to fix this, but it requires cooperation from the package author(s) because I can't think of any way for pip-accel to fix this issue "from the outside" - the issue can be clearly demonstrated without invoking pip-accel at all :-).
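To make the kind of fix discussed above more concrete, here is a minimal sketch of a setup.py that only declares setup_requires when the requested command may actually build something. This is an illustration of the general idea, not the code from pyca/cryptography#1257 or from scipy. The names NO_BUILD_ARGS, needs_build_requirements and setup_keywords are hypothetical, and the list of "harmless" arguments is an assumption that a real project would have to check for itself.

```python
# Hypothetical set of setup.py arguments that only query metadata or clean up.
# Assumption: none of these need numpy to be importable.
NO_BUILD_ARGS = {
    "--help", "--help-commands", "--version", "--name", "--author",
    "egg_info", "clean",
}


def needs_build_requirements(argv):
    """Return True when any requested setup.py argument may trigger a build.

    Unknown arguments count as "may build", so the check errs on the side of
    still declaring the setup requirements.
    """
    args = argv[1:]
    if not args:
        return False  # a bare `python setup.py` only prints usage information
    return any(arg not in NO_BUILD_ARGS for arg in args)


def setup_keywords(argv):
    """Build the keyword arguments that a setup.py would pass to setup()."""
    keywords = {"name": "example", "version": "1.0"}  # placeholder metadata
    if needs_build_requirements(argv):
        keywords["setup_requires"] = ["numpy"]
    return keywords
```

In a real setup.py the result would be used as setup(**setup_keywords(sys.argv)), so that a command like python setup.py --version no longer pulls in numpy.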

Given my extensive explanation here and the additional information available in pyca/cryptography#1257, do you feel like creating a pull request for the scipy project?

I can do it as well, but 1) I have more than 30 open source projects to maintain and am already slacking on half of them because I just don't have the time and 2) I've never actually used scipy and have zero experience with it (I actually spent quite a bit of time determining its build dependencies before I was able to reproduce this issue! :-)

xolox commented 9 years ago

A small follow up:

I couldn't believe that this "issue" would have gone unfixed in such a popular, high profile Python package as scipy. It actually looks like the scipy people are aware of the issue and working on it, but between all of the issues, pull requests and commits I still don't see a full solution emerging (maybe I'm just confused at this point). The most useful reference I was able to find is scipy/scipy#453. Note that the last comment in that pull request is only two months old and, if I understand correctly, states that the issue hasn't been fully fixed in a released version.

Suor commented 9 years ago

Hello, and thank you for this explanation.

I, however, think you can fix it from the outside, with some hassle :) Here is what you can do:

Alternatively, you can keep an .eggs directory for each Python version, holding the downloaded setup requirements, and symlink it into the current directory before installation.
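The symlink idea can be sketched roughly as follows. This is only an illustration under assumptions: it presumes a setuptools version that stores setup requirements in a .eggs directory next to setup.py, it keys the shared cache by Python major.minor version, and the function name link_shared_eggs and the cache layout are hypothetical. Creating symlinks may also need extra privileges on Windows.

```python
import os
import sys
import tempfile


def link_shared_eggs(source_dir, cache_root):
    """Symlink a per-interpreter egg cache into source_dir as `.eggs`."""
    # One cache per Python version, because compiled eggs are generally not
    # interchangeable between interpreter versions.
    tag = "python%d.%d" % sys.version_info[:2]
    shared = os.path.join(cache_root, tag)
    os.makedirs(shared, exist_ok=True)
    link = os.path.join(source_dir, ".eggs")
    # Leave an existing directory or link alone instead of clobbering it.
    if not os.path.lexists(link):
        os.symlink(shared, link)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        unpacked = os.path.join(tmp, "scipy-src")  # stand-in for an unpacked sdist
        os.makedirs(unpacked)
        created = link_shared_eggs(unpacked, os.path.join(tmp, "eggs-cache"))
        print(os.path.islink(created))
```

Anything setuptools then places in .eggs ends up in the shared cache, so a freshly unpacked copy of the same source distribution can find it again.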

xolox commented 8 years ago

Sorry for the long delay in replying here, other issues and projects got in the way of finishing this reply and summarizing my thoughts about your proposal to fix this from the outside.


If it turns out that there is a way to:

  1. Reliably cache setup_requires dependencies
  2. That will work for all users
  3. Without major downsides

Then I'm all for it and don't mind spending time on implementing this. However, I'm pretty sure things are not quite as simple as you explained in your last comment, and this is the reason why I never seriously tackled this issue inside pip-accel before now:


The best way for me to find out how realistic all of this is would be to (try to) implement the required changes. The issues caused by setup_requires have been a thorn in my side even before I created pip-accel and since then it hasn't gotten any better, so believe me when I say that I'd love to improve how this works :-).

However there is also issue #57 suggesting to upgrade from pip 6.x to pip 7.x and it seems wise to tackle that upgrade before I introduce yet more monkey patching (as explained above) because every additional pip monkey patch in pip-accel makes it a bit harder for me to upgrade to a new major version of pip.

Suor commented 8 years ago

Would be nice to see it finally :), but no rush.

xolox commented 8 years ago

Hi Alexander,

Sorry things took so long, however thanks for your persistence in fixing this on the side of pip-accel. I just released pip-accel 0.39 which 1) depends on setuptools >= 7.0 and 2) manages the creation of .eggs symbolic links to avoid recompilation of setup requirements.

I've now tested this with a couple of packages including the SciPy / NumPy combination and it seems to work very well! There is even an automated test to verify the functionality - getting the test to work correctly in all environments was actually a lot more work than the feature itself :-).


Given that I now have a way to manipulate unpacked source distributions before they are processed by pip I'm considering extending this logic to inject allow_hosts and find_links options that can keep setuptools (easy_install) off the internet. Not sure how that would work yet, but I'm thinking about it :-).
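One possible shape for that idea, purely as a sketch: write the two options into the [easy_install] section of the setup.cfg inside the unpacked source distribution. This is not how pip-accel does it (nothing in this thread says it is implemented). The function name restrict_easy_install is hypothetical, and the value None for allow_hosts is used here on the assumption that it acts as a host pattern matching no real host.

```python
import configparser
import os


def restrict_easy_install(source_dir, find_links):
    """Add offline-friendly [easy_install] options to source_dir/setup.cfg."""
    path = os.path.join(source_dir, "setup.cfg")
    # RawConfigParser avoids interpolation of any '%' in existing values.
    parser = configparser.RawConfigParser()
    parser.read(path)  # a missing setup.cfg is silently skipped
    if not parser.has_section("easy_install"):
        parser.add_section("easy_install")
    # Assumption: "None" matches no real host, which should keep easy_install
    # from downloading anything.
    parser.set("easy_install", "allow_hosts", "None")
    # Point easy_install at a local directory (or index page) of archives.
    parser.set("easy_install", "find_links", find_links)
    with open(path, "w") as handle:
        parser.write(handle)
    return path
```

Note that rewriting the file this way drops any comments from an existing setup.cfg, which is one of the details a real implementation would need to weigh.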