paylogic / pip-accel

pip-accel: Accelerator for pip, the Python package manager
https://pypi.python.org/pypi/pip-accel
MIT License

Implement source distribution archive caching for Amazon S3 backend #40

Open · xolox opened this issue 9 years ago


Pull request #33 introduced a second-level cache in Amazon S3, which is currently used only to cache binary distribution archives.
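To make the "second-level cache" idea concrete, here is a minimal sketch of how a binary cache with a plain get() / put() interface might be layered: look in the local directory first, fall back to the remote store, and copy a remote hit into the local directory. The `CacheBackend` protocol, the `InMemoryBackend` stand-in and the `TwoLevelCache` class are hypothetical illustrations, not pip-accel's actual classes; a real Amazon S3 backend would wrap an S3 client instead of a dict.

```python
import os
import tempfile
from typing import Dict, Optional, Protocol


class CacheBackend(Protocol):
    """Hypothetical get()/put() interface for a remote binary cache."""

    def get(self, filename: str) -> Optional[bytes]: ...

    def put(self, filename: str, data: bytes) -> None: ...


class InMemoryBackend:
    """Stand-in for an Amazon S3 backend (assumption: keys are archive filenames)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def get(self, filename: str) -> Optional[bytes]:
        return self.objects.get(filename)

    def put(self, filename: str, data: bytes) -> None:
        self.objects[filename] = data


class TwoLevelCache:
    """Local directory first, remote backend second (hypothetical design)."""

    def __init__(self, local_dir: str, remote: CacheBackend) -> None:
        self.local_dir = local_dir
        self.remote = remote

    def get(self, filename: str) -> Optional[str]:
        """Return a local path to the cached archive, or None on a full miss."""
        local_path = os.path.join(self.local_dir, filename)
        if os.path.isfile(local_path):
            return local_path
        data = self.remote.get(filename)
        if data is None:
            return None
        # Populate the first-level (local) cache so the next lookup stays local.
        with open(local_path, "wb") as handle:
            handle.write(data)
        return local_path

    def put(self, filename: str, data: bytes) -> None:
        """Store a freshly built archive in both cache levels."""
        with open(os.path.join(self.local_dir, filename), "wb") as handle:
            handle.write(data)
        self.remote.put(filename, data)


if __name__ == "__main__":
    remote = InMemoryBackend()
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        TwoLevelCache(first, remote).put("example-1.0.tar.gz", b"archive bytes")
        # Simulate a fresh machine whose ephemeral local storage is empty.
        print(TwoLevelCache(second, remote).get("example-1.0.tar.gz") is not None)
```

Because both levels only need get() and put() keyed by filename, no listing of the remote store is ever required for binary archives.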

Due to pip-accel's current architecture, it can't do anything useful without unpacked source distributions. As a result, even if your complete requirement set has already been cached as binary distributions, you still need to fetch all of the source distributions from PyPI (or the local download cache) before you can install the cached binary distributions.

With ephemeral local storage, the download cache may frequently be empty, so pip-accel has to use pip to search PyPI, which is slow. Storing the download cache in Amazon S3 as well could optimize this process further.

I consider this a "nice to have" feature. It's also somewhat non-trivial to implement, as I explained in a comment on pull request #33:

Right now only the binary cache can be stored in Amazon S3. It is possible but non-trivial to add support for caching source distribution archives in Amazon S3. If @jzoldak really sees value in this, I may try to implement it soon. The main difficulty is that a simple get() / put() interface suffices for the binary cache, but the source index requires an index.html that can be scanned by pip install --find-links=.... Because Amazon S3 does not generate server-side directory listings (index pages) that pip could scan, this will have to be implemented in pip-accel (one way or another).
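One way pip-accel could provide that listing itself, sketched here under assumptions: obtain the names of the source archives stored in the bucket (for example through S3's object listing API), render a minimal HTML page of anchor links, and hand that page to pip as a find-links location. The function `render_find_links_index`, the set of archive suffixes and the example bucket URL are hypothetical; this is not pip-accel's implementation.

```python
import html
from typing import Iterable
from urllib.parse import quote

# File extensions treated as source distribution archives (assumption).
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".zip")


def render_find_links_index(filenames: Iterable[str], base_url: str) -> str:
    """Render a minimal HTML page of <a href> links that pip can scan.

    `filenames` would come from an S3 key listing; `base_url` is the
    (hypothetical) location from which the archives can be downloaded.
    """
    base = base_url.rstrip("/")
    lines = ["<!DOCTYPE html>", "<html><head><title>Source archives</title></head><body>"]
    for name in sorted(set(filenames)):
        if not name.endswith(ARCHIVE_SUFFIXES):
            continue  # Skip unrelated keys, such as a previously uploaded index.html.
        href = html.escape(f"{base}/{quote(name)}", quote=True)
        lines.append(f'<a href="{href}">{html.escape(name)}</a><br>')
    lines.append("</body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    keys = ["requests-2.4.3.tar.gz", "index.html", "six-1.8.0.tar.gz"]
    print(render_find_links_index(keys, "https://example-bucket.s3.amazonaws.com/sdists"))
```

The generated page could be written to disk or uploaded next to the archives and then passed to pip install --find-links=.... This assumes every linked archive is actually downloadable by pip (for example from a publicly readable bucket or via pre-signed URLs), which is a deployment decision the sketch does not cover.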