pypa / installer

A low-level library for installing from a Python wheel distribution.
https://installer.readthedocs.io/
MIT License

fix: validate wheel files in a RAM friendly way #183

Closed ralbertazzi closed 1 year ago

ralbertazzi commented 1 year ago

See https://github.com/python-poetry/poetry/issues/7983

Content validation of a wheel record currently loads the entire file into memory with self._zipfile.read(item). This is extremely inefficient for big wheels (PyTorch, for example, now ships wheel files larger than 2 GB) and leads to very high RAM consumption. This PR fixes the behaviour by reading the zip file content in a buffered way, as other parts of the codebase already do. Unfortunately, this required a small change to some signatures.
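To make the buffered approach concrete, here is a minimal sketch using only the standard library. It is not the code from this PR: `hash_zip_member`, `make_demo_wheel`, the member name, and the chunk size are all hypothetical names chosen for illustration. The idea is to open the archive member as a stream and feed it to the hasher chunk by chunk, so peak memory is bounded by the chunk size rather than by the file size. The digest is encoded as urlsafe base64 without padding, which is the form RECORD files use.

```python
import base64
import hashlib
import io
import zipfile


def hash_zip_member(zf, name, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash one archive member in fixed-size chunks (hypothetical helper).

    Only `chunk_size` bytes of decompressed data are held at a time,
    unlike `zf.read(name)`, which materialises the whole member.
    """
    hasher = hashlib.new(algorithm)
    with zf.open(name) as stream:
        # iter(callable, sentinel) keeps reading until read() returns b"".
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            hasher.update(chunk)
    digest = base64.urlsafe_b64encode(hasher.digest()).decode("ascii")
    return digest.rstrip("=")  # RECORD hashes omit base64 padding


def make_demo_wheel():
    """Build a tiny in-memory zip standing in for a wheel (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("pkg/data.bin", b"x" * 300_000)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    with zipfile.ZipFile(make_demo_wheel()) as zf:
        print(hash_zip_member(zf, "pkg/data.bin", chunk_size=4096))
```

The streamed digest is identical to the one computed from a full read, so the validation result does not change, only the memory profile.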

pradyunsg commented 1 year ago

I've filed #185 so that this PR has an associated issue, in case there are any high-level details to discuss. There likely aren't, but it can't hurt to have an issue to close and to steer non-PR-specific discussion into.

Other than that, I don't think a backwards-incompatible change is necessary here. I've filed #186, which contains no backwards-incompatible API changes and instead adds a new method and deprecates the (problematic) RecordEntry.validate(data) method.
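The "add a new method, deprecate the old one" pattern described above can be sketched roughly as follows. This is an illustration only and not the contents of #186: the `Entry` class, the `validate_stream` name, and the hex-digest comparison are assumptions made for the example, not installer's actual API.

```python
import hashlib
import io
import warnings


class Entry:
    """Hypothetical stand-in for a record entry; not installer's class."""

    def __init__(self, algorithm, hexdigest):
        self.algorithm = algorithm
        self.hexdigest = hexdigest

    def validate_stream(self, stream, chunk_size=1024 * 1024):
        """New method: hash a binary stream incrementally and compare."""
        hasher = hashlib.new(self.algorithm)
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            hasher.update(chunk)
        return hasher.hexdigest() == self.hexdigest

    def validate(self, data):
        """Old method: same signature as before, but warns and delegates."""
        warnings.warn(
            "validate(data) is deprecated; use validate_stream(stream)",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.validate_stream(io.BytesIO(data))


if __name__ == "__main__":
    entry = Entry("sha256", hashlib.sha256(b"hello").hexdigest())
    print(entry.validate_stream(io.BytesIO(b"hello")))
```

Existing callers of the bytes-based method keep working and only see a deprecation warning, while new callers can pass an open file object and avoid loading the whole file.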