use zipfile instead of wheel command to extract build requirements

shubhbapna commented 2 days ago

Currently the cache wheel feature in bootstrap uses the wheel command to unpack the wheel and extract the build requirements. Using the wheel command requires spawning a child process which is expensive compared to using the zipfile library. There is almost a 90% difference when running this script:

import pathlib
import shutil
import time
import zipfile

import dependencies
import external_commands
import wheels

whl = "e2e-output/wheels-repo/downloads/setuptools-75.6.0-0-py3-none-any.whl"
unpack_dir = pathlib.Path("unpack_dir")
unpack_dir.mkdir()

dist_name, dist_version, build_tag, _ = wheels.extract_info_from_wheel_file(
    None, pathlib.Path(whl)
)
dist_filename = f"{dist_name}-{dist_version}"
metadata_dir = pathlib.Path(f"{dist_filename}.dist-info")

start = time.perf_counter()
external_commands.run(
    ["wheel", "unpack", whl, "--dest", str(unpack_dir)],
)
shutil.copy(
    unpack_dir
    / dist_filename
    / metadata_dir
    / f"{wheels.FROMAGER_BUILD_REQ_PREFIX}-{dependencies.BUILD_SYSTEM_REQ_FILE_NAME}",
    unpack_dir / dependencies.BUILD_SYSTEM_REQ_FILE_NAME,
)
shutil.copy(
    unpack_dir
    / dist_filename
    / metadata_dir
    / f"{wheels.FROMAGER_BUILD_REQ_PREFIX}-{dependencies.BUILD_BACKEND_REQ_FILE_NAME}",
    unpack_dir / dependencies.BUILD_BACKEND_REQ_FILE_NAME,
)
shutil.copy(
    unpack_dir
    / dist_filename
    / metadata_dir
    / f"{wheels.FROMAGER_BUILD_REQ_PREFIX}-{dependencies.BUILD_SDIST_REQ_FILE_NAME}",
    unpack_dir / dependencies.BUILD_SDIST_REQ_FILE_NAME,
)
end = time.perf_counter()
external = end - start

shutil.rmtree(str(unpack_dir))
unpack_dir.mkdir()

start = time.perf_counter()
archive = zipfile.ZipFile(whl)
archive.extract(
    str(
        metadata_dir
        / f"{wheels.FROMAGER_BUILD_REQ_PREFIX}-{dependencies.BUILD_SYSTEM_REQ_FILE_NAME}"
    ),
    unpack_dir,
)
archive.extract(
    str(
        metadata_dir
        / f"{wheels.FROMAGER_BUILD_REQ_PREFIX}-{dependencies.BUILD_SDIST_REQ_FILE_NAME}"
    ),
    unpack_dir,
)
archive.extract(
    str(
        metadata_dir
        / f"{wheels.FROMAGER_BUILD_REQ_PREFIX}-{dependencies.BUILD_BACKEND_REQ_FILE_NAME}"
    ),
    unpack_dir,
)
end = time.perf_counter()
zip = end - start

print(f"external command: {external}")
print(f"zipfile: {zip}")
print(f"zipfile is faster by: {(external - zip) / external * 100}")

shubhbapna commented 2 days ago

It might look like a micro-optimization but we run this command for almost every dependency, so the gains add up

tiran commented 1 day ago

wheel unpack does more than just unzipping the wheel. It also validates the integrity of a wheel by comparing the files against the recorded checksums and it corrects file permissions.

If you are worried about process spawn overhead, then it would be better to reimplement unpacking and packing on top of wheel.wheelfile.WheelFile. It has a stable API.

shubhbapna commented 23 hours ago

To check the integrity of the wheel file, lets add it to the _download_wheel_check which currently just checks whether the wheel file is a valid zip file. This handles the case for when we are downloading a wheel from a cache server. I will open a separate ticket for that. #512

For wheel unpack command that we run right after building a wheel to add extra metadata, it might be safe to assume that the wheel is valid since fromager just built it. For packing we are still using wheel pack command

tiran commented 14 hours ago

That makes sense, thanks!

This leaves the issue with extractall not setting file permissions like executable bit. That's a simple fix.

shubhbapna commented 5 hours ago

This leaves the issue with extractall not setting file permissions like executable bit. That's a simple fix.

Ahh I see. This would be the fix right: https://github.com/pypa/wheel/blob/main/src/wheel/cli/unpack.py#L21 opened #513

python-wheel-build / fromager

use zipfile instead of wheel command to extract build requirements #507