scele / rules_python

Experimental Bazel Python Rules
Apache License 2.0

Refactor: Simplify overriding wheel attributes, make wheel build deterministic #12

Closed scele closed 5 years ago

scele commented 5 years ago

This is a pretty big refactoring. I have tried to separate the changes into logical commits for ease of review, but they do not necessarily work correctly in isolation.


Refactor generated requirements.bzl layout and make overriding attributes easier

Previously, we would generate a bunch of macros and lookup tables into the
checked-in requirements.bzl that is generated by piptool.py during
@pip_deps//:update.  To clean things up (and to make it easier to make
changes to these macros), generate a simple "wheels" dict instead, which
contains all necessary information about the resolved wheels.  Move the
macros to requirements.bzl.tpl instead, which is generated at runtime
and not checked into the user's repository.

Remove dirty package targets, since we don't need them anymore.  Rename
"extract_wheels" to "extract_wheel" and only accept one --whl parameter
in "piptool resolve", since the only use case for extracting multiple
wheels at once was the dirty wheel system.

Previously, we had to expose a new pip_import attribute for every
per-wheel customization that we wish to make: additional_runtime_deps,
additional_buildtime_env, patch_runtime, ...  This is a bit tedious to
maintain.  Instead, add a new "requirements_overrides" attribute, which
accepts a dictionary in the same format as the checked-in "wheels"
dictionary, and is merged together with that at analysis time.  This
allows users to override any attribute of any wheel, and as we add new
attributes to "download_or_build_wheel" and "extract_wheel" rules, they
are automatically overridable by the new mechanism.
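The override mechanism described above amounts to a per-wheel dict merge. A minimal sketch in Python, where the `wheels` and `requirements_overrides` names follow the text but the attribute schema shown is a hypothetical example, not the actual generated format:

```python
def merge_overrides(wheels, overrides):
    """Merge a requirements_overrides dict into the checked-in wheels dict.

    Any attribute present in overrides wins; attributes not mentioned keep
    their resolved values, so newly added wheel attributes become
    overridable without any extra plumbing.
    """
    merged = {}
    for name, attrs in wheels.items():
        merged[name] = dict(attrs)  # copy resolved attributes
        merged[name].update(overrides.get(name, {}))
    return merged

# Hypothetical checked-in data and a user override:
wheels = {"tensorflow": {"version": "1.14.0", "additional_runtime_deps": []}}
overrides = {"tensorflow": {"additional_runtime_deps": ["@org_numpy//:numpy"]}}

merged = merge_overrides(wheels, overrides)
```

Because the merge is attribute-by-attribute, an override only needs to mention the keys it changes.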

Remove unnecessary templating from update.sh, and just generate the
update command directly from the repository rule.

Support extras and "target" argument in pip_version_proxy's requirement macro

Generate new select() targets for extras, and also allow users to pass additional
targets by adding "additional_targets" to the "requirements_overrides" dict.
This should remove the need for the "requirement_repo" macro, since the pip_version_proxy
would now correctly proxy through things like:

   requirement("tensorflow", target = "cpp")

Since these new targets that we use in select() are not necessarily py_library
targets, use native.alias instead of py_library targets.

"binary" attribute is still left unsupported by the version proxy.

Use standard patching attributes

Rename "patch_runtime" to "patches", and add other standard patching
attributes.

Move build_wheel logic from skylark into piptool

We will need to trigger the "build single wheel in isolation" logic
also from "piptool resolve" when generating stable sha256s later.

Make wheel build deterministic

Use a fixed --build dir when invoking "pip wheel", to make debug symbols
stable.

Pass various defines like -D__DATE__=redacted to the wheel build to avoid
getting timestamps into the binaries.
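The redaction trick can be sketched as extra compiler flags injected into the wheel build's environment. The flag set below follows common reproducible-build practice; the exact flags and environment variables piptool uses are an assumption:

```python
import os

# Compiler macros that commonly leak build timestamps into binaries.
REDACTED_DEFINES = [
    '-D__DATE__="redacted"',
    '-D__TIME__="redacted"',
    '-D__TIMESTAMP__="redacted"',
]

def build_env(base_env=None):
    """Return an environment for "pip wheel" with timestamp macros redacted."""
    env = dict(os.environ if base_env is None else base_env)
    extra = " ".join(REDACTED_DEFINES)
    env["CFLAGS"] = (env.get("CFLAGS", "") + " " + extra).strip()
    env["CXXFLAGS"] = (env.get("CXXFLAGS", "") + " " + extra).strip()
    return env
```

Redefining `__DATE__` and friends keeps any `printf(__DATE__)`-style code from embedding the build time in the compiled extension.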

If the wheel is built locally, strip timestamps from the whl zip file.
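Since a .whl is just a zip archive, timestamp stripping can be sketched by rewriting each entry with a fixed date (1980-01-01 is the earliest date the zip format allows; the helper name is illustrative):

```python
import zipfile

ZIP_EPOCH = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the zip format supports

def strip_zip_timestamps(src_path, dst_path):
    """Copy a .whl (zip) file, resetting every entry's timestamp."""
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            stripped = zipfile.ZipInfo(info.filename, date_time=ZIP_EPOCH)
            stripped.external_attr = info.external_attr  # keep unix permissions
            dst.writestr(stripped, data)
```

After this pass, two wheels built from identical inputs hash identically regardless of when they were built.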

Upgrade wheel package used by piptool to a later version (old one would
produce non-deterministic wheels on python3).

Support "sha256" attribute in the "download_or_build_wheel" rule, and use
it in repository_ctx.download (when downloading) or pass to "piptool build",
where we will check that the built wheel digest matches the expected one.
(Note that "piptool resolve" does not yet emit the "sha256" attribute to
the checked-in requirements.bzl, that will be done in a follow-up change).
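The digest check described above amounts to hashing the built file and comparing it against the pinned value. A minimal sketch (error handling and function names in piptool may differ):

```python
import hashlib

def sha256_of(path):
    """Compute the hex sha256 digest of a file, streaming in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def check_wheel_digest(whl_path, expected_sha256):
    """Fail the build if the built wheel does not match the pinned digest."""
    actual = sha256_of(whl_path)
    if actual != expected_sha256:
        raise RuntimeError(
            "wheel digest mismatch: expected {}, got {}".format(
                expected_sha256, actual))
```

The same digest can be handed to repository_ctx.download for the download path, so both code paths verify against one pinned value.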

The build can still be non-deterministic when executed on different
machines, since it's not hermetic.  But when built inside a container,
it should be.

Support generating stable sha256s from "piptool resolve"

Add a new "digests" boolean attribute to pip_import.  If true, compute
sha256 digests from all wheels after resolving.  This is done as follows:

  - If the wheel that gets built during "piptool resolve" has the same sha256
    that is already stored in the current version of the to-be-regenerated
    requirements.bzl, then use that as the sha256.  This is an optimization
    that speeds up sha256 resolution in the incremental resolve case where the
    wheel digest is the same whether it's built in isolation or as part of a
    non-isolated resolve operation.

  - Else, build the wheel in isolation (in a stable path to get deterministic
    debug symbols, using the correct additional_buildtime_env, using a clean
    PYTHONPATH that has just additional_buildtime_deps and their transitive
    runtime deps installed, etc.).  If the wheel build ends up just downloading
    a prebuilt wheel, use the sha256 of that without further stripping.

  - Else, if the wheel gets built locally, strip timestamps from the built
    zip file.  If the resulting stripped wheel has the same sha256 as in the
    current version of the to-be-generated requirements.bzl, then use that as
    the sha256.  This is an optimization for the incremental resolve case where
    the wheel digest is different when built in isolation, but is not changing
    as part of the current resolve cycle.

  - Else, we have a wheel whose sha256 changed from the one produced during
    non-isolated resolve, and it doesn't match the current sha256 stored in
    the current version of requirements.bzl.  As a last line of defence, build
    the wheel again, and make sure that we get the same sha256 from the second
    build.  If this check fails, then the wheel build is not deterministic.
    (I have not yet come across any wheels where this would happen.)
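The four-step resolution above can be sketched as a small decision function. The `build_isolated` and `strip_timestamps` callables are stand-ins for the real piptool build logic, not its actual API:

```python
def resolve_sha256(wheel, old_sha256, build_isolated, strip_timestamps):
    """Pick a stable sha256 for a wheel, mirroring the steps described above.

    wheel: dict with "sha256" from the non-isolated resolve.
    old_sha256: digest in the current requirements.bzl, or None if absent.
    build_isolated: callable returning (sha256, downloaded) from an
        isolated build; downloaded is True for a prebuilt wheel.
    strip_timestamps: callable mapping a built wheel's sha256 to the
        digest of its timestamp-stripped form.
    """
    # Step 1: fast path -- the resolve-time digest already matches.
    if wheel["sha256"] == old_sha256:
        return old_sha256
    # Step 2: build in isolation; a downloaded prebuilt wheel needs no stripping.
    sha, downloaded = build_isolated()
    if downloaded:
        return sha
    # Step 3: strip timestamps; reuse the old digest if it still matches.
    stripped = strip_timestamps(sha)
    if stripped == old_sha256:
        return old_sha256
    # Step 4: rebuild and require an identical digest, else the build is flaky.
    sha2, _ = build_isolated()
    if strip_timestamps(sha2) != stripped:
        raise RuntimeError("wheel build is not deterministic")
    return stripped
```

Steps 1 and 3 are pure optimizations: they avoid churning requirements.bzl when nothing about the wheel actually changed.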
therc commented 5 years ago

Sorry to invade a random pull request (I can't create an issue, either), but are there plans to send this repository's changes upstream? I saw Conrado's talk and chatted with him in person. I have been experimenting with several Python rule sets and assorted forks. This is the most comprehensive in handling bizarre packages. I'm not asking for a roadmap; only wondering if this is going to be a permanent fork. Thanks for the great work!

scele commented 5 years ago

Sorry to invade a random pull request

No problem at all! :)

but are there plans to send this repository's changes upstream?

I'd like to upstream these changes, but it's a lot of work since this is a pretty significant rewrite. So no, I'm not planning for this to be a permanent fork; it's just a matter of finding the time and energy to contribute back to upstream. :) :/

Glad you're finding it useful!

mirandaconrado commented 5 years ago

LGTM, but honestly I had quite a hard time following everything that was done. Maybe because I haven't looked at it in a while.