google / jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
http://jax.readthedocs.io/
Apache License 2.0

Feature Request: a way to opt out of "hermetic python" #22216

Open SomeoneSerge opened 6 days ago

SomeoneSerge commented 6 days ago

Hi! We're encountering issues concerning https://github.com/google/jax/pull/20469 with the jax{,lib} packages in Nixpkgs. It is crucial to our project to bootstrap the Python interpreter from scratch and outside Bazel, rather than download a prebuilt one from the internet. We've been relying on Bazel selecting the system Python (by setting PYTHON_BIN_PATH) for this. Right now we're looking into reverting the respective diff ad hoc, but we'd prefer a tighter integration.

Thanks


vam-google commented 6 days ago

Hello @SomeoneSerge,

For projects that want to provide their own Python interpreter (a Linux distribution is a good example), there is a way to do exactly that. For the other concerns, please check my response in the corresponding nixpkgs thread.

Specifying your own Python interpreter

For a basic example of specifying your own interpreter, see "Building with pre-release Python version".

Note that you do not have to follow those instructions literally. All you actually need in the end is to be able to specify a path to your desired (hopefully working) Python interpreter. The instructions only serve to obtain such a working custom interpreter; if you already have one, you can skip directly to the last step:

To use the newly built Python interpreter, add the following code snippet RIGHT AFTER python_init_toolchains() in your WORKSPACE file.

load("@rules_python//python:repositories.bzl", "python_register_toolchains")
python_register_toolchains(
    name = "python",
    # By default assume the interpreter is on the local file system, replace
    # with proper URL if it is not the case.
    base_url = "file://",
    ignore_root_user_error = True,
    python_version = "3.13.0a6",
    tool_versions = {
        "3.13.0a6": {
            # Path to .tar.gz with Python binary.
            "url": "/full/path/to/your/python_dev-3.13.0a6.tgz",
            "sha256": {
                # By default we assume Linux x86_64 architecture; replace with the
                # proper architecture if you were building on a different platform.
                "x86_64-unknown-linux-gnu": "cd99233ccd2df426208be3d394e1b89bbb2511caf389cfa9df7bab624a6cdc24",
            },
            "strip_prefix": "python_dev-3.13.0a6",
        },
    },
)
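Two values in the snippet above have to come from your own interpreter and archive: the version string (used for python_version and as the tool_versions key) and the SHA-256 digest of the .tgz, which Bazel checks the file against. A minimal sketch for obtaining both, assuming the interpreter can be executed on the build host; the script name fill_toolchain.py and both helper functions are hypothetical, not part of JAX, Bazel, or rules_python:

```python
import hashlib
import subprocess
import sys


def interpreter_version(python_bin: str) -> str:
    """Return the version string reported by python_bin's platform module.

    Hypothetical helper; for a final release this looks like "3.12.4".
    """
    result = subprocess.run(
        [python_bin, "-c", "import platform; print(platform.python_version())"],
        check=True,  # raise CalledProcessError if the interpreter fails
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python fill_toolchain.py /path/to/bin/python3 /path/to/python.tgz
    python_bin, archive = sys.argv[1], sys.argv[2]
    print(f'python_version = "{interpreter_version(python_bin)}"')
    print(f'"sha256": "{sha256_of_file(archive)}"')
```

Running sha256sum on the archive prints the same hex digest; the Python helper is shown only to keep the example self-contained.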

Mimic old non-hermetic behavior (very very not recommended)

We do not support and do not recommend this use case, but it is still possible with a little bit of work on your side.

The instructions above assume you have Python packaged as a standalone .tgz archive. If you still want to depend on whatever is installed locally on your system, you can, but there is an important thing to know before doing so (and it may affect your decision):

Even the previous non-hermetic Python setup wrapped the system Python inside Bazel rules and copied parts of your system Python installation into Bazel's cache so that it could be used for the rest of the build. In other words, non-hermetic Python behaved almost as if it were still downloading Python from somewhere; that "somewhere" just happened to be your local environment.

With that being said, you can mimic the old non-hermetic Python setup with a custom repository rule that searches your local system, packages the interpreter as a standalone archive matching the structure of the ones we currently depend on (that structure matches the default layout you get when building vanilla Python from the official sources), and then passes the packaged archive as the value of the url field in the code snippet above.
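The packaging step described above can be sketched with Python's standard tarfile module. This is a minimal sketch, assuming the interpreter's install prefix (the directory containing bin/ and lib/) is already known; package_python_prefix is a hypothetical helper, and a real solution would live in a Bazel repository rule rather than in a standalone script:

```python
import os
import tarfile


def package_python_prefix(prefix: str, out_path: str, top_dir: str) -> str:
    """Archive the Python install tree at prefix into a .tar.gz file.

    Hypothetical helper. Every member is stored under top_dir/, so top_dir
    is the value to use for "strip_prefix". Returns the absolute archive path.
    """
    prefix = os.path.realpath(prefix)
    out_abs = os.path.realpath(out_path)
    # A vanilla "make install" prefix has at least bin/ and lib/.
    missing = [d for d in ("bin", "lib") if not os.path.isdir(os.path.join(prefix, d))]
    if missing:
        raise ValueError(f"{prefix} lacks expected subdirectories: {missing}")
    # Writing the archive inside the tree being archived would include it in itself.
    if os.path.commonpath([prefix, out_abs]) == prefix:
        raise ValueError("out_path must be outside prefix")
    with tarfile.open(out_abs, "w:gz") as tar:
        # arcname renames the prefix directory to top_dir inside the archive;
        # symlinks are stored as links (tarfile's default, dereference=False).
        tar.add(prefix, arcname=top_dir)
    return out_abs
```

The returned path would go into the url field and top_dir into strip_prefix. Because symlinks are archived as links, links pointing outside the prefix (common on NixOS) may need to be followed with tarfile.open(..., dereference=True) for the archive to be self-contained.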

Note that we do not provide such a custom local-file-system-search rule ourselves and do not plan to, as it would basically reintroduce non-hermetic Python with all its issues, such as a very fragile and non-reproducible setup. It is not very difficult to implement one on your side, though, especially if you do not need it to be generic: a rule that only matches the NixOS structure would be much easier to implement and maintain than one that must work on any Linux system.
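Restricting the search to NixOS, as suggested above, keeps the discovery step small: each package lives in its own store directory such as /nix/store/<hash>-python3-<version>, and the python3 found on PATH is usually a symlink chain into it. A minimal sketch of that lookup, assuming this layout; nix_python_prefix is hypothetical, and the same logic would still need to be ported to a Starlark repository rule:

```python
import os
import shutil
from typing import Optional


def nix_python_prefix(
    name: str = "python3",
    store_dir: str = "/nix/store",
    search_path: Optional[str] = None,
) -> str:
    """Resolve `name` on PATH to its Nix store prefix (hypothetical helper).

    search_path is passed to shutil.which; None means the current PATH.
    """
    found = shutil.which(name, path=search_path)
    if found is None:
        raise FileNotFoundError(f"{name} not found on PATH")
    # Follow the profile symlink chain to the real file, e.g.
    # /nix/store/<hash>-python3-<version>/bin/python3.12
    real = os.path.realpath(found)
    prefix = os.path.dirname(os.path.dirname(real))  # strip "bin/<file>"
    if os.path.dirname(prefix) != os.path.realpath(store_dir):
        raise ValueError(f"{real} is not of the form {store_dir}/<entry>/bin/<file>")
    return prefix
```

The returned prefix is the directory that would then be archived; wrapper environments (for example ones built with extra packages) resolve to a different store entry, which such a NixOS-specific rule would have to account for.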

SomeoneSerge commented 6 days ago

Hi! Thank you, @vam-google, for a prompt and comprehensive reply. I hope my comment about "random executables" in the other issue wasn't too rude; if it was, I'm prepared to apologize :)

We'll look into implementing a solution along the lines of the snippet you provided. I expect it won't be a "smooth ride".

I think it's best I avoid "explaining Nix" in this thread (maybe we reserve the Nixpkgs issue for that) and focus on our integration with Bazel. Suffice it to say that Nix and Bazel overlap in scope, which is why (1) we're not concerned about reproducibility and correctness of "system packages", and (2) why having Bazel set up its own sandbox and copy these packages into new locations has been problematic.

For now, there are two implementation details about the python we build I'm worried about:

Perhaps these notions of "correctness" (= conditions under which we're ready to "guarantee" our software will work) also clarify why I believed it would be easiest for us if we could relax Bazel's sandbox and let it see the pre-deployed toolchain.