pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.19k stars 613 forks source link

Allow way to pass build constraints to PEX #21097

Closed kkchau closed 1 week ago

kkchau commented 1 week ago

Is your feature request related to a problem? Please describe. Unpinned build-system.requires dependencies break over time. For example, we build a PEX using pants that installs cnvkit (https://github.com/etal/cnvkit), which depends on pomegranate==0.14.9 (https://github.com/jmschrei/pomegranate/tree/v0.14.9), which in turn specifies numpy in its build-system.requires. pomegranate's build process uses a deprecated (as of numpy 2.0.0, released a few days ago) API, which breaks when pants attempts to build it with the latest numpy release.

Describe the solution you'd like We'd like a way to pass build constraints to the underlying PEX processes so that we can pin this build dependency (among others) from pants.

Describe alternatives you've considered We've tried using the resolves_to_constraints_file directive to supply constraints to the PEX process, but that only pushes a --constraints flag (which in turn is pushed to the underlying pip processes), but this doesn't actually constraint build dependencies. As suggested in this thread, it seems like we need to specify the PIP_CONSTRAINT environment variable to actually constraint dependencies listed in build-system.requires since the constraints flag doesn't actually get passed to the build environment.

Additional context Here's a minimal example:

# pants.toml
[GLOBAL]
pants_version = "2.21.0a0"
backend_packages = ["pants.backend.python"]

[pex-cli]
version = "v2.3.1"

[source]
root_patterns = ["/3rdparty"]

[python]
interpreter_constraints = ["==3.10.*"]
enable_resolves = true
resolves_generate_lockfiles = true
pip_version = "24.0"

[python.resolves]
cnvkit = "3rdparty/lockfiles/cnvkit.lockfile"

[python.resolves_to_interpreter_constraints]
cnvkit = ["==3.10.*"]

[python.resolves_to_constraints_file]
cnvkit = "3rdparty/constraints/cnvkit.constraints.txt"
# 3rdparty/cnvkit/BUILD.pants
python_requirement(
    name="lib",
    resolve="cnvkit",
    requirements=[
        "cnvkit@ git+https://github.com/etal/cnvkit.git@bd533c33931a76b2373c4b053cdcb69c166b91b6",
        "numpy==1.24",
        "pandas==1.5.1",
        "pomegranate==0.14.9",
    ],
)

pex_binary(
    dependencies=[":lib"],
    resolve="cnvkit",
)
# 3rdparty/constraints/cnvkit.constraints.txt
numpy<2.0.0
scipy<1.12.0

Output when running:

$ pants --keep-sandboxes=on_failure generate-lockfiles --resolve=cnvkit
09:09:19.11 [INFO] Preserving local process execution dir /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u for Generate lockfile for cnvkit
09:09:19.11 [INFO] Completed: Generate lockfile for cnvkit
09:09:19.11 [ERROR] 1 Exception encountered:
Engine traceback:
  in `generate-lockfiles` goal

ProcessExecutionFailure: Process 'Generate lockfile for cnvkit' failed with exit code 1.
stdout:

stderr:
pid 89277 -> /Users/kevin.chau/.cache/pants/named_caches/pex_root/venvs/0deb89b23adbbad7a5b9aac06d2362acf7572cfa/0defb882924c4c7c56b6a0bfc3dce4b0af154aba/bin/python -sE /Users/kevin.chau/.cache/pants/named_caches/pex_root/venvs/0deb89b23adbbad7a5b9aac06d2362acf7572cfa/0defb882924c4c7c56b6a0bfc3dce4b0af154aba/pex --disable-pip-version-check --no-python-version-warning --exists-action a --no-input --isolated -q --cache-dir /Users/kevin.chau/.cache/pants/named_caches/pex_root/pip/24.0/pip_cache --log /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u/.tmp/pex-pip-log.pbngi8de/pip.log download --dest /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u/.tmp/tmphvs640fm/Users.kevin.chau..asdf.installs.python.3.10.14.bin.python3.10 --constraint 3rdparty/constraints/cnvkit.constraints.txt cnvkit@ git+https://github.com/etal/cnvkit.git@bd533c33931a76b2373c4b053cdcb69c166b91b6 numpy==1.24 pandas==1.5.1 pomegranate==0.14.9 --index-url https://pypi.org/simple/ --retries 5 --timeout 15 exited with 1 and STDERR:
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

# cd to sandbox and rerun command with increased verbosity for full output (add -vvvv in pants lock create command in __run.sh)
$ cd /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u/
$ bash __run.sh
...
pid 97169 -> /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u/.cache/pex_root/venvs/0deb89b23adbbad7a5b9aac06d2362acf7572cfa/0defb882924c4c7c56b6a0bfc3dce4b0af154aba/bin/python -sE /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MW
j0u/.cache/pex_root/venvs/0deb89b23adbbad7a5b9aac06d2362acf7572cfa/0defb882924c4c7c56b6a0bfc3dce4b0af154aba/pex --disable-pip-version-check --no-python-version-warning --exists-action a --no-input --isolated -v --cache-dir /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pant
s-sandbox-3MWj0u/.cache/pex_root/pip/24.0/pip_cache --log /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u/.tmp/pex-pip-log.ak2fam3j/pip.log download --dest /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u/.tmp/tmpiug08bb0/Use
rs.kevin.chau..asdf.installs.python.3.10.14.bin.python3.10 --constraint 3rdparty/constraints/cnvkit.constraints.txt cnvkit@ git+https://github.com/etal/cnvkit.git@bd533c33931a76b2373c4b053cdcb69c166b91b6 numpy==1.24 pandas==1.5.1 pomegranate==0.14.9 --index-url https://pypi.org/simpl
e/ --retries 5 --timeout 15 exited with 1 and STDERR:
...
  Collecting numpy
    Obtaining dependency information for numpy from https://files.pythonhosted.org/packages/3b/61/e1e77694c4ed929c8edebde7d2ac30dbf3ed452c1988b633569d3d7ff271/numpy-2.0.0-cp310-cp310-macosx_14_0_arm64.whl.metadata
    Using cached numpy-2.0.0-cp310-cp310-macosx_14_0_arm64.whl.metadata (60 kB)
    Link requires a different Python (3.10.14 not in: '>=3.7,<3.10'): https://files.pythonhosted.org/packages/99/f1/c00d6be56e1a718a3068079e3ec8ce044d7179345280f6a3f5066068af0d/scipy-1.6.2.tar.gz (from https://pypi.org/simple/scipy/) (requires-python:>=3.7,<3.10)
    Link requires a different Python (3.10.14 not in: '>=3.7,<3.10'): https://files.pythonhosted.org/packages/fe/fd/8704c7b7b34cdac850485e638346025ca57c5a859934b9aa1be5399b33b7/scipy-1.6.3.tar.gz (from https://pypi.org/simple/scipy/) (requires-python:>=3.7,<3.10)
    Link requires a different Python (3.10.14 not in: '>=3.7,<3.10'): https://files.pythonhosted.org/packages/bb/bb/944f559d554df6c9adf037aa9fc982a9706ee0e96c0d5beac701cb158900/scipy-1.7.0.tar.gz (from https://pypi.org/simple/scipy/) (requires-python:>=3.7,<3.10)
    Link requires a different Python (3.10.14 not in: '>=3.7,<3.10'): https://files.pythonhosted.org/packages/47/33/a24aec22b7be7fdb10ec117a95e1e4099890d8bbc6646902f443fc7719d1/scipy-1.7.1.tar.gz (from https://pypi.org/simple/scipy/) (requires-python:>=3.7,<3.10)
...
  Error compiling Cython file:
  ------------------------------------------------------------
  ...

  cpdef disable_gpu():
        global GPU
        GPU = False

  cdef ndarray_wrap_cpointer(void* data, numpy.npy_intp n):
      ^
  ------------------------------------------------------------

^ fails due to numpy 2.0.0 even though we specify constraints. Adding PIP_CONSTRAINT as env to __run.sh doesn't work either. What does work is rerunning the failed PEX command with PIP_CONSTRAINT:

$ PIP_CONSTRAINT=3rdparty/constraints/cnvkit.constraints.txt /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u/.cache/pex_root/venvs/0deb89b23adbbad7a5b9aac06d2362acf7572cfa/0defb882924c4c7c56b6a0bfc3dce4b0af154aba/bin/python -sE /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u/.cache/pex_root/venvs/0deb89b23adbbad7a5b9aac06d2362acf7572cfa/0defb882924c4c7c56b6a0bfc3dce4b0af154aba/pex --disable-pip-version-check --no-python-version-warning --exists-action a --no-input --isolated -v --cache-dir /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u/.cache/pex_root/pip/24.0/pip_cache --log /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u/.tmp/pex-pip-log.ak2fam3j/pip.log download --dest /private/var/folders/n_/c_8hhzf96f3d6h2nh529ybt40000gp/T/pants-sandbox-3MWj0u/.tmp/tmpiug08bb0/Users.kevin.chau..asdf.installs.python.3.10.14.bin.python3.10 --constraint 3rdparty/constraints/cnvkit.constraints.txt "cnvkit@ git+https://github.com/etal/cnvkit.git@bd533c33931a76b2373c4b053cdcb69c166b91b6" numpy==1.24 pandas==1.5.1 pomegranate==0.14.9 --index-url https://pypi.org/simple/ --retries 5 --timeout 15
...
  Using cached numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl (14.0 MB)
...
Successfully downloaded numpy pandas pomegranate cnvkit biopython cython joblib matplotlib networkx pyfaidx pysam python-dateutil pytz reportlab scikit-learn scipy pyyaml contourpy cycler fonttools kiwisolver packaging pillow pyparsing six threadpoolctl chardet importlib-metadata setuptools zipp
jsirois commented 1 week ago

@kkchau the PIP_CONSTRAINT env var alternative you described is the only answer here that solves the problem as you present it. You would have to get that passed to the Pex process. If you wanted something cleaner, you'd have to file a feature request against Pex to add a knob to re-write [build-system] requires for 3rdparty projects. This would be a stretch though - there are 0 projects in existence that I'm aware of that allow this sort of monkeying. The only solution in general is to actually fix the project with the broken [build-system] requires directly - which is almost always the right thing to do anyhow.

Towards that end, your thinking is also too narrow for the OP case. Instead you can self-serve by either adding a direct dependency on pomegranate==0.14.9 via a VCS requirement on a fork you make of that project at that version with an edit to its pyproject.toml to restrain the [build-system] requires. Alternatively, since you already use a VCS requirement for cnvkit, you can edit its pomegranate==0.14.9 requirement to be the VCS requirement that points to your corrected fork.

kkchau commented 1 week ago

Thanks @jsirois. We wanted to see if we could get around needing to fork the repo (we've already suggested the build system fix to the original author and are waiting to hear back); but in the absence of that it sounds like forking is our best option here.

Relatedly, I'm noticing that specifying a VCS requirement actually forces everything from build-system.requires to also build from source rather than installing from wheel. Having a dependency on a commit of pomegranate with a pinned numpy<2.0.0 seems to work, but then it forces a build of scipy from source instead of installing from available wheels. This becomes a problem in our CI runners since the build is extremely long and resource intensive. The command that gets run is along the lines of

/home/color/.cache/pants/named_caches/pex_root/venvs/0deb89b23adbbad7a5b9aac06d2362acf7572cfa/69580c2ecb438dca4d6314dbfd06532da1759f50/bin/python -sE /home/color/.cache/pants/named_caches/pex_root/venvs/0deb89b23adbbad7a5b9aac06d2362acf7572cfa/69580c2ecb438dca4d6314dbfd06532da1759f50/pex --disable-pip-version-check --no-python-version-warning --exists-action a --no-input --isolated -v --cache-dir /home/color/.cache/pants/named_caches/pex_root/pip/24.0/pip_cache --log /tmp/pants-sandbox-GjKSIU/.tmp/pex-pip-log.rkokrbf2/pip.log download --dest /home/color/.cache/pants/named_caches/pex_root/downloads/resolver_download.48bdusml/opt.python.3.10.14.bin.python3.10 --no-binary :all: --no-deps "pomegranate @ git+https://github.com/jmschrei/pomegranate.git@e33666ae80ad82d198830654c98dc7ac2bee2222" --index-url https://pypi.org/simple/ --retries 5 --timeout 15

where --no-binary :all: seems to be the culprit (removing this allows wheels to be installed). Is there a way to allow binary build dependencies when installing from VCS?

Edit: we're running PEX v2.3.1, and this looks like it might be resolved by https://github.com/pex-tool/pex/pull/2433/files#diff-aad7222a03ff8182eb9c406092214ea24c7811b23dbdb9f47bfedc85aab327d0, but the hashes seem pretty unstable when testing v2.4.1, so I'm unable to verify

Expected sha256 hash of e6d09e516ac365a4a63295ba60b2b48e4300ba1fa7a2cb49411842a46a9eb3ed when downloading pomegranate but hashed to 786e9855cef3c40824df7d17eeedbb22e31ff06f913fd8c150259664b6b83ea8.
jsirois commented 1 week ago

@kkchau I think the --no-binary :all: issue was fixed (for other reasons) in Pex 2.4.1 here just above @huonw's review comment: https://github.com/pex-tool/pex/pull/2433#discussion_r1649586357

You can upgrade the Pex Pants uses by adding these to your Pants config:

You'll need to download the Pex PEX (named pex) from the releases page and manually calculate its hash and size to fill all that in.

kkchau commented 1 week ago

Aha, @jsirois thank you for confirming. I've been trying to test this but with v2.4.1 the hash seems pretty unstable; trying to package the pex_binary results in

Expected sha256 hash of e6d09e516ac365a4a63295ba60b2b48e4300ba1fa7a2cb49411842a46a9eb3ed when downloading pomegranate but hashed to {some_new_hash_each_iteration}

Based on https://github.com/pex-tool/pex/pull/2423, it seems like --format pip-no-hashes might work, but I'm not sure how to pass this through from pants package; I've tried adding --format=pip-no-hashes to pex_binary.extra_build_args, but this doesn't seem like a valid option for pex.

Interestingly, just running pex3 lock export using the pants-generated lockfil from pants generate-lockfiles returns the expected hash.

jsirois commented 1 week ago

@kkchau can you explain in more detail what you're doing? It sounds like multiple things IIUC.

  1. Generate a lock file.
  2. Export the lock file to a hashed Pip requirements file format?

Is it true upgrading to Pex 2.4.1 solved step 1 full stop?

Did I guess step 2 correctly? If so that will never work and I can explain more. If not, please fill in the full scenario (commands you run). I am not a Pants guy but can help with Pex.

jsirois commented 1 week ago

Oh wait, pants package ... which translates to pex --lock ... -o my.pex. Just a sec...

jsirois commented 1 week ago

Ok, yeah, please provide your new exact input requirements or just attach the lock file, which would be better.

kkchau commented 1 week ago

@jsirois Sure! I've attached the generated lockfile. lock.txt


For posterity, here's the inputs as well:

# pants.toml
[GLOBAL]
pants_version = "2.21.0a0"

backend_packages = [
    "pants.backend.python",
    "plugins.lockfile.pex",
]
[pex-cli]
version = "v2.4.1"
known_versions = [
    "v2.4.1|linux_arm64|b0d823704f872f53332dd15bc247d92251da62cb0727d2e6f859780dfe1742ac|4150485",
    "v2.4.1|linux_x86_64|b0d823704f872f53332dd15bc247d92251da62cb0727d2e6f859780dfe1742ac|4150485",
    "v2.4.1|macos_arm64|b0d823704f872f53332dd15bc247d92251da62cb0727d2e6f859780dfe1742ac|4150485",
    "v2.4.1|macos_x86_64|b0d823704f872f53332dd15bc247d92251da62cb0727d2e6f859780dfe1742ac|4150485",
]

[source]
root_patterns = ["/3rdparty"]

[python]
interpreter_constraints = ["==3.10.*"]
enable_resolves = true
resolves_generate_lockfiles = true
pip_version = "24.0"

[python.resolves]
cnvkit = "3rdparty/lockfiles/cnvkit.lockfile"

[python.resolves_to_interpreter_constraints]
cnvkit = ["==3.10.*"]
# 3rdparty/cnvkit/BUILD.pants
python_requirement(
    name="lib",
    resolve="cnvkit",
    requirements=[
        "cnvkit@ git+https://github.com/etal/cnvkit.git@43584f4762277b88b8c333e9ab187d91c2e43142",
        "numpy==1.24",
        "pandas==1.5.1",
        "pomegranate@ git+https://github.com/jmschrei/pomegranate.git@1ce52b3b295eb00a7f3aa4cfd2b7964fc00df873",
    ],
)

pex_binary(
    name="bin",
    resolve="cnvkit",
    entry_point="cnvlib.cnvkit:main",
    dependencies=[":lib"],
)
pants generate-lockfiles --resolve=cnvkit
pants package 3rdparty/cnvkit:bin
jsirois commented 1 week ago

@kkchau, because this:

diff -ur ../1.pomegranate-0.14.10/ ../2.pomegranate-0.14.10/ | head
diff -ur ../1.pomegranate-0.14.10/pomegranate-0.14.10/pomegranate/BayesClassifier.c ../2.pomegranate-0.14.10/pomegranate-0.14.10/pomegranate/BayesClassifier.c
--- ../1.pomegranate-0.14.10/pomegranate-0.14.10/pomegranate/BayesClassifier.c  2024-06-24 22:30:41.000000000 -0700
+++ ../2.pomegranate-0.14.10/pomegranate-0.14.10/pomegranate/BayesClassifier.c  2024-06-24 22:31:04.000000000 -0700
@@ -1007,7 +1007,7 @@
   "pomegranate/base.pxd",
 };

-/* "../../build-env-wvhma7j6/lib/python3.10/site-packages/numpy/__init__.pxd":688
+/* "../../build-env-ozw9jbz8/lib/python3.10/site-packages/numpy/__init__.pxd":688
  * # in Cython to enable them only on the right systems.

To get that I did:

mkdir /tmp/test
cd /tmp/test
git clone https://github.com/kkchau/pomegranate
cd pomegranate
git reset --hard 1ce52b3b295eb00a7f3aa4cfd2b7964fc00df873
python3.10 -mvenv ../build.env
source ../build.env/bin/activate
pip install build
pyproject-build --sdist
mv dist/pomegranate-0.14.10.tar.gz ../1.pomegranate-0.14.10.tar.gz
git clean -fdx
pyproject-build --sdist
mv dist/pomegranate-0.14.10.tar.gz ../2.pomegranate-0.14.10.tar.gz
mkdir ../1.pomegranate-0.14.10 ../2.pomegranate-0.14.10
tar -xzf ../1.pomegranate-0.14.10.tar.gz -C ../1.pomegranate-0.14.10
tar -xzf ../2.pomegranate-0.14.10.tar.gz -C ../2.pomegranate-0.14.10
diff -ur ../1.pomegranate-0.14.10/ ../2.pomegranate-0.14.10/ | head

(I assume "pomegranate@ git+https://github.com/jmschrei/pomegranate.git@1ce52b3b295eb00a7f3aa4cfd2b7964fc00df873" was a typo and used https://github.com/kkchau/pomegranate since 1ce52b3b295eb00a7f3aa4cfd2b7964fc00df873 is not a commit in the https://github.com/jmschrei/pomegranate repo)

In other words, pomegranate uses cythonize and it generates .c files with unstable comments. Iam unfamiliar with Cython but found no way after a cursory search to turn these off (I basically wound up here: https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html#compiler-directives).

For transparency, Pex's strategy for hashing VCS requirements and local project requirements is to:

  1. run the project build to build an sdist.
  2. unpack the sdist in a new directory.
  3. perform a stable, sorted directory tree walk that hashes all file paths relative to the directory created in 2 and all file content.

The Cython-generated unstable comments defeat the file content hash stability in step 3.

Perhaps you know more about Cython or can dig more and see if there is a way to turn off those randomized comments.

jsirois commented 1 week ago

Oh, wait. Here I think: https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html#Cython.Compiler.Options.embed_pos_in_docstring

jsirois commented 1 week ago

Nope, that defaults to False and explicitly setting it to False does not help. This did not change anything in other words:

diff --git a/setup.py b/setup.py
index 5c8f774..da38195 100644
--- a/setup.py
+++ b/setup.py
@@ -6,6 +6,7 @@ from setuptools.command.build_ext import build_ext as _build_ext

 try:
     from Cython.Build import cythonize
+    from Cython.Compiler import Options
 except ImportError:
     use_cython = False
     ext = 'c'
@@ -58,6 +59,11 @@ else:
                Extension("pomegranate.distributions.*", ["pomegranate/distributions/*.pyx"])
     ]

+    # This ensures the generated C code is reproducible. Without it, doc comments with pointers to
+    # files in a temporary build directory with a random name are included, defeating
+    # reproducibility.
+    Options.embed_pos_in_docstring = False
+
     extensions = cythonize(extensions, compiler_directives={'language_level' : "2"})

 class build_ext(_build_ext):
jsirois commented 1 week ago

Ok @kkchau I hand this back to you at this point. I think I've hopefully moved the understanding forward to the point of knowing the pomegranate build system is not reproducible as-is, and a reproducible build is required in order to be able to lock a VCS requirement. The options I see:

  1. Fix the pomegranate build to generate reproducible sdist contents.
  2. Pre-build a wheel, host it in a --find-links repo, configure Pants to see that (in addition to PyPI) and lock against your custom wheel with an entry for pomegranate in https://www.pantsbuild.org/2.20/reference/subsystems/python#resolves_to_only_binary
  3. ... propose some clever way I could not think of to hash VCS requirements [^1]

[^1]: Whatever it is it must keep the guaranty you get the exact same sdist contents when run at 2 different times. AFAICT a git hash does not guaranty this, and this case seems to prove that.

kkchau commented 1 week ago

Thank you @jsirois, this is incredibly helpful. It looks like the emit_code_comments directive needs to be turned off, since these comments are coming from numpy. Disabling that directive resolves the hashing issue! I'll close this now that everything is resolved, thank you again