Closed kkchau closed 1 week ago
@kkchau the PIP_CONSTRAINT
env var alternative you described is the only answer here that solves the problem as you present it. You would have to get that passed to the Pex process. If you wanted something cleaner, you'd have to file a feature request against Pex to add a knob to re-write [build-system] requires
for 3rdparty projects. This would be a stretch though - there are 0 projects in existence that I'm aware of that allow this sort of monkeying. The only solution in general is to actually fix the project with the broken [build-system] requires
directly - which is almost always the right thing to do anyhow.
Towards that end, your thinking is also too narrow for the OP case. Instead you can self-serve by either adding a direct dependency on pomegranate==0.14.9
via a VCS requirement on a fork you make of that project at that version with an edit to its pyproject.toml
to restrain the [build-system] requires
. Alternatively, since you already use a VCS requirement for cnvkit
, you can edit its pomegranate==0.14.9
requirement to be the VCS requirement that points to your corrected fork.
Thanks @jsirois. We wanted to see if we could get around needing to fork the repo (we've already suggested the build system fix to the original author and are waiting to hear back); but in the absence of that it sounds like forking is our best option here.
Relatedly, I'm noticing that specifying a VCS requirement actually forces everything from build-system.requires
to also build from source rather than installing from wheel. Having a dependency on a commit of pomegranate with a pinned numpy<2.0.0
seems to work, but then it forces a build of scipy
from source instead of installing from available wheels. This becomes a problem in our CI runners since the build is extremely long and resource intensive. The command that gets run is along the lines of
/home/color/.cache/pants/named_caches/pex_root/venvs/0deb89b23adbbad7a5b9aac06d2362acf7572cfa/69580c2ecb438dca4d6314dbfd06532da1759f50/bin/python -sE /home/color/.cache/pants/named_caches/pex_root/venvs/0deb89b23adbbad7a5b9aac06d2362acf7572cfa/69580c2ecb438dca4d6314dbfd06532da1759f50/pex --disable-pip-version-check --no-python-version-warning --exists-action a --no-input --isolated -v --cache-dir /home/color/.cache/pants/named_caches/pex_root/pip/24.0/pip_cache --log /tmp/pants-sandbox-GjKSIU/.tmp/pex-pip-log.rkokrbf2/pip.log download --dest /home/color/.cache/pants/named_caches/pex_root/downloads/resolver_download.48bdusml/opt.python.3.10.14.bin.python3.10 --no-binary :all: --no-deps "pomegranate @ git+https://github.com/jmschrei/pomegranate.git@e33666ae80ad82d198830654c98dc7ac2bee2222" --index-url https://pypi.org/simple/ --retries 5 --timeout 15
where --no-binary :all:
seems to be the culprit (removing this allows wheels to be installed). Is there a way to allow binary build dependencies when installing from VCS?
Edit: we're running PEX v2.3.1, and this looks like it might be resolved by https://github.com/pex-tool/pex/pull/2433/files#diff-aad7222a03ff8182eb9c406092214ea24c7811b23dbdb9f47bfedc85aab327d0, but the hashes seem pretty unstable when testing v2.4.1, so I'm unable to verify
Expected sha256 hash of e6d09e516ac365a4a63295ba60b2b48e4300ba1fa7a2cb49411842a46a9eb3ed when downloading pomegranate but hashed to 786e9855cef3c40824df7d17eeedbb22e31ff06f913fd8c150259664b6b83ea8.
@kkchau I think the --no-binary :all:
issue was fixed (for other reasons) in Pex 2.4.1 here just above @huonw's review comment: https://github.com/pex-tool/pex/pull/2433#discussion_r1649586357
You can upgrade the Pex Pants uses by adding these to your Pants config:
You'll need to download the Pex PEX (named pex
) from the releases page and manually calculate its hash and size to fill all that in.
Aha, @jsirois thank you for confirming. I've been trying to test this but with v2.4.1 the hash seems pretty unstable; trying to package the pex_binary results in
Expected sha256 hash of e6d09e516ac365a4a63295ba60b2b48e4300ba1fa7a2cb49411842a46a9eb3ed when downloading pomegranate but hashed to {some_new_hash_each_iteration}
Based on https://github.com/pex-tool/pex/pull/2423, it seems like --format pip-no-hashes
might work, but I'm not sure how to pass this through from pants package
; I've tried adding --format=pip-no-hashes
to pex_binary.extra_build_args
, but this doesn't seem like a valid option for pex.
Interestingly, just running pex3 lock export
using the pants-generated lockfil from pants generate-lockfiles
returns the expected hash.
@kkchau can you explain in more detail what you're doing? It sounds like multiple things IIUC.
Is it true upgrading to Pex 2.4.1 solved step 1 full stop?
Did I guess step 2 correctly? If so that will never work and I can explain more. If not, please fill in the full scenario (commands you run). I am not a Pants guy but can help with Pex.
Oh wait, pants package ...
which translates to pex --lock ... -o my.pex
. Just a sec...
Ok, yeah, please provide your new exact input requirements or just attach the lock file, which would be better.
@jsirois Sure! I've attached the generated lockfile. lock.txt
For posterity, here's the inputs as well:
# pants.toml
[GLOBAL]
pants_version = "2.21.0a0"
backend_packages = [
"pants.backend.python",
"plugins.lockfile.pex",
]
[pex-cli]
version = "v2.4.1"
known_versions = [
"v2.4.1|linux_arm64|b0d823704f872f53332dd15bc247d92251da62cb0727d2e6f859780dfe1742ac|4150485",
"v2.4.1|linux_x86_64|b0d823704f872f53332dd15bc247d92251da62cb0727d2e6f859780dfe1742ac|4150485",
"v2.4.1|macos_arm64|b0d823704f872f53332dd15bc247d92251da62cb0727d2e6f859780dfe1742ac|4150485",
"v2.4.1|macos_x86_64|b0d823704f872f53332dd15bc247d92251da62cb0727d2e6f859780dfe1742ac|4150485",
]
[source]
root_patterns = ["/3rdparty"]
[python]
interpreter_constraints = ["==3.10.*"]
enable_resolves = true
resolves_generate_lockfiles = true
pip_version = "24.0"
[python.resolves]
cnvkit = "3rdparty/lockfiles/cnvkit.lockfile"
[python.resolves_to_interpreter_constraints]
cnvkit = ["==3.10.*"]
# 3rdparty/cnvkit/BUILD.pants
python_requirement(
name="lib",
resolve="cnvkit",
requirements=[
"cnvkit@ git+https://github.com/etal/cnvkit.git@43584f4762277b88b8c333e9ab187d91c2e43142",
"numpy==1.24",
"pandas==1.5.1",
"pomegranate@ git+https://github.com/jmschrei/pomegranate.git@1ce52b3b295eb00a7f3aa4cfd2b7964fc00df873",
],
)
pex_binary(
name="bin",
resolve="cnvkit",
entry_point="cnvlib.cnvkit:main",
dependencies=[":lib"],
)
pants generate-lockfiles --resolve=cnvkit
pants package 3rdparty/cnvkit:bin
@kkchau, because this:
diff -ur ../1.pomegranate-0.14.10/ ../2.pomegranate-0.14.10/ | head
diff -ur ../1.pomegranate-0.14.10/pomegranate-0.14.10/pomegranate/BayesClassifier.c ../2.pomegranate-0.14.10/pomegranate-0.14.10/pomegranate/BayesClassifier.c
--- ../1.pomegranate-0.14.10/pomegranate-0.14.10/pomegranate/BayesClassifier.c 2024-06-24 22:30:41.000000000 -0700
+++ ../2.pomegranate-0.14.10/pomegranate-0.14.10/pomegranate/BayesClassifier.c 2024-06-24 22:31:04.000000000 -0700
@@ -1007,7 +1007,7 @@
"pomegranate/base.pxd",
};
-/* "../../build-env-wvhma7j6/lib/python3.10/site-packages/numpy/__init__.pxd":688
+/* "../../build-env-ozw9jbz8/lib/python3.10/site-packages/numpy/__init__.pxd":688
* # in Cython to enable them only on the right systems.
To get that I did:
mkdir /tmp/test
cd /tmp/test
git clone https://github.com/kkchau/pomegranate
cd pomegranate
git reset --hard 1ce52b3b295eb00a7f3aa4cfd2b7964fc00df873
python3.10 -mvenv ../build.env
source ../build.env/bin/activate
pip install build
pyproject-build --sdist
mv dist/pomegranate-0.14.10.tar.gz ../1.pomegranate-0.14.10.tar.gz
git clean -fdx
pyproject-build --sdist
mv dist/pomegranate-0.14.10.tar.gz ../2.pomegranate-0.14.10.tar.gz
mkdir ../1.pomegranate-0.14.10 ../2.pomegranate-0.14.10
tar -xzf ../1.pomegranate-0.14.10.tar.gz -C ../1.pomegranate-0.14.10
tar -xzf ../2.pomegranate-0.14.10.tar.gz -C ../2.pomegranate-0.14.10
diff -ur ../1.pomegranate-0.14.10/ ../2.pomegranate-0.14.10/ | head
(I assume "pomegranate@ git+https://github.com/jmschrei/pomegranate.git@1ce52b3b295eb00a7f3aa4cfd2b7964fc00df873" was a typo and used https://github.com/kkchau/pomegranate since 1ce52b3b295eb00a7f3aa4cfd2b7964fc00df873 is not a commit in the https://github.com/jmschrei/pomegranate repo)
In other words, pomegranate uses cythonize and it generates .c
files with unstable comments. Iam unfamiliar with Cython but found no way after a cursory search to turn these off (I basically wound up here: https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html#compiler-directives).
For transparency, Pex's strategy for hashing VCS requirements and local project requirements is to:
The Cython-generated unstable comments defeat the file content hash stability in step 3.
Perhaps you know more about Cython or can dig more and see if there is a way to turn off those randomized comments.
Nope, that defaults to False
and explicitly setting it to False
does not help.
This did not change anything in other words:
diff --git a/setup.py b/setup.py
index 5c8f774..da38195 100644
--- a/setup.py
+++ b/setup.py
@@ -6,6 +6,7 @@ from setuptools.command.build_ext import build_ext as _build_ext
try:
from Cython.Build import cythonize
+ from Cython.Compiler import Options
except ImportError:
use_cython = False
ext = 'c'
@@ -58,6 +59,11 @@ else:
Extension("pomegranate.distributions.*", ["pomegranate/distributions/*.pyx"])
]
+ # This ensures the generated C code is reproducible. Without it, doc comments with pointers to
+ # files in a temporary build directory with a random name are included, defeating
+ # reproducibility.
+ Options.embed_pos_in_docstring = False
+
extensions = cythonize(extensions, compiler_directives={'language_level' : "2"})
class build_ext(_build_ext):
Ok @kkchau I hand this back to you at this point. I think I've hopefully moved the understanding forward to the point of knowing the pomegranate build system is not reproducible as-is, and a reproducible build is required in order to be able to lock a VCS requirement. The options I see:
[^1]: Whatever it is it must keep the guaranty you get the exact same sdist contents when run at 2 different times. AFAICT a git hash does not guaranty this, and this case seems to prove that.
Thank you @jsirois, this is incredibly helpful. It looks like the emit_code_comments
directive needs to be turned off, since these comments are coming from numpy. Disabling that directive resolves the hashing issue! I'll close this now that everything is resolved, thank you again
Is your feature request related to a problem? Please describe. Unpinned
build-system.requires
dependencies break over time. For example, we build a PEX using pants that installscnvkit
(https://github.com/etal/cnvkit), which depends onpomegranate==0.14.9
(https://github.com/jmschrei/pomegranate/tree/v0.14.9), which in turn specifiesnumpy
in itsbuild-system.requires
.pomegranate
's build process uses a deprecated (as of numpy 2.0.0, released a few days ago) API, which breaks when pants attempts to build it with the latestnumpy
release.Describe the solution you'd like We'd like a way to pass build constraints to the underlying PEX processes so that we can pin this build dependency (among others) from pants.
Describe alternatives you've considered We've tried using the resolves_to_constraints_file directive to supply constraints to the PEX process, but that only pushes a
--constraints
flag (which in turn is pushed to the underlyingpip
processes), but this doesn't actually constraint build dependencies. As suggested in this thread, it seems like we need to specify thePIP_CONSTRAINT
environment variable to actually constraint dependencies listed inbuild-system.requires
since the constraints flag doesn't actually get passed to the build environment.Additional context Here's a minimal example:
Output when running:
^ fails due to numpy 2.0.0 even though we specify constraints. Adding
PIP_CONSTRAINT
asenv
to__run.sh
doesn't work either. What does work is rerunning the failed PEX command withPIP_CONSTRAINT
: