Closed jsirois closed 1 month ago
Reviewers - yet another big one. Thanks in advance for any time you can spare. This first commit has no tests; those are coming in a bit, but I wanted to get this out in case you wanted to start reading. There has been pretty extensive manual testing, both for perf (see the resulting binding command, which brings perf down to --sh-boot levels in all cases) and for feature-matrix complexity.
Looking to carve off some time this evening to review this, but before I start, would it be safe to say that this is a (strict?) subset of the equivalent functionality when using science + a lift.toml to create a naively packaged pex + interpreter (e.g. excluding busy box functionality and custom bindings)?
Yes. You'll find a nod to this and a pointer to science docs in the --scie help string as a consequence (i.e.: if you need to get more fancy, go there instead). Text starts here: https://github.com/pex-tool/pex/pull/2466/files#diff-bbf96d2c6fdcaa284ebb9e1fc92f6485b122a18e9ed241d96e943c5a90fbe168R62
Will try and make some time to review this in a few hours. But very cool feature!
OK, CI is now down to erroring on the Linux runners having ~/.netrc as a directory and the Python netrc stdlib not dealing with this 🤦 . I can work around this in science - where the error is originating from - but for now I'd like to solve just Pex issues; so I'll work around in CI instead.
... and again the face-palm was mine own. This was an issue in the dtox.sh script used on the Linux runners - now fixed.
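For readers hitting the same directory-shaped ~/.netrc failure described above, here is a minimal sketch of a defensive loader. It is not the workaround used in CI or in science; `load_netrc` is a hypothetical helper name, and treating an unreadable file as "no credentials" is an assumption about what a caller would want.

```python
import netrc
import os
from typing import Optional


def load_netrc(path: str) -> Optional[netrc.netrc]:
    """Return parsed netrc data, or None when `path` is unusable.

    `load_netrc` is a hypothetical helper for illustration only.
    """
    # A directory (or a missing path) cannot be parsed as a netrc file, so
    # treat it as "no credentials" instead of letting open() raise.
    if not os.path.isfile(path):
        return None
    try:
        return netrc.netrc(path)
    except (OSError, netrc.NetrcParseError):
        # Unreadable or malformed files are also treated as "no credentials".
        return None
```

A caller would pass `os.path.expanduser("~/.netrc")` and fall back to unauthenticated requests when the result is `None`.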
Alright reviewers, the tests are now complete. Good for a final review.
@sureshjoshi I'm happy to break off a feature request for either or both of the --scie-manifest and --scie-busybox ideas that came up in our thread above, just let me know if either makes sense / are features you will use.
Yep, after this lands, I can play around with it a bit more and see where it leads to.
In the meantime, I want to confirm that this is the expected behaviour.
# foo.py
import uvicorn
from fastapi import FastAPI

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Hello World"}


if __name__ == "__main__":
    uvicorn.run(app, host="localhost", port=8000)
% python3.12 -m pex fastapi uvicorn --scie eager --scie-python-version 3.11 -o foo.pex -- foo.py
% SCIE=inspect ./foo
...
"files": [
{
"name": "cpython-3.11.9+20240713-aarch64-apple-darwin-install_only.tar.gz",
"key": "cpython",
...
% ./foo
Python 3.12.4 (main, Jun 6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import fastapi
>>>
In my example, as the pex was built with python3.12, the pex shebang is /usr/bin/env python3.12 - so even though the scie is bundled with Python 3.11, we are expecting to enter a 3.12 REPL, correct?
Based on the comment in the thread above:
As such, I think it makes sense for Pex to offer the ability to take your PEX file and turn it into a scie that behaves exactly the same, with nothing extra except maybe running faster.
% python3.11 ./foo.pex
Python 3.12.4 (main, Jun 6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>
The current behaviour matches what would happen if I just ran the pex with python3.11, so everything seems to line up and I'm just confirming my understanding of the feature.
Looks good to me after the back and forth. Would still be good to get eyeballs from someone more familiar with pex itself than me.
AFAICT that is currently basically no one except me.
😆 Good point
I don't have time to review this but I do have a question.
If one needs to customize the binaries, they would need to use science to create new binaries, right?
The current behaviour matches what would happen if I just ran the pex with python3.11, so everything seems to line up and I'm just confirming my understanding of the feature.
Well ... you did a super weird thing too though. What do you think you meant by the trailing -- foo.py?! Did you mean to use --exe foo.py? Or were you just stressing buggy use cases? The -- foo.py you used just throws away those extra args, which is probably a bug - you should be warned at least. So you just get a foo.pex (and thus a foo scie) without an entrypoint.
All that weirdness aside, what actually happened here is this: you built a platform-specific PEX for Python 3.12, but instead of letting Pex use that to configure a 3.12 PBS, you overrode that and said 3.11 is fine, which it's not. When the boot binding runs, Pex is smart enough to test the current PBS 3.11 interpreter, find it can't load the PEX, then continue on to try other Pythons on the PATH. It finds a python3.12, which works to load the PEX, and then writes out these bindings on my machine:
cat /home/jsirois/.cache/nce/5f4d759f14822688a76e0fd21f7a93897017bba9ba2218635023781d324ee362/locks/configure-bfbf6d1d4ddde46844370bf7672b02dfc07b0e8318fb2ce7b277b8436167a67b
PYTHON=/usr/bin/python3.12
PEX=/home/jsirois/.cache/nce/5f4d759f14822688a76e0fd21f7a93897017bba9ba2218635023781d324ee362/bindings/pex_root/unzipped_pexes/c27c9d03a91f03a2286d5901502f2ab7872918e5/__main__.py
So, as for --scie-platform, the use case for --scie-pbs-release and --scie-python-version is generally narrowing the values that naturally arise from the PEX in question. The PEX here only supports 3.12; but you pushed the version out of bounds.
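To make the "out of bounds" point concrete, here is a rough sketch of checking a requested Python version against the wheels a PEX contains. This is not how Pex decides compatibility; `incompatible_wheels` is a hypothetical helper, and it deliberately simplifies by skipping pure-Python and abi3 wheels instead of evaluating their full tag rules.

```python
from typing import Iterable, List


def incompatible_wheels(wheel_names: Iterable[str], python_version: str) -> List[str]:
    """List wheels whose CPython tag rules out the given X.Y version.

    Hypothetical helper: a simplified check, not Pex's real resolver logic.
    """
    major, minor = python_version.split(".")[:2]
    wanted = f"cp{major}{minor}"
    bad = []
    for name in wheel_names:
        # Wheel names end with {python tag}-{abi tag}-{platform tag}.whl
        stem = name[: -len(".whl")] if name.endswith(".whl") else name
        python_tag, abi_tag, _platform_tag = stem.split("-")[-3:]
        tags = python_tag.split(".")
        # Simplification: pure-Python (py3, py2.py3) and stable-ABI (abi3)
        # wheels are treated as not pinned to a single minor version.
        if any(tag.startswith("py") for tag in tags) or abi_tag == "abi3":
            continue
        if wanted not in tags:
            bad.append(name)
    return bad
```

Fed the cp312 wheel names embedded in the PEX from this thread, a check like this would flag them for `"3.11"` and pass them for `"3.12"`.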
I guess I probably should blank out PATH in the boot binding to keep things hermetic:
:; git diff pex/scie/science.py
diff --git a/pex/scie/science.py b/pex/scie/science.py
index 61f9f7ac..50935894 100644
--- a/pex/scie/science.py
+++ b/pex/scie/science.py
@@ -114,6 +114,7 @@ def create_manifests(
                 {
                     "env": {
                         "default": env_default,
+                        "remove_exact": ["PATH"],
                         "remove_re": ["PEX_.*"],
                         "replace": {
                             "PEX_INTERPRETER": "1",
:; git diff pex/pex_bootstrapper.py
diff --git a/pex/pex_bootstrapper.py b/pex/pex_bootstrapper.py
index a097736f..e3609efc 100644
--- a/pex/pex_bootstrapper.py
+++ b/pex/pex_bootstrapper.py
@@ -314,7 +314,7 @@ def find_compatible_interpreter(interpreter_test=None):
                 path=(
                     os.pathsep.join(ENV.PEX_PYTHON_PATH)
                     if ENV.PEX_PYTHON_PATH
-                    else os.getenv("PATH")
+                    else os.getenv("PATH", "(The PATH is empty!)")
                 )
             )
         )
Gives:
:; python3.12 -m pex fastapi uvicorn --scie eager --scie-python-version 3.11 -o foo.pex
:; ./foo
Failed to find compatible interpreter on path (The PATH is empty!).
Examined the following interpreters:
1.) /home/jsirois/.cache/nce/1f91c44febc850376a35ae77e1d45f7c823994b0c80293bbbc17e647eb893853/cpython-3.11.9+20240713-x86_64-unknown-linux-gnu-install_only.tar.gz/python/bin/python3.11 CPython==3.11.9
No interpreter compatible with the requested constraints was found:
Failed to resolve requirements from PEX environment @ /home/jsirois/.cache/nce/263d2999f5e4edddedbbcb29b3aeaf6f49d373ee26a76de93ad97d16f9959b0d/bindings/pex_root/unzipped_pexes/c27c9d03a91f03a2286d5901502f2ab7872918e5.
Needed cp311-cp311-manylinux_2_35_x86_64 compatible dependencies for:
1: pydantic-core==2.20.1
Required by:
pydantic 2.8.2
But this pex had no ProjectName(raw='pydantic-core', validated=False, normalized='pydantic-core') distributions.
2: MarkupSafe>=2.0
Required by:
Jinja2 3.1.4
But this pex had no ProjectName(raw='MarkupSafe', validated=False, normalized='markupsafe') distributions.
3: httptools>=0.5.0; extra == "standard"
Required by:
uvicorn 0.30.1
But this pex had no ProjectName(raw='httptools', validated=False, normalized='httptools') distributions.
4: pyyaml>=5.1; extra == "standard"
Required by:
uvicorn 0.30.1
But this pex had no ProjectName(raw='pyyaml', validated=False, normalized='pyyaml') distributions.
5: uvloop!=0.15.0,!=0.15.1,>=0.14.0; (sys_platform != "win32" and (sys_platform != "cygwin" and platform_python_implementation != "PyPy")) and extra == "standard"
Required by:
uvicorn 0.30.1
But this pex had no ProjectName(raw='uvloop', validated=False, normalized='uvloop') distributions.
6: watchfiles>=0.13; extra == "standard"
Required by:
uvicorn 0.30.1
But this pex had no ProjectName(raw='watchfiles', validated=False, normalized='watchfiles') distributions.
7: websockets>=10.4; extra == "standard"
Required by:
uvicorn 0.30.1
But this pex had no ProjectName(raw='websockets', validated=False, normalized='websockets') distributions.
Error: Failed to establish atomic directory /home/jsirois/.cache/nce/263d2999f5e4edddedbbcb29b3aeaf6f49d373ee26a76de93ad97d16f9959b0d/locks/configure-9150f882feea5a550e5936c10776cf44573934ed3669cd27f5bd99ec8ef75f90. Population of work directory failed: Boot binding command failed: exit status: 1
The ./foo scie contains no alternate boot commands.
What do you think @sureshjoshi? Keep it behaving just like the PEX and bouncing down the PATH to find an interpreter that works (this means we shipped the wrong Python but the target machine had the right one), or keep things hermetic and fail as my experiment above does?
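The two options in the question above can be sketched as a single selection routine with a switch. This is only an illustration of the trade-off, not Pex's bootstrap code; `select_interpreter`, `can_load_pex`, and the `hermetic` flag are all hypothetical names.

```python
from typing import Callable, Iterable, Optional


def select_interpreter(
    bundled: str,
    path_candidates: Iterable[str],
    can_load_pex: Callable[[str], bool],
    hermetic: bool = True,
) -> Optional[str]:
    """Pick an interpreter for the PEX, always trying the bundled one first.

    Hypothetical sketch: `can_load_pex` stands in for whatever compatibility
    test the real boot binding performs.
    """
    if can_load_pex(bundled):
        return bundled
    if hermetic:
        # Fail fast: never fall back to whatever Python the machine has.
        return None
    # Non-hermetic: behave like a traditional PEX and walk the PATH.
    for candidate in path_candidates:
        if can_load_pex(candidate):
            return candidate
    return None
```

With `hermetic=False` a mismatched bundled 3.11 silently yields a 3.12 found on the PATH, as in the original report; with `hermetic=True` the mismatch surfaces as a failure on first boot.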
FWIW, I debugged all this with 2 techniques:
rm -rf ~/.cache/nce && RUST_LOG=trace PEX_VERBOSE=1 ./foo
SCIE=split ./foo dist && _PEX_SCIE_INSTALLED_PEX_DIR=fake SCIE_BINDING_ENV=/dev/fd/0 PEX_VERBOSE=1 python3.11 dist/pex dist/configure-binding.py
If one needs to customize the binaries, they would need to use science to create new binaries right?
@zmanji in short, probably yes.
You could use science, but you can also just use cat plus a copy of the scie-jump (and a copy of ptex if you want lazy loading). See: https://github.com/a-scie/jump/blob/main/docs/packaging.md for more, but science is just a high level tool that dogfoods itself and these low level tools to provide a native python science binary that makes assembling scies a bit easier.
As per my debug session above of @sureshjoshi's test rig case, you can also just use Pex to build your scie, then split it into its components with SCIE=split ./my-pex-scie /tmp/workbench, then cd to the /tmp/workbench and edit the lift.json, and symlink or copy any extra files you added to the manifest to the directory and then run ./scie-jump to re-assemble the scie. It will plop out in that directory.
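The "edit the lift.json" step could be scripted along these lines. This is a sketch under stated assumptions: that the split lift.json keeps the `scie.lift.files` layout seen in the embedded manifest elsewhere in this thread, and that the `hash` field is a SHA-256 hex digest (the 64-character values shown are consistent with that, but check the scie-jump packaging docs). `add_file_entry` is a hypothetical helper.

```python
import hashlib
import os
from typing import Any, Dict, Optional


def add_file_entry(
    manifest: Dict[str, Any],
    path: str,
    key: Optional[str] = None,
    file_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Append an entry for `path` to manifest["scie"]["lift"]["files"].

    Hypothetical helper. Assumes `hash` is SHA-256 and that `key` and `type`
    are optional; verify both against the scie-jump documentation.
    """
    with open(path, "rb") as fp:
        data = fp.read()
    entry: Dict[str, Any] = {
        "name": os.path.basename(path),
        "size": len(data),
        "hash": hashlib.sha256(data).hexdigest(),
    }
    if key:
        entry["key"] = key
    if file_type:
        entry["type"] = file_type
    manifest["scie"]["lift"]["files"].append(entry)
    return entry
```

The surrounding steps stay as described: load lift.json with `json.load`, call the helper for each extra file copied into the workbench directory, write the manifest back with `json.dump`, then run ./scie-jump.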
On being hermetic, I will just say that pex's strength is being hermetic out of the box with flags to disable that if needed. I think a pex built with this feature should strip the PATH by default.
I like it! Even though this breaks the "PEX scie works just like the PEX" ~guarantee, it breaks it in exactly the one place a PEX scie is meant to fix: sealing in the interpreter. The only reason a PEX needs to bounce around looking for a compatible Python, if there even is one, is that one glaring bit of non-hermeticity in traditional PEXes.
In my case, I wasn't trying to generate an exe or script - I was just trying to make a packaged repl with fastapi, uvicorn, and my foo.py (which seemed to work, as far as I could tell). I grabbed that example from something I was doing a couple of weeks ago on one of my many weird side-tangents. I'm sure there's a better way, but it worked one time I tried it, and I just ran with it since it's just a scratchpad.
Keep it behaving just like the PEX and bouncing down the PATH to find an interpreter that works (this means we shipped the wrong Python but the target machine had the right one), or keep things hermetic and fail as my experiment above does?
Alright, yeah, my behavioural expectation test was presuming the goal was: "PEX scie works just like the PEX" - which it does.
BUT, having said that, I think being hermetic is preferable. Building with and bundling different interpreters is an easy blunder to make, and the last place you want to find that error is after deployment.
In my case, I wasn't trying to generate an exe or script - I was just trying to make a packaged repl with fastapi, uvicorn, and my foo.py (which seemed to work, as far as I could tell).
@sureshjoshi it did not. The foo.py was not included. I think you are confused by how Pex works when you don't specify -o: then, and only then, the -- ... extra args get passed to the ephemeral PEX that is created, run, and thrown away.
🤦🏽
It was just loading the local foo.py all along.
Whelp, at least my pain and suffering led to a hermetic scie.
It was just loading the local foo.py all along.
@sureshjoshi yes. Thanks for that though - as you said, everything is better as a result - except perhaps your sanity. So, people seem to never run zipinfo on their PEXes, but it's really helpful. So helpful, I went through a lot of effort to make it so that you can do that to your PEX scie as well.
Hopefully very (power?) user friendly:
:; file foo
foo: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), static-pie linked, BuildID[sha1]=f1f01ca2ad165fed27f8304d4b2fad02dcacdffe, stripped
:; tail -1 foo | jq '.scie.lift.files[] | select(.key == "cpython")'
{
"name": "cpython-3.11.9+20240713-x86_64-unknown-linux-gnu-install_only.tar.gz",
"key": "cpython",
"size": 29814546,
"hash": "1f91c44febc850376a35ae77e1d45f7c823994b0c80293bbbc17e647eb893853",
"type": "tar.gz"
}
:; zipinfo -1 foo | tail
warning [foo]: 31627135 extra bytes at beginning or within zipfile
(attempting to process anyway)
.deps/websockets-12.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl/websockets-12.0.dist-info/
.deps/websockets-12.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl/websockets-12.0.dist-info/INSTALLER
.deps/websockets-12.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl/websockets-12.0.dist-info/LICENSE
.deps/websockets-12.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl/websockets-12.0.dist-info/METADATA
.deps/websockets-12.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl/websockets-12.0.dist-info/WHEEL
.deps/websockets-12.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl/websockets-12.0.dist-info/top_level.txt
PEX-INFO
__main__.py
__pex__/
__pex__/__init__.py
:; unzip -qc foo PEX-INFO | jq .requirements
warning [foo]: 31627135 extra bytes at beginning or within zipfile
(attempting to process anyway)
[
"fastapi",
"uvicorn"
]
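The same inspection can be done from Python. The sketch below mirrors the `tail -1 | jq` and `unzip -qc ... PEX-INFO` steps above, and adds the membership check that would have shown foo.py was never packaged. The function names are hypothetical; Python's `zipfile` is known to tolerate data prepended to an archive, but whether it opens every real PEX scie unmodified is an assumption I have only exercised on synthetic input.

```python
import io
import json
import zipfile
from typing import Any, Dict, List


def lift_files(scie_bytes: bytes, key: str) -> List[Dict[str, Any]]:
    """Return manifest file entries with the given key (like `tail -1 | jq`)."""
    # The lift manifest is read from the final line of the file, as in the
    # transcript above.
    last_line = scie_bytes.rstrip(b"\n").rsplit(b"\n", 1)[-1]
    manifest = json.loads(last_line)
    return [f for f in manifest["scie"]["lift"]["files"] if f.get("key") == key]


def pex_requirements(scie_bytes: bytes) -> List[str]:
    """Read the `requirements` list out of the embedded PEX-INFO."""
    with zipfile.ZipFile(io.BytesIO(scie_bytes)) as zf:
        return json.loads(zf.read("PEX-INFO"))["requirements"]


def contains(scie_bytes: bytes, member: str) -> bool:
    """Report whether `member` (e.g. "foo.py") is actually in the zip."""
    with zipfile.ZipFile(io.BytesIO(scie_bytes)) as zf:
        return member in zf.namelist()
```

Reading the scie with `open("foo", "rb").read()` and calling `contains(data, "foo.py")` is a one-line guard against the "it was just loading the local foo.py" surprise.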
Yeah, I unzipped and grepped, but I had the file referenced otherwise - so it showed up in my grep, but it was just a filename, not the file itself.
As I said, very weird tangents I was messing around with 🤦🏽
@benjyw I'm headed to the hills for a bit; so I'm going to proceed to merge this and get out a release. I feel good about the current commitments, but I'll circle back if you spot bugs or have questions.
Sounds fine, I'll take a look ASAP - I'm on vacation in Europe so code reviews are backing up.
You can now specify --scie {eager,lazy} when building a PEX file and one or more additional native executable PEX scies will be produced alongside the PEX file. These PEX scies will contain a portable CPython interpreter from Python Standalone Builds in the --scie eager case and will instead fetch a portable CPython interpreter just in time on first boot on a given machine if needed in the --scie lazy case.
Although Pex will pick the target platforms and target portable CPython interpreter version automatically, if more control is desired over which platforms are targeted and which Python version is used, then --scie-platform, --scie-pbs-release, and --scie-python-version can be specified.
Closes #636 Closes #1007 Closes #2096
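For build scripts that drive Pex, the options described above could be assembled like this. It is a sketch only: `pex_scie_argv` is a hypothetical helper, the flags are the ones named in this description, and passing --scie-platform once per platform is an assumption about the CLI (check `pex --help`).

```python
from typing import List, Optional, Sequence


def pex_scie_argv(
    requirements: Sequence[str],
    output: str,
    scie: str = "eager",
    python_version: Optional[str] = None,
    pbs_release: Optional[str] = None,
    platforms: Sequence[str] = (),
) -> List[str]:
    """Build an argv for a `pex ... --scie {eager,lazy}` invocation.

    Hypothetical helper; it only assembles arguments and runs nothing.
    """
    if scie not in ("eager", "lazy"):
        raise ValueError(f"--scie must be 'eager' or 'lazy', got {scie!r}")
    argv = ["pex", *requirements, "--scie", scie, "-o", output]
    if python_version:
        argv += ["--scie-python-version", python_version]
    if pbs_release:
        argv += ["--scie-pbs-release", pbs_release]
    for platform in platforms:
        # Assumption: the flag may be repeated, once per target platform.
        argv += ["--scie-platform", platform]
    return argv
```

The result can be handed to `subprocess.run(argv, check=True)`. Leaving `python_version` unset lets Pex pick the interpreter version from the PEX itself, which avoids the 3.11 versus 3.12 mismatch discussed in this thread.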