pex-tool / pex

A tool for generating .pex (Python EXecutable) files, lock files and venvs.
https://docs.pex-tool.org/
Apache License 2.0
2.52k stars 258 forks source link

Pex CLI doesn't offer as much control over pex building process as the PEXBuilder #1907

Open hrfuller opened 1 year ago

hrfuller commented 1 year ago

As a major PEX user twitter leverages the PEXBuilder class to build pexes instead of shelling out to the PEX cli for a few reasons.

  1. The --sources-directory option to add sources to a pex doesn't offer fine enough control over the final import paths of source files. In order to make --sources-directory work we would have to prepare all the sources in merged directories. E.g Often multiple directories need to be merged into the same Chroot directory.
  2. We make use of PEXBuilder.add_dist_location to add pre-installed wheels (unzipped wheels) to pexes. We have found that the overhead of using direct file URLs via --requirement on the CLI and even the main PEXBuilder.add_distribution methods to add significant overhead by invoking pip.pex for each distribution. We can improve build times significantly by relying on the existence of a mechanism for fetching 3rdparties that is outside of pex itself.
  3. ... will update if I think of more ...

These may not be things that the Pex CLI wants to support but figured I would list them out based on the fact that PEXBuilder isn't under a stable API contract with users.

jsirois commented 1 year ago

@hrfuller can you provide more detail on item 1? What would make sense? Right now you can only use multiple --sources-directory to point at multiple sys.path entries and everything under each pointed to is merged. Do you need to further exclude some files or subdirectories from --sources-directory entries or would a simpler explicit file list work?

On 2 can you provide more color? Pex already never installs a given wheel more than once for a given PEX_ROOT. If the wheel has already been installed there (in installed_wheels/<hash>/the.whl/) it will be re-used. Is it the case you have a cache of installed wheels that is resident but not a PEX_ROOT that is resident? Do you use pex3 lock create lock files? Those also cut out Pip overheads such that, given a --lock Pex just downloads wheels mentioned in the lock (if needed) and installs them (if needed) but never invokes Pip's resolution code after lock creation.

jsirois commented 1 year ago

Basically I need a lot more detail about the environmental constraints you use Pex under I think as well as the perf improvements you get in % terms using your current techniques over some example PEX CLI baseline. With that info in hand, I can come up with real answers or ideas or infeasibles.

jsirois commented 4 weeks ago

@hrfuller you were not able to provide more details and Pex has since grown even more ways to pick sources, now including:

I'm going to assume the details won't be forthcoming and close, but please re-open if you have more specifics about cases modern Pex does not handle well via the CLI that you think it should.

jsirois commented 3 weeks ago

I totally missed item 2. Although you can now pex --no-pre-install-wheels to get wheels stored in a PEX as is (no extra compression, no pre-installation in a chroot), you still must run a Pip resolve. I find a ~40% speed increase with a quick hack building a torch PEX:

# Existing perf for pre-resolved wheel house:
:; time python -mpex --pip-version latest --no-pre-install-wheels /tmp/wheels/*.whl --intransitive -otorch.pex
/home/jsirois/dev/pex-tool/pex/pex/pex_builder.py:105: PEXWarning: The PEX zip at torch.pex~ is not a valid zipapp: Could not find the `__main__` module.
This is likely due to the zip requiring ZIP64 extensions due to size or the
number of file entries or both. You can work around this limitation in Python's
`zipimport` module by re-building the PEX with `--layout packed` or
`--layout loose`.
  pex_warnings.warn(message)

real    0m10.857s
user    0m6.478s
sys     0m3.873s

# A quick hack that allows adding all wheels in `PEX_WHEELS_DIR` to the PEXBuilder directly:
:; time PEX_WHEELS_DIR=/tmp/wheels python -mpex --pip-version latest --no-pre-install-wheels -otorch.pex
/home/jsirois/dev/pex-tool/pex/pex/pex_builder.py:105: PEXWarning: The PEX zip at torch.pex~ is not a valid zipapp: Could not find the `__main__` module.
This is likely due to the zip requiring ZIP64 extensions due to size or the
number of file entries or both. You can work around this limitation in Python's
`zipimport` module by re-building the PEX with `--layout packed` or
`--layout loose`.
  pex_warnings.warn(message)

real    0m6.376s
user    0m3.977s
sys     0m2.451s

On the precedent front there is --requirement-pex:

  --requirements-pex FILE
                        Add requirements from the given .pex file. This option
                        can be used multiple times. (default: [])

So I'll re-open to fix item 2 in the OP.

hrfuller commented 3 weeks ago

Thanks John! I haven't been involved with pex since some job changes but if you still feel like 2 would be a benefit to the project that's great. There has been talk of a pex_binary target in bazel rules_python for sometime. Based on my experience writing bazel rules for pex these changes would have given more control over the final pex that is created. Cheers.

As for details in number 1 I'm a bit hazy now but it had to do with merging directories that represented the same "source root" a pants concept into the pexes being created. This was using a much older version of pex v2 so that may have changed already

jsirois commented 3 weeks ago

Hey Henry! Yeah, I think 2 makes sense as a feature on its own. As to 1, I'm still hazy too. you could definitely merge source roots just using -D source/root/1 -D source/root/2 even back when you filed this issue. So, at that point in time you must've wanted to be able to mask off portions of those source roots to not grab all of them, just some. AFAICT, you can achieve that now with -P package1@source/root/1 -P package2@source/root/1 -D package3@source/root/2. It may be the case that subtracting is easier in some cases than adding like in this example, but I'll definitely wait for specific pain reports to surface before delving into that.