Closed: tgolsson closed this issue 10 months ago.

Hey! Not sure if actionable, but maybe there's something here that can be done. I was investigating another issue today and ended up seeing a very slow Pants package step (~5 minutes). The issue reproduces with the simple command line `pex -vvv torch>=2 -o t2.2.pex`. This takes ~280 seconds on my machine, of which ~210-220 seconds is spent purely in the zip step. This turns out to be a 2.5 GB PEX, which admittedly is on the fat side. Unzipping this beast takes ~30 seconds, and zipping it with regular `zip` takes ~230 seconds. `zip -1` takes ~100 seconds and adds ~10% to the size; `zip -0` takes 12 seconds but doubles the size.

Seeing as compression adds the majority of the runtime, I did a very quick hack (outside of Pex) where I moved the compress step to a process pool (since it's CPU-heavy). With that, I get ~30 seconds at level 1, or ~60 seconds at level 6, so a 3-4x speed increase. It may be possible to push this a bit higher by playing with ordering. I also played around with the store-only-by-suffix capabilities, but it seems the .so files make up the bulk of both the compression potential and the compression time: only compressing text-like files gives a ~4.3 GB zip in 20 seconds.

With all that said, I'm mostly curious whether this is something that has been discussed elsewhere (I found nothing while searching), and what kind of solution might be palatable relative to the gains that can be made. I'm willing to contribute something based on the work I've done so far, or to investigate other suggested approaches.
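For the curious, a rough, minimal sketch of the kind of process-pool hack described above (not the actual patch; it leans on undocumented `zipfile` internals, keeps whole payloads in memory, skips ZIP64 handling, and uses hypothetical input names):

```python
import zipfile
import zlib
from concurrent.futures import ProcessPoolExecutor

def deflate(path):
    # CPU-bound work, done in a worker process: raw DEFLATE (wbits=-15),
    # which is the stream format zip entries expect.
    with open(path, "rb") as fp:
        data = fp.read()
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
    payload = compressor.compress(data) + compressor.flush()
    return path, len(data), zlib.crc32(data) & 0xFFFFFFFF, payload

def parallel_zip(out_path, paths):
    # Splice the pre-compressed payloads into the archive. This pokes at
    # zipfile internals (fp, start_dir, filelist), so it is illustrative only.
    with ProcessPoolExecutor() as pool, zipfile.ZipFile(out_path, "w") as zf:
        for path, size, crc, payload in pool.map(deflate, paths):
            info = zipfile.ZipInfo.from_file(path)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.file_size, info.CRC, info.compress_size = size, crc, len(payload)
            info.header_offset = zf.fp.tell()
            zf.fp.write(info.FileHeader(False))
            zf.fp.write(payload)
            zf.start_dir = zf.fp.tell()  # central directory goes after the last entry
            zf.filelist.append(info)
            zf.NameToInfo[info.filename] = info

if __name__ == "__main__":
    # Hypothetical inputs; real code would walk the PEX chroot.
    parallel_zip("out.pex", ["lib_a.so", "lib_b.so"])
```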
This has come up before. Two concrete results are the support for `--layout packed` introduced in #1431 / #1438 and the `--no-compress` Pex build option introduced in #1705. The associated issues have more discussion.

If neither `--layout packed`, which amortizes the slow zip to once per wheel and is used by Pants internally for this and other reasons, nor `--no-compress` are satisfactory, the only other approaches I can see are along the lines of `--layout packed`, but made usable for monolithic PEX zips. @cosmicexplorer explored both and came up wanting. I think #2158 is probably the best entrypoint into that work.
I'll have a peek at those, thanks. We already use `layout=packed` (+ `execution_mode=venv`) in some situations. In the specific case where I hit this I was running a `python_source`, where I don't have control over that, and I'm not sure what the default is. The timings seem to end up the same as with the command posted, though.
`--no-compress` could work, I think, if I can pass it into Pants somewhere. Most of our pex building (with GPU wheels) is either to execute the PEX immediately or to unpack it into a container. We have only one use case for pex-at-rest, and that is a fraction of the size of these big GPU packages.
I still do think there is great value to being performant "by default" though, but maybe my effort is better invested into contributing to the already existing work by @cosmicexplorer -- will see if there's anything I can do there.
> I still do think there is great value to being performant "by default" though
I agree there, but the only real solution for that is faster zip support. FWICT that is a problem for native code and not really related to Pex at all. With that implemented though, Pex - and many other tools - could benefit.
To be honest though, I think trying to make Pex - or any zipapp implementation - faster for behemoths like PyTorch is fighting the wrong battle altogether. I imagine a much "simpler" way to do this is to not use a zipapp. For example, one might imagine a scie that contained all the resolved wheels for a zipapp - not pre-installed wheels like PEXes contain, but the actual wheel files downloaded from PyPI. The scie could then use PBS's Python distributions' support for `-mvenv` to create a venv and install the contained wheels. This would mean there is zero compression time or effort spent packaging the scie, since the wheels are used as-is and just cat'ed to the scie, and there is only the one-time install cost of unzipping.

Alternatively, instead of the scie containing raw wheel files, a PEX could. Pex would then need to learn how to install wheels at runtime, though; currently it lets Pip do this at build time. In this way the `.whl` contents of a PEX could be STORED by default.
> I agree there, but the only real solution for that is faster zip support. FWICT that is a problem for native code and not really related to Pex at all. With that implemented though, Pex - and many other tools - could benefit.
That is also an option, and it looks like it was explored fairly well. I will see if that can be landed; it'd definitely be good. My approach is Python-native, but probably a lot hackier since it depends a lot on zipfile internals.
> To be honest though, I think trying to make Pex - or any zipapp implementation - faster for behemoths like PyTorch is fighting the wrong battle altogether. I imagine a much "simpler" way to do this is to not use a zipapp. For example, one might imagine a scie that contained all the resolved wheels for a zipapp - not pre-installed wheels like PEXes contain, but the actual wheel files downloaded from PyPI. The scie could then use PBS's Python distributions' support for `-mvenv` to create a venv and install the contained wheels. This would mean there is zero compression time or effort spent packaging the scie, since the wheels are used as-is and just cat'ed to the scie, and there is only the one-time install cost of unzipping.
I think my stance on torch is that whatever they do, doing the opposite is likely better. My life (and yours, by extension) would be a lot better if we didn't have to think about why they decided to ship a whole copy of CUDA in their wheels, or why their native component is larger than the Linux kernel when built 🤷 Inexplicably, the situation is even worse now that more of CUDA is on PyPI.
> Alternatively, instead of the scie containing raw wheel files, a PEX could. Pex would then need to learn how to install wheels at runtime, though; currently it lets Pip do this at build time. In this way the `.whl` contents of a PEX could be STORED by default.
Hmm. That doesn't sound half bad, at least for some use cases. I guess it'd be almost the same size as well, since zip only compresses per-file. A wheel install is pretty much guaranteed to be isolated, right? I'm not sure I can fully see the implications for Pants though, or how it'd end up working in every situation (`pants package` vs `run` vs `export` ...).
> Hmm. That doesn't sound half bad, at least for some use cases. I guess it'd be almost the same size as well, since zip only compresses per-file. A wheel install is pretty much guaranteed to be isolated, right? I'm not sure I can fully see the implications for Pants though, or how it'd end up working in every situation (`pants package` vs `run` vs `export` ...).
This would be opaque to all Pex users at runtime. The PEX zipapp would contain STORED, unadulterated `.whl` files instead of today's DEFLATED installed-wheel chroots, and the packed layout would use unadulterated `.deps/X.whl` files instead of today's zipped-up installed-wheel chroots. At runtime, new Pex installer code would install from these internal files (unzip + spread as per https://packaging.python.org/en/latest/specifications/binary-distribution-format/#installing-a-wheel-distribution-1-0-py32-none-any-whl ... plus a little more, since that spec is actually wanting for how console scripts are handled in the wild) into `~/.pex/installed_wheels` (and then create a venv from there if using `--venv`), exactly as today.
I really do think this is the right way to go. Don't speed up zipping; avoid unzipping (installing wheels at build time) + zipping (back into a PEX zipapp or packed-layout ~wheel zips) altogether. There will still be an unzip on a cold cache for the 1st boot at runtime, but since `zipfile.ZipFile(zipfile.ZipFile("the.pex").open(".deps/X.whl")).extractall("here")` works and is efficient, this should be ~the same PEX 1st-boot install time as today.
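Spelled out as a runnable sketch (same placeholder names as in the snippet above):

```python
import zipfile

# Extract an inner STORED .whl straight out of the outer PEX zip without
# writing the wheel to disk first; ZipExtFile is seekable, which is all
# ZipFile needs to read the nested archive.
with zipfile.ZipFile("the.pex") as pex_zip:
    with pex_zip.open(".deps/X.whl") as whl_stream:
        with zipfile.ZipFile(whl_stream) as whl_zip:
            whl_zip.extractall("here")
```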
I experimented enough writing a PEP-427 installer today to see that it works, but you need to handle generating console scripts, since `.whl`s in the wild, for the most part, don't actually carry these in `proj-rev.data/scripts/...` as you'd hope they would given PEP-427.
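For illustration, a hedged sketch of that console-script generation step (my simplification, not Pex's actual installer code; it assumes `module:attr` entry specs): read `[console_scripts]` entries from the wheel's `entry_points.txt` and emit an executable launcher for each.

```python
import configparser
import os
import stat

# Illustrative launcher template; real installers also handle extras
# markers, Windows .exe shims, etc.
LAUNCHER = """\
#!{python}
import importlib
import sys

obj = importlib.import_module("{module}")
for part in "{attr}".split("."):
    obj = getattr(obj, part)

if __name__ == "__main__":
    sys.exit(obj())
"""

def write_console_scripts(entry_points_txt, bin_dir, python="/usr/bin/python3"):
    parser = configparser.ConfigParser()
    parser.optionxform = str  # script names are case-sensitive
    parser.read(entry_points_txt)
    if not parser.has_section("console_scripts"):
        return
    for name, spec in parser.items("console_scripts"):
        module, _, attr = spec.partition(":")
        script = os.path.join(bin_dir, name)
        with open(script, "w") as fp:
            fp.write(LAUNCHER.format(python=python, module=module, attr=attr))
        os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```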
@tgolsson I won't have solid time until the 23rd-28th, but I think I can get this knocked out and released then. I'm not sure exactly how to spell the feature activation - perhaps two new `--layout` options, one for zipapp and one for spread - but that's not too important as long as no existing users / PEX_ROOT caches are broken.
That sounds very good. My concern with Pants is mostly how far away from `pants <goal>` a potential error can occur, since I assume there are issues that could surface only when installing wheels. But since adding this feature to Pants would require work anyway, that's not going to be an immediate problem - and I'm guessing this would be opt-in per target either way.
It also seems like a good feature for Pex, regardless of Pants usage.
Noting I did not complete this during the current work stretch. It will be picked back up on December 10th when I start my next work stretch.
This should completely side-step the need for #2158 since it does better than that approach ever could by avoiding zipping altogether (and unzipping as well!).
Ok, circling back to the OP using #2298:
$ rm -rf ~/.pex/installed_wheels/
$ time python3.11 -mpex -v torch==2.1.1 -o t2.2.pex
...
pex: Building pex: 20905.4ms
pex: Adding distributions from pexes: : 0.0ms
pex: Resolving distributions for requirements: torch==2.1.1: 20902.6ms
pex: Resolving requirements.: 20902.5ms
pex: Resolving for:
/usr/bin/python3.11: 8135.2ms
pex: Calculating project names for direct requirements:
PyPIRequirement(line=LogicalLine(raw_text='torch==2.1.1', processed_text='torch==2.1.1', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='torch', url=None, extras=frozenset(), specifier=<SpecifierSet('==2.1.1')>, marker=None), editable=False): 0.1ms
pex: Installing 22 distributions: 10994.5ms
pex: Checking install: 2.2ms
pex: Configuring PEX dependencies: 2.3ms
Saving PEX file to t2.2.pex
Previous binary unexpectedly exists, cleaning: t2.2.pex
pex: Zipping PEX file.: 167895.1ms
/home/jsirois/dev/pantsbuild/jsirois-pex/pex/pex_builder.py:113: PEXWarning: The PEX zip at t2.2.pex~ is not a valid zipapp: Could not find the `__main__` module.
This is likely due to the zip requiring ZIP64 extensions due to size or the
number of file entries or both. You can work around this limitation in Python's
`zipimport` module by re-building the PEX with `--layout packed` or
`--layout loose`.
pex_warnings.warn(message)
And with `--no-pre-install-wheels`:
$ rm -rf ~/.pex/installed_wheels/
$ python3.11 -mpex -v torch==2.1.1 --no-pre-install-wheels -o t2.2.pex
...
pex: Building pex: 10125.3ms
pex: Adding distributions from pexes: : 0.0ms
pex: Resolving distributions for requirements: torch==2.1.1: 10123.1ms
pex: Resolving requirements.: 10123.1ms
pex: Resolving for:
/usr/bin/python3.11: 8274.5ms
pex: Calculating project names for direct requirements:
PyPIRequirement(line=LogicalLine(raw_text='torch==2.1.1', processed_text='torch==2.1.1', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='torch', url=None, extras=frozenset(), specifier=<SpecifierSet('==2.1.1')>, marker=None), editable=False): 0.1ms
pex: Checking build: 2.1ms
pex: Configuring PEX dependencies: 1.7ms
Saving PEX file to t2.2.pex
pex: Zipping PEX file.: 3173.1ms
/home/jsirois/dev/pantsbuild/jsirois-pex/pex/pex_builder.py:113: PEXWarning: The PEX zip at t2.2.pex~ is not a valid zipapp: Could not find the `__main__` module.
This is likely due to the zip requiring ZIP64 extensions due to size or the
number of file entries or both. You can work around this limitation in Python's
`zipimport` module by re-building the PEX with `--layout packed` or
`--layout loose`.
pex_warnings.warn(message)
So that's:

| | Status quo | Using `--no-pre-install-wheels` |
|---|---|---|
| Pre-install time (~unzip) | 10.99s | N/A |
| Zip time | 167.89s | 3.17s |
| Size (bytes) | 2680106601 | 2677995839 |
Of course, this is not a great example, since the resulting PEX cannot be run (as the warning shown in both outputs above indicates); so we can't examine the tradeoff in the 1st-boot runtime penalty for installing the wheels just in time.
And, using the OP command, but with `--layout packed --venv --venv-site-packages-copies`, which is required to work around the zipapp size issue and the indirect nvidia dependencies' failure to properly use namespace packages:
$ rm -rf ~/.pex/installed_wheels/ ~/.pex/packed_wheels/
$ python3.11 -mpex -v torch==2.1.1 --venv --venv-site-packages-copies --layout packed -o t2.2.pex
...
pex: Building pex: 20589.6ms
pex: Adding distributions from pexes: : 0.0ms
pex: Resolving distributions for requirements: torch==2.1.1: 20586.9ms
pex: Resolving requirements.: 20586.9ms
pex: Resolving for:
/usr/bin/python3.11: 8686.9ms
pex: Calculating project names for direct requirements:
PyPIRequirement(line=LogicalLine(raw_text='torch==2.1.1', processed_text='torch==2.1.1', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='torch', url=None, extras=frozenset(), specifier=<SpecifierSet('==2.1.1')>, marker=None), editable=False): 0.1ms
pex: Installing 22 distributions: 10215.5ms
pex: Checking install: 1.7ms
pex: Configuring PEX dependencies: 2.2ms
Saving PEX file to t2.2.pex
pex: Zipping PEX .bootstrap/ code.: 86.5ms
pex: Zipping 22 distributions.: 172517.1ms
$ du -sb t2.2.pex/
2679282217 t2.2.pex/
$ python3.11 -mpex -v torch==2.1.1 --venv --venv-site-packages-copies --layout packed -o t2.2.pex
...
pex: Building pex: 12982.0ms
pex: Adding distributions from pexes: : 0.1ms
pex: Resolving distributions for requirements: torch==2.1.1: 12979.3ms
pex: Resolving requirements.: 12979.2ms
pex: Resolving for:
/usr/bin/python3.11: 8217.2ms
pex: Calculating project names for direct requirements:
PyPIRequirement(line=LogicalLine(raw_text='torch==2.1.1', processed_text='torch==2.1.1', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='torch', url=None, extras=frozenset(), specifier=<SpecifierSet('==2.1.1')>, marker=None), editable=False): 0.1ms
pex: Installing 22 distributions: 3051.5ms
pex: Checking install: 1.8ms
pex: Configuring PEX dependencies: 2.2ms
Saving PEX file to t2.2.pex
pex: Zipping PEX .bootstrap/ code.: 0.0ms
pex: Zipping 22 distributions.: 0.4ms
$ du -sb t2.2.pex/
2679282217 t2.2.pex/
And with `--no-pre-install-wheels` (~the same for warm and cold cases):
$ python3.11 -mpex -v torch==2.1.1 --venv --venv-site-packages-copies --layout packed --no-pre-install-wheels -o t2.2.whls.pex
...
pex: Building pex: 10429.3ms
pex: Adding distributions from pexes: : 0.0ms
pex: Resolving distributions for requirements: torch==2.1.1: 10427.3ms
pex: Resolving requirements.: 10427.2ms
pex: Resolving for:
/usr/bin/python3.11: 8666.5ms
pex: Calculating project names for direct requirements:
PyPIRequirement(line=LogicalLine(raw_text='torch==2.1.1', processed_text='torch==2.1.1', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='torch', url=None, extras=frozenset(), specifier=<SpecifierSet('==2.1.1')>, marker=None), editable=False): 0.1ms
pex: Checking build: 1.7ms
pex: Configuring PEX dependencies: 1.7ms
Saving PEX file to t2.2.whls.pex
pex: Zipping PEX .bootstrap/ code.: 91.7ms
pex: Copying 22 distributions.: 0.2ms
$ du -sb t2.2.whls.pex/
2678537958 t2.2.whls.pex/
So that's:

| | Status quo (cold) | Status quo (warm) | Using `--no-pre-install-wheels` |
|---|---|---|---|
| Pre-install time (~unzip) | 10.22s | N/A | N/A |
| Zip / copy time | 172.52s | 0.4s | 0.2s |
| Size (bytes) | 2679282217 | 2679282217 | 2678537958 |
And at runtime:
$ hyperfine \
-w2 \
-p 'rm -rf ~/.pex/unzipped_pexes ~/.pex/venvs' \
-p 'rm -rf ~/.pex/unzipped_pexes ~/.pex/venvs' \
-p 'rm -rf ~/.pex/installed_wheels ~/.pex/unzipped_pexes ~/.pex/venvs' \
-p 'rm -rf ~/.pex/installed_wheels ~/.pex/unzipped_pexes ~/.pex/venvs' \
-p '' \
-p 'rm -rf ~/.pex/installed_wheels ~/.pex/unzipped_pexes ~/.pex/venvs' \
-p 'rm -rf ~/.pex/installed_wheels ~/.pex/unzipped_pexes ~/.pex/venvs' \
-p '' \
-n 'Status quo warm 1st' \
-n 'Status quo warm 1st parallel' \
-n 'Status quo cold 1st' \
-n 'Status quo cold 1st parallel' \
-n 'Status quo hot' \
-n 'With --no-pre-install-wheels 1st' \
-n 'With --no-pre-install-wheels 1st parallel' \
-n 'With --no-pre-install-wheels hot' \
't2.2.pex/__main__.py -c "import torch"' \
'PEX_MAX_INSTALL_JOBS=0 t2.2.pex/__main__.py -c "import torch"' \
't2.2.pex/__main__.py -c "import torch"' \
'PEX_MAX_INSTALL_JOBS=0 t2.2.pex/__main__.py -c "import torch"' \
't2.2.pex/__main__.py -c "import torch"' \
't2.2.whls.pex/__main__.py -c "import torch"' \
'PEX_MAX_INSTALL_JOBS=0 t2.2.whls.pex/__main__.py -c "import torch"' \
't2.2.whls.pex/__main__.py -c "import torch"'
Benchmark 1: Status quo warm 1st
Time (mean ± σ): 5.765 s ± 0.040 s [User: 5.017 s, System: 0.734 s]
Range (min … max): 5.717 s … 5.853 s 10 runs
Benchmark 2: Status quo warm 1st parallel
Time (mean ± σ): 5.991 s ± 0.035 s [User: 7.267 s, System: 0.885 s]
Range (min … max): 5.952 s … 6.054 s 10 runs
Benchmark 3: Status quo cold 1st
Time (mean ± σ): 26.737 s ± 0.338 s [User: 24.027 s, System: 2.683 s]
Range (min … max): 26.307 s … 27.365 s 10 runs
Benchmark 4: Status quo cold 1st parallel
Time (mean ± σ): 12.790 s ± 0.141 s [User: 30.314 s, System: 3.424 s]
Range (min … max): 12.549 s … 12.969 s 10 runs
Benchmark 5: Status quo hot
Time (mean ± σ): 889.1 ms ± 4.9 ms [User: 815.3 ms, System: 68.5 ms]
Range (min … max): 883.1 ms … 898.3 ms 10 runs
Benchmark 6: With --no-pre-install-wheels 1st
Time (mean ± σ): 29.602 s ± 0.137 s [User: 26.534 s, System: 3.034 s]
Range (min … max): 29.480 s … 29.955 s 10 runs
Benchmark 7: With --no-pre-install-wheels 1st parallel
Time (mean ± σ): 14.062 s ± 0.245 s [User: 34.360 s, System: 3.842 s]
Range (min … max): 13.780 s … 14.540 s 10 runs
Benchmark 8: With --no-pre-install-wheels hot
Time (mean ± σ): 882.1 ms ± 4.0 ms [User: 810.3 ms, System: 66.7 ms]
Range (min … max): 874.7 ms … 889.1 ms 10 runs
Summary
With --no-pre-install-wheels hot ran
1.01 ± 0.01 times faster than Status quo hot
6.54 ± 0.05 times faster than Status quo warm 1st
6.79 ± 0.05 times faster than Status quo warm 1st parallel
14.50 ± 0.17 times faster than Status quo cold 1st parallel
15.94 ± 0.29 times faster than With --no-pre-install-wheels 1st parallel
30.31 ± 0.41 times faster than Status quo cold 1st
33.56 ± 0.22 times faster than With --no-pre-install-wheels 1st
So, in summary (assuming resolve times for the build and run cases are equal and so are ignored):

| | Status quo | With `--no-pre-install-wheels` | `--no-pre-install-wheels` savings |
|---|---|---|---|
| Cold build and run 1st local machine | 188.51s | 29.80s | 84% faster |
| Cold run 1st remote machine | 26.74s | 29.60s | 11% slower |
| Cold run 1st remote machine parallel | 12.79s | 14.06s | 10% slower |
| Size (bytes) | 2679282217 | 2678537958 | 0.02% smaller |
This means that for local, internal-only use, `--no-pre-install-wheels` is always a win. Important examples are Pants's Python backend use case and @cosmicexplorer's case in #2158 of local iteration on an ML / data science project.

For cases where remote-deployment cold 1st-run start time is important (legacy lambdex use cases come to mind), `--no-pre-install-wheels` will always be a small loss.
For other cases the perf is a wash and more localized analysis is needed to decide which set of options to use.
The analysis above is at the extreme large end of PEX sizes (~2GB). I'll add the same analysis below for the extreme small end (a cowsay PEX) to button this up, assuming ~linearity between the two extremes.
Ok, for a small case I used cowsay and ansicolors deps with this 93-byte main.py and driver scripts:
$ ./build-cowsay.sh && ./perf-cowsay.sh
Benchmark 1: Build zipappi (cold)
Time (mean ± σ): 1.146 s ± 0.028 s [User: 1.075 s, System: 0.161 s]
Range (min … max): 1.110 s … 1.189 s 10 runs
Benchmark 2: Build .whl zipapp (cold)
Time (mean ± σ): 1.047 s ± 0.026 s [User: 0.914 s, System: 0.131 s]
Range (min … max): 1.011 s … 1.081 s 10 runs
Benchmark 3: Build packed (cold)
Time (mean ± σ): 1.125 s ± 0.016 s [User: 1.073 s, System: 0.136 s]
Range (min … max): 1.109 s … 1.167 s 10 runs
Benchmark 4: Build .whl packed (cold)
Time (mean ± σ): 1.034 s ± 0.008 s [User: 0.893 s, System: 0.140 s]
Range (min … max): 1.017 s … 1.042 s 10 runs
Benchmark 5: Build loose (cold)
Time (mean ± σ): 1.077 s ± 0.010 s [User: 1.030 s, System: 0.131 s]
Range (min … max): 1.062 s … 1.094 s 10 runs
Benchmark 6: Build .whl loose (cold)
Time (mean ± σ): 995.2 ms ± 17.7 ms [User: 852.2 ms, System: 142.5 ms]
Range (min … max): 972.2 ms … 1028.8 ms 10 runs
Benchmark 7: Build zipappi (warm)
Time (mean ± σ): 413.8 ms ± 12.5 ms [User: 370.8 ms, System: 43.0 ms]
Range (min … max): 399.5 ms … 437.5 ms 10 runs
Benchmark 8: Build .whl zipapp (warm)
Time (mean ± σ): 401.1 ms ± 5.4 ms [User: 345.5 ms, System: 55.5 ms]
Range (min … max): 396.0 ms … 415.1 ms 10 runs
Benchmark 9: Build packed (warm)
Time (mean ± σ): 351.6 ms ± 2.9 ms [User: 314.1 ms, System: 37.3 ms]
Range (min … max): 348.6 ms … 357.1 ms 10 runs
Benchmark 10: Build .whl packed (warm)
Time (mean ± σ): 354.5 ms ± 11.4 ms [User: 315.7 ms, System: 38.5 ms]
Range (min … max): 343.2 ms … 372.2 ms 10 runs
Benchmark 11: Build loose (warm)
Time (mean ± σ): 358.2 ms ± 2.5 ms [User: 307.2 ms, System: 50.5 ms]
Range (min … max): 354.3 ms … 364.1 ms 10 runs
Benchmark 12: Build .whl loose (warm)
Time (mean ± σ): 365.2 ms ± 19.3 ms [User: 314.1 ms, System: 51.2 ms]
Range (min … max): 352.7 ms … 415.4 ms 10 runs
Summary
Build packed (warm) ran
1.01 ± 0.03 times faster than Build .whl packed (warm)
1.02 ± 0.01 times faster than Build loose (warm)
1.04 ± 0.06 times faster than Build .whl loose (warm)
1.14 ± 0.02 times faster than Build .whl zipapp (warm)
1.18 ± 0.04 times faster than Build zipappi (warm)
2.83 ± 0.06 times faster than Build .whl loose (cold)
2.94 ± 0.03 times faster than Build .whl packed (cold)
2.98 ± 0.08 times faster than Build .whl zipapp (cold)
3.06 ± 0.04 times faster than Build loose (cold)
3.20 ± 0.05 times faster than Build packed (cold)
3.26 ± 0.08 times faster than Build zipappi (cold)
709130 /home/jsirois/dev/pantsbuild/jsirois-pex/app/cowsay.zipapp.whls.pex
714166 /home/jsirois/dev/pantsbuild/jsirois-pex/app/cowsay.zipapp.pex
721772 /home/jsirois/dev/pantsbuild/jsirois-pex/app/cowsay.packed.whls.pex
723960 /home/jsirois/dev/pantsbuild/jsirois-pex/app/cowsay.packed.pex
2543013 /home/jsirois/dev/pantsbuild/jsirois-pex/app/cowsay.loose.whls.pex
2670261 /home/jsirois/dev/pantsbuild/jsirois-pex/app/cowsay.loose.pex
Benchmark 1: Run zipapp cold
Time (mean ± σ): 433.1 ms ± 17.8 ms [User: 383.9 ms, System: 48.6 ms]
Range (min … max): 417.3 ms … 476.7 ms 10 runs
Benchmark 2: Run .whl zipapp cold
Time (mean ± σ): 511.4 ms ± 8.2 ms [User: 469.1 ms, System: 41.9 ms]
Range (min … max): 497.8 ms … 524.0 ms 10 runs
Benchmark 3: Run packed cold
Time (mean ± σ): 422.3 ms ± 5.1 ms [User: 375.7 ms, System: 46.3 ms]
Range (min … max): 413.4 ms … 429.8 ms 10 runs
Benchmark 4: Run .whl packed cold
Time (mean ± σ): 504.6 ms ± 7.0 ms [User: 455.2 ms, System: 49.0 ms]
Range (min … max): 493.8 ms … 515.9 ms 10 runs
Benchmark 5: Run loose cold
Time (mean ± σ): 239.7 ms ± 6.5 ms [User: 212.8 ms, System: 26.5 ms]
Range (min … max): 231.2 ms … 256.2 ms 12 runs
Benchmark 6: Run .whl loose cold
Time (mean ± σ): 332.3 ms ± 5.1 ms [User: 285.4 ms, System: 46.7 ms]
Range (min … max): 326.7 ms … 340.5 ms 10 runs
Benchmark 7: Run zipapp cold (parallel)
Time (mean ± σ): 550.6 ms ± 4.4 ms [User: 551.2 ms, System: 55.1 ms]
Range (min … max): 544.3 ms … 556.6 ms 10 runs
Benchmark 8: Run .whl zipapp coldi (parallel)
Time (mean ± σ): 586.3 ms ± 5.2 ms [User: 616.6 ms, System: 65.1 ms]
Range (min … max): 581.7 ms … 595.8 ms 10 runs
Benchmark 9: Run packed cold (parallel)
Time (mean ± σ): 545.6 ms ± 8.2 ms [User: 551.4 ms, System: 50.6 ms]
Range (min … max): 536.5 ms … 561.9 ms 10 runs
Benchmark 10: Run .whl packed cold (parallel)
Time (mean ± σ): 580.6 ms ± 4.8 ms [User: 608.2 ms, System: 64.9 ms]
Range (min … max): 573.0 ms … 588.4 ms 10 runs
Benchmark 11: Run loose cold (parallel)
Time (mean ± σ): 232.4 ms ± 2.3 ms [User: 211.8 ms, System: 20.3 ms]
Range (min … max): 229.4 ms … 237.2 ms 12 runs
Benchmark 12: Run .whl loose cold (parallel)
Time (mean ± σ): 411.7 ms ± 2.4 ms [User: 449.2 ms, System: 56.2 ms]
Range (min … max): 407.8 ms … 416.1 ms 10 runs
Summary
Run loose cold (parallel) ran
1.03 ± 0.03 times faster than Run loose cold
1.43 ± 0.03 times faster than Run .whl loose cold
1.77 ± 0.02 times faster than Run .whl loose cold (parallel)
1.82 ± 0.03 times faster than Run packed cold
1.86 ± 0.08 times faster than Run zipapp cold
2.17 ± 0.04 times faster than Run .whl packed cold
2.20 ± 0.04 times faster than Run .whl zipapp cold
2.35 ± 0.04 times faster than Run packed cold (parallel)
2.37 ± 0.03 times faster than Run zipapp cold (parallel)
2.50 ± 0.03 times faster than Run .whl packed cold (parallel)
2.52 ± 0.03 times faster than Run .whl zipapp coldi (parallel)
The summary is:

- `.whl` builds are slightly faster than the status quo, as expected (no unzipping, and, for zipapp and packed, no re-zipping is required).
- `.whl` 1st cold runs are slightly slower than the status quo, as expected (there is an extra install step at runtime).