pex-tool / pex

A tool for generating .pex (Python EXecutable) files, lock files and venvs.
https://docs.pex-tool.org/
Apache License 2.0

Large amount of time spent in "zipping" when building a pex from a `--pex-repository` #1675

Closed: stuhood closed this issue 2 years ago

stuhood commented 2 years ago

For the attached lock.txt (superset) and subset.txt requirements, building a PEX using --pex-repository (on an AWS p2x instance inside of Docker) takes ~220s, primarily inside the "zipping" phase:

pex: Zipping PEX file.
pex: Zipping PEX file.: 221244.8ms

The full PEX command to build the subset is:

$HOME/.pyenv/versions/3.10.1/bin/python ./pex \
  --tmpdir .tmp \
  --jobs 8 \
  --python-path .... \
  --output-file example.pex \
  --no-emit-warnings \
  --manylinux manylinux2014 \
  --venv prepend \
  --requirements-pex local_dists.pex \
  --pex-repository default_lockfile.pex \
  --interpreter-constraint CPython==3.8.* \
  --entry-point ... \
  --sources-directory=source_files \
  albumentations<2.0.0,>=1.1.0 \
  boto3==1.20.24 \
  botocore==1.23.24 \
  docstring-parser<0.14.0,>=0.13 \
  jpeg4py<0.2.0,>=0.1.4 \
  jsonargparse[signatures]<5.0.0,>=4.3.1 \
  numpy<2.0.0,>=1.22.3 \
  nvidia-dali-cuda110>=1.11.0 \
  opencv-python<5.0.0,>=4.5.5 \
  pytorch-lightning<2.0.0,>=1.5.10 \
  requests<3.0.0,>=2.27.1 \
  s3fs==2022.2.0 \
  setuptools==59.5.0 \
  torch<2.0.0,>=1.10.2 \
  torchvision<0.12.0,>=0.11.3 \
  wandb<0.13.0,>=0.12.11 \
  --layout zipapp
jsirois commented 2 years ago

Why is the layout zipapp? The packed layout was built to minimize zip time, maximize cache hits, etc.

jsirois commented 2 years ago

Ok, assuming there is some reason for needing zipapp, first let's see what native tools take:

Python 3.10 fails lock download on:
ERROR: Could not find a version that satisfies the requirement onnxruntime==1.10.0
ERROR: No matching distribution found for onnxruntime==1.10.0

Trying 3.9:
ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==. These do not:
    zipp>=0.5 from https://files.pythonhosted.org/packages/52/c5/df7953fe6065185af5956265e3b16f13c2826c2b1ba23d43154f3af453bc/zipp-3.7.0-py3-none-any.whl#sha256=b47250dd24f92b7dd6a0a8fc5244da14608f3ca90a5efcd37a3b1642fac9a375 (from importlib-metadata==4.11.2->-r lock.txt (line 506))

And in lock.txt zipp wants <3.9, so trying 3.8:
$ python3.8 -mvenv /tmp/1675.38.venv/
$ /tmp/1675.38.venv/bin/pip -q install -U pip==20.3.4
$ /tmp/1675.38.venv/bin/pip download --dest /tmp/1675/artifacts -r lock.txt  --use-feature 2020-resolver
$ du -sh /tmp/1675/artifacts/
1.8G    /tmp/1675/artifacts/
$ du -sm /tmp/1675/artifacts/* | sort -n | tail -10
14  /tmp/1675/artifacts/scikit_image-0.19.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
16  /tmp/1675/artifacts/mypy-0.930-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
17  /tmp/1675/artifacts/numpy-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
23  /tmp/1675/artifacts/torchvision-0.11.3-cp38-cp38-manylinux1_x86_64.whl
26  /tmp/1675/artifacts/scikit_learn-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
40  /tmp/1675/artifacts/scipy-1.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
46  /tmp/1675/artifacts/opencv_python_headless-4.5.5.64-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
58  /tmp/1675/artifacts/opencv_python-4.5.5.64-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
679 /tmp/1675/artifacts/nvidia_dali_cuda110-1.11.1-4069477-py3-none-manylinux2014_x86_64.whl
842 /tmp/1675/artifacts/torch-1.10.2-cp38-cp38-manylinux1_x86_64.whl

$ /tmp/1675.38.venv/bin/pip download --dest /tmp/1675/subset -r subset.txt --use-feature 2020-resolver --no-index -f /tmp/1675/artifacts
$ du -sh /tmp/1675/subset/
$ du -sm /tmp/1675/subset/* | sort -n | tail -10
9   /tmp/1675/subset/botocore-1.23.24-py3-none-any.whl
14  /tmp/1675/subset/scikit_image-0.19.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
17  /tmp/1675/subset/numpy-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
23  /tmp/1675/subset/torchvision-0.11.3-cp38-cp38-manylinux1_x86_64.whl
26  /tmp/1675/subset/scikit_learn-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
40  /tmp/1675/subset/scipy-1.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
46  /tmp/1675/subset/opencv_python_headless-4.5.5.64-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
58  /tmp/1675/subset/opencv_python-4.5.5.64-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
679 /tmp/1675/subset/nvidia_dali_cuda110-1.11.1-4069477-py3-none-manylinux2014_x86_64.whl
842 /tmp/1675/subset/torch-1.10.2-cp38-cp38-manylinux1_x86_64.whl

$ time zip -r subset.zip /tmp/1675/subset

real    0m34.614s
user    0m33.550s
sys 0m0.987s
$ ls -lh subset.zip 
-rw-r--r-- 1 jsirois jsirois 1.8G Mar 18 07:06 subset.zip

$ mkdir /tmp/1675/subset-unzipped
$ for z in ../subset/*.whl; do unzip -d /tmp/1675/subset-unzipped $z; done
$ time zip -r subset-from-installed-wheels.zip /tmp/1675/subset-unzipped/

real    1m58.093s
user    1m56.249s
sys 0m1.259s

I think I can end there, even without an apples-to-apples comparison against your timing measurement. Creating a zip that big from that many loose files simply takes that long.

I'll close this as an answered question, but please feel free to re-open and explain more about what you're after if, for example, you're looking for Pex to experiment with or expose compression levels (this experiment used the defaults).

stuhood commented 2 years ago

Why is the layout zipapp? The packed layout was built to minimize zip time, maximize cache hits, etc.

Because it's an externally packaged app (the package goal), and that's the default. But ... yea, good point. I'll see whether using layout= is an option for them.

jsirois commented 2 years ago

Ok. I have never invested time thinking about re-thinking zipping. There may be perf to be squeezed there, but I'm honestly ignorant. IIUC each entry is separately compressed, which implies entries could be prepared in parallel, but it's unclear to me whether this is feasible, worthwhile, hard, etc.
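
For what it's worth, the per-entry independence can be sketched with the stdlib (purely illustrative, not Pex code): each zip entry's payload is an independent raw deflate stream, so the CPU-bound compression could in principle be farmed out to a pool, with a single writer assembling the archive afterward. zlib releases the GIL during compression, so even a thread pool helps:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def deflate(data: bytes) -> bytes:
    # wbits=-15 produces the raw deflate stream a zip entry stores
    # (no zlib header/trailer), at the default-ish level 6.
    co = zlib.compressobj(6, zlib.DEFLATED, -15)
    return co.compress(data) + co.flush()


# Stand-ins for file contents that would become zip entries.
blobs = [bytes([i % 251]) * 100_000 for i in range(8)]

# Compress all entries in parallel; a single writer could then splice
# each pre-compressed stream into the archive behind its local header.
with ThreadPoolExecutor() as pool:
    compressed = list(pool.map(deflate, blobs))

# Round-trip check: each raw deflate stream inflates back to the original.
for raw, orig in zip(compressed, blobs):
    assert zlib.decompress(raw, -15) == orig
```

The hard part this sketch skips is the assembly step: writing correct local headers, CRCs, and a central directory around pre-compressed payloads.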

stuhood commented 2 years ago

Ok. I have never invested time thinking about re-thinking zipping. There may be perf to be squeezed there, but I'm honestly ignorant. IIUC each entry is separately compressed, which implies entries could be prepared in parallel, but it's unclear to me whether this is feasible, worthwhile, hard, etc.

I expect that layout=packed will be win-win for this user if they're able to use it.

But with regard to making the packed -> zipapp conversion faster, the "zip concatenation" strategy supported by posix zip (and previously by some Java code in Pants v1) might be one approach: https://github.com/pantsbuild/pants/blob/dc59219906f8d4dde15fa74f3acd3f36d63f8bc9/src/python/pants/jvm/package/deploy_jar.py#L99-L106
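
For intuition on why a fixup pass is needed at all, here is a small stdlib-only demonstration (not Pex code): Python's zipfile tolerates data prepended to an archive (that is what makes zipapps with a shebang work), so naively concatenated zips only expose the last archive's entries. Something like `zip -FF` is needed to rebuild a single merged central directory:

```python
import io
import zipfile


def make_zip(entries: dict) -> bytes:
    """Build an in-memory zip from a name -> bytes mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()


a = make_zip({"a.txt": b"alpha"})
b = make_zip({"b.txt": b"beta"})

# Naive concatenation: zipfile finds only the trailing archive's central
# directory (treating everything before it as prepended junk).
merged = a + b
with zipfile.ZipFile(io.BytesIO(merged)) as zf:
    print(zf.namelist())  # only b.txt is visible; a.txt's entry is "lost"
```

So the concatenation strategy amounts to: zip each piece in parallel, cat the results, then run one fixup pass to stitch the central directories back together, with no recompression of any entry.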

jsirois commented 2 years ago

Yeah, I'm pretty loath to introduce external dependencies, but perhaps probing for zip and only specializing when it's present would be worthwhile to maintain.

Or, maybe just expose the compression level. I tried 7zip here: first with the defaults, then multithreaded across all cores with BZip2 compression (which it says is the only compression it will parallelize), then with no compression, and finally zip with no compression:

$ time 7z a -tzip 7zip.zip /tmp/1675/subset-unzipped/

7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs x64)

Scanning the drive:
2055 folders, 13339 files, 3807146753 bytes (3631 MiB)

Creating archive: 7zip.zip

Items to compress: 15394

Files read from disk: 13339
Archive size: 1818494446 bytes (1735 MiB)
Everything is Ok

real    1m59.799s
user    5m5.060s
sys 0m1.743s

$ time 7z a -tzip -mmt=16 -mm=BZip2 7zip-custom-16-bz2.zip /tmp/1675/subset-unzipped/

7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs x64)

Scanning the drive:
2055 folders, 13339 files, 3807146753 bytes (3631 MiB)

Creating archive: 7zip-custom-16-bz2.zip

Items to compress: 15394

Files read from disk: 13339
Archive size: 1694079081 bytes (1616 MiB)
Everything is Ok

real    4m8.031s
user    8m38.992s
sys 0m1.682s

$ time 7z a -tzip -mm=Copy 7zip-custom-copy.zip /tmp/1675/subset-unzipped/

7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs x64)

Scanning the drive:
2055 folders, 13339 files, 3807146753 bytes (3631 MiB)

Creating archive: 7zip-custom-copy.zip

Items to compress: 15394

Files read from disk: 13339
Archive size: 3810590821 bytes (3635 MiB)
Everything is Ok

real    0m3.902s
user    0m2.320s
sys 0m1.579s

$ time zip -r -0 zip-copy.zip /tmp/1675/subset-unzipped/

real    0m8.162s
user    0m6.340s
sys 0m1.813s
$ ls -lrth
...
-rw-r--r--  1 jsirois jsirois 1.7G Mar 22 15:49 7zip.zip
-rw-r--r--  1 jsirois jsirois 1.6G Mar 22 15:57 7zip-custom-16-bz2.zip
-rw-r--r--  1 jsirois jsirois 3.6G Mar 22 15:59 7zip-custom-copy.zip
-rw-r--r--  1 jsirois jsirois 3.6G Mar 22 16:04 zip-copy.zip

Since no compression takes more than an order of magnitude less time, perhaps that's enough: folks can decide to trade size for speed now, and suck up the speed cost on the network transfer later, if there is one.
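
For reference, the knob in question maps directly onto options the stdlib already exposes; a minimal illustration of the same stored-vs-deflated trade-off (using a synthetic repetitive payload, not the wheels above):

```python
import io
import zipfile

# A compressible stand-in payload; real wheels compress far less well.
payload = b"some repetitive payload " * 10_000


def zipped_size(compression, **kwargs) -> int:
    """Zip the payload in memory and return the archive size in bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression, **kwargs) as zf:
        zf.writestr("blob.bin", payload)
    return len(buf.getvalue())


stored = zipped_size(zipfile.ZIP_STORED)  # no compression, fastest
deflated = zipped_size(zipfile.ZIP_DEFLATED, compresslevel=9)  # slowest
print(f"stored={stored} deflated={deflated}")
```

Exposing `compression` / `compresslevel` choices like these through the Pex CLI is what lets users pick their point on the size/speed curve.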

jsirois commented 2 years ago

Since the performance tradeoff was so drastic in these experiments and exposing the compression level is pretty easy to do, I forked #1686 to track doing that.

cosmicexplorer commented 1 year ago

#2175 addresses this with an external dependency, which we can discuss once I can get it to pass mypy. I have been able to build a wheel for interpreter versions 3.7-3.11, both implementations (CPython and PyPy), and the manylinux, musllinux, macOS (x86 and arm), and even Windows platforms. But figuring out how to integrate that into pex without losing the utility of the universal py-only pex is obviously a discussion to have.

jsirois commented 1 year ago

@cosmicexplorer did you consider / try out @stuhood's comment?:

But with regard to making the packed -> zipapp conversion faster, the "zip concatenation" strategy supported by posix zip

IOW, have Pex try zip -FF .. of concatenated zips (created in parallel) if zip is present on the system? It might be good to see how well that performs since the integration story is so simple.
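
A rough sketch of what that probe-and-specialize shape could look like (all function names here are hypothetical, and the exact `zip -FF` invocation would need verification against the installed Info-ZIP version):

```python
import shutil
import subprocess
import zipfile


def merge_with_zipfile(out_path: str, part_paths: list) -> None:
    """Portable fallback: re-add every entry, paying the recompression cost."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as out_zip:
        for part in part_paths:
            with zipfile.ZipFile(part) as part_zip:
                for info in part_zip.infolist():
                    out_zip.writestr(info, part_zip.read(info))


def merge_zip_files(out_path: str, part_paths: list) -> None:
    zip_exe = shutil.which("zip")
    if zip_exe is None:
        merge_with_zipfile(out_path, part_paths)
        return
    # Fast path: concatenate the parts, then let `zip -FF` rebuild a single
    # central directory without recompressing any entry.
    with open(out_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)
    subprocess.run(
        [zip_exe, "-q", "-FF", out_path, "--out", out_path + ".fixed"],
        check=True,
        input=b"y\n",  # answer zip's "single-disk archive?" prompt, if asked
    )
    shutil.move(out_path + ".fixed", out_path)
```

The integration story really is small: one `shutil.which` probe, one subprocess call, and a pure-stdlib fallback so nothing breaks where zip is absent.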

cosmicexplorer commented 1 year ago

That's a great idea!! Especially since the biggest perf gain by miles wasn't parallelizing but rather the caching enabled by the merge operation!

cosmicexplorer commented 1 year ago

Oh, I'm going to try that right now.

cosmicexplorer commented 1 year ago

Essentially, the approach in #2175 gives --layout zipapp outputs the same cacheability as --layout packed, but across every single dist you ever download vs just the single output directory of a packed pex.