Closed by stuhood 2 years ago
Why is layout zipapp? The packed layout was built to minimize zip time / cache hits, etc.
Ok, assuming there is some reason for needing zipapp, first let's see what it takes for native tools:
Python 3.10 fails lock download on:
ERROR: Could not find a version that satisfies the requirement onnxruntime==1.10.0
ERROR: No matching distribution found for onnxruntime==1.10.0
Trying 3.9:
ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==. These do not:
zipp>=0.5 from https://files.pythonhosted.org/packages/52/c5/df7953fe6065185af5956265e3b16f13c2826c2b1ba23d43154f3af453bc/zipp-3.7.0-py3-none-any.whl#sha256=b47250dd24f92b7dd6a0a8fc5244da14608f3ca90a5efcd37a3b1642fac9a375 (from importlib-metadata==4.11.2->-r lock.txt (line 506))
And in lock.txt zipp wants Python <3.9, so trying 3.8:
$ python3.8 -mvenv /tmp/1675.38.venv/
$ /tmp/1675.38.venv/bin/pip -q install -U pip==20.3.4
$ /tmp/1675.38.venv/bin/pip download --dest /tmp/1675/artifacts -r lock.txt --use-feature 2020-resolver
$ du -sh /tmp/1675/artifacts/
1.8G /tmp/1675/artifacts/
$ du -sm /tmp/1675/artifacts/* | sort -n | tail -10
14 /tmp/1675/artifacts/scikit_image-0.19.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
16 /tmp/1675/artifacts/mypy-0.930-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
17 /tmp/1675/artifacts/numpy-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
23 /tmp/1675/artifacts/torchvision-0.11.3-cp38-cp38-manylinux1_x86_64.whl
26 /tmp/1675/artifacts/scikit_learn-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
40 /tmp/1675/artifacts/scipy-1.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
46 /tmp/1675/artifacts/opencv_python_headless-4.5.5.64-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
58 /tmp/1675/artifacts/opencv_python-4.5.5.64-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
679 /tmp/1675/artifacts/nvidia_dali_cuda110-1.11.1-4069477-py3-none-manylinux2014_x86_64.whl
842 /tmp/1675/artifacts/torch-1.10.2-cp38-cp38-manylinux1_x86_64.whl
$ /tmp/1675.38.venv/bin/pip download --dest /tmp/1675/subset -r subset.txt --use-feature 2020-resolver --no-index -f /tmp/1675/artifacts
$ du -sh /tmp/1675/subset/
$ du -sm /tmp/1675/subset/* | sort -n | tail -10
9 /tmp/1675/subset/botocore-1.23.24-py3-none-any.whl
14 /tmp/1675/subset/scikit_image-0.19.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
17 /tmp/1675/subset/numpy-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
23 /tmp/1675/subset/torchvision-0.11.3-cp38-cp38-manylinux1_x86_64.whl
26 /tmp/1675/subset/scikit_learn-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
40 /tmp/1675/subset/scipy-1.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
46 /tmp/1675/subset/opencv_python_headless-4.5.5.64-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
58 /tmp/1675/subset/opencv_python-4.5.5.64-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
679 /tmp/1675/subset/nvidia_dali_cuda110-1.11.1-4069477-py3-none-manylinux2014_x86_64.whl
842 /tmp/1675/subset/torch-1.10.2-cp38-cp38-manylinux1_x86_64.whl
$ time zip -r subset.zip /tmp/1675/subset
real 0m34.614s
user 0m33.550s
sys 0m0.987s
$ ls -lh subset.zip
-rw-r--r-- 1 jsirois jsirois 1.8G Mar 18 07:06 subset.zip
$ mkdir /tmp/1675/subset-unzipped
$ for z in ../subset/*.whl; do unzip -d /tmp/1675/subset-unzipped $z; done
$ time zip -r subset-from-installed-wheels.zip /tmp/1675/subset-unzipped/
real 1m58.093s
user 1m56.249s
sys 0m1.259s
I think I can end there, even though this isn't an apples-to-apples comparison with your timing measurement. Creating a zip that big from that many loose files simply takes that long.
I'll close this as an answered question, but please feel free to re-open and explain more about what you're after if you are, for example, looking for Pex to experiment with or expose compression levels (this experiment used defaults).
Why is layout zipapp? The packed layout was built to minimize zip time / cache hits, etc.
Because it's an externally packaged app (the package goal), and that's the default. But ... yeah, good point. I'll see whether using layout=packed is an option for them.
Ok. I have never invested time re-thinking zipping. There may be perf to be squeezed there, but I'm honestly ignorant. IIUC each entry is separately compressed, which implies entries could be prepared in parallel, but it's unclear to me whether that is feasible, worthwhile, hard, etc.
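To make that parallelism idea concrete: each zip entry carries its own independent deflate stream, so the compression step can fan out to workers while a single pass assembles the container afterward. Below is a minimal pure-Python sketch of that split, not Pex's implementation: it hand-assembles the zip container (local headers, central directory, end record) and assumes small files, no zip64 records, and zeroed timestamps.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def _deflate(item):
    # Each entry is compressed independently of every other entry, which is
    # what makes this step embarrassingly parallel.
    name, data = item
    co = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, as zip requires
    return name, zlib.crc32(data) & 0xFFFFFFFF, len(data), co.compress(data) + co.flush()


def parallel_zip(entries, out_path):
    """entries: iterable of (name, bytes) pairs. zlib releases the GIL, so
    threads give real parallelism for the compression step."""
    with ThreadPoolExecutor() as pool:
        compressed = list(pool.map(_deflate, entries))
    central, offset = [], 0
    with open(out_path, "wb") as f:
        # Serial assembly: local file headers + compressed payloads.
        for name, crc, usize, blob in compressed:
            raw = name.encode("utf-8")
            f.write(struct.pack("<4s5H3I2H", b"PK\x03\x04", 20, 0, 8, 0, 0,
                                crc, len(blob), usize, len(raw), 0))
            f.write(raw)
            f.write(blob)
            central.append((raw, crc, len(blob), usize, offset))
            offset += 30 + len(raw) + len(blob)
        # Central directory.
        cd_start = offset
        for raw, crc, csize, usize, hdr_off in central:
            f.write(struct.pack("<4s6H3I5H2I", b"PK\x01\x02", 20, 20, 0, 8, 0, 0,
                                crc, csize, usize, len(raw), 0, 0, 0, 0, 0, hdr_off))
            f.write(raw)
            offset += 46 + len(raw)
        # End-of-central-directory record.
        f.write(struct.pack("<4s4H2IH", b"PK\x05\x06", 0, 0,
                            len(central), len(central), offset - cd_start, cd_start, 0))
```

Whether this wins in practice depends on entry sizes and core count; the serial assembly pass is pure I/O, so the compression workers dominate for large inputs.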
I expect that layout=packed will be win-win for this user if they're able to use it.
But with regard to making the packed -> zipapp conversion faster, the "zip concatenation" strategy supported by posix zip (and previously by some Java code in Pants v1) might be one approach: https://github.com/pantsbuild/pants/blob/dc59219906f8d4dde15fa74f3acd3f36d63f8bc9/src/python/pants/jvm/package/deploy_jar.py#L99-L106
Yeah, I'm pretty loath to introduce external dependencies, but perhaps a probe for zip, only specializing if it's present, would be worthwhile to maintain.
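A sketch of what such a probe could look like, using a hypothetical make_zip helper (not Pex's API): prefer the native zip binary when shutil.which finds it, and fall back to the stdlib zipfile otherwise. The fallback is always correct; the native path is just a potential speedup.

```python
import os
import shutil
import subprocess
import zipfile


def make_zip(src_dir, out_zip):
    """Zip the contents of src_dir into out_zip, preferring native `zip`."""
    native = shutil.which("zip")
    if native:
        # Native tool: recurse from inside src_dir so entry names are relative.
        subprocess.run(
            [native, "-q", "-r", os.path.abspath(out_zip), "."],
            cwd=src_dir,
            check=True,
        )
    else:
        # Pure-Python fallback with the same relative entry names.
        with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
            for root, _dirs, files in os.walk(src_dir):
                for name in files:
                    path = os.path.join(root, name)
                    zf.write(path, os.path.relpath(path, src_dir))
```

One maintenance wrinkle with this pattern is that the two paths must be kept byte-for-byte compatible in entry naming and layout, or downstream consumers see different archives depending on the host.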
Or, maybe just expose compression level. I tried 7zip here: first the default, then multithreaded (one thread per core) with BZip2 compression (which it says is the only compression it will parallelize), then no compression, and finally zip with no compression:
$ time 7z a -tzip 7zip.zip /tmp/1675/subset-unzipped/
7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs x64)
Scanning the drive:
2055 folders, 13339 files, 3807146753 bytes (3631 MiB)
Creating archive: 7zip.zip
Items to compress: 15394
Files read from disk: 13339
Archive size: 1818494446 bytes (1735 MiB)
Everything is Ok
real 1m59.799s
user 5m5.060s
sys 0m1.743s
$ time 7z a -tzip -mmt=16 -mm=BZip2 7zip-custom-16-bz2.zip /tmp/1675/subset-unzipped/
7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs x64)
Scanning the drive:
2055 folders, 13339 files, 3807146753 bytes (3631 MiB)
Creating archive: 7zip-custom-16-bz2.zip
Items to compress: 15394
Files read from disk: 13339
Archive size: 1694079081 bytes (1616 MiB)
Everything is Ok
real 4m8.031s
user 8m38.992s
sys 0m1.682s
$ time 7z a -tzip -mm=Copy 7zip-custom-copy.zip /tmp/1675/subset-unzipped/
7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs x64)
Scanning the drive:
2055 folders, 13339 files, 3807146753 bytes (3631 MiB)
Creating archive: 7zip-custom-copy.zip
Items to compress: 15394
Files read from disk: 13339
Archive size: 3810590821 bytes (3635 MiB)
Everything is Ok
real 0m3.902s
user 0m2.320s
sys 0m1.579s
$ time zip -r -0 zip-copy.zip /tmp/1675/subset-unzipped/
real 0m8.162s
user 0m6.340s
sys 0m1.813s
$ ls -lrth
...
-rw-r--r-- 1 jsirois jsirois 1.7G Mar 22 15:49 7zip.zip
-rw-r--r-- 1 jsirois jsirois 1.6G Mar 22 15:57 7zip-custom-16-bz2.zip
-rw-r--r-- 1 jsirois jsirois 3.6G Mar 22 15:59 7zip-custom-copy.zip
-rw-r--r-- 1 jsirois jsirois 3.6G Mar 22 16:04 zip-copy.zip
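The stored-vs-deflated tradeoff above can also be reproduced with nothing but the stdlib. A minimal sketch (timings and sizes will vary by machine and input; compresslevel is only honored for deflate):

```python
import os
import time
import zipfile


def zip_tree(src_dir, out_zip, compression, level=None):
    """Zip src_dir with the given zipfile compression constant; return
    (elapsed seconds, archive size in bytes)."""
    start = time.perf_counter()
    with zipfile.ZipFile(out_zip, "w", compression, compresslevel=level) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, os.path.relpath(path, src_dir))
    return time.perf_counter() - start, os.path.getsize(out_zip)


# Usage: compare the two ends of the tradeoff, mirroring `zip -r` vs `zip -r -0`.
# t_d, size_d = zip_tree(src, "deflated.zip", zipfile.ZIP_DEFLATED)
# t_s, size_s = zip_tree(src, "stored.zip", zipfile.ZIP_STORED)
```

For compressible input, ZIP_STORED trades a much larger archive for a dramatically faster build, which is the same shape as the 7z -mm=Copy and zip -0 results above.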
Since no compression takes more than an order of magnitude less time, perhaps that's enough: folks can decide to trade size for speed now, and eat the cost on the network transfer later, if there is one.
Since the performance tradeoff was so drastic in these experiments and exposing the compression level is pretty easy to do, I forked #1686 to track doing that.
@cosmicexplorer did you consider / try out @stuhood's comment?:
But with regard to making the packed -> zipapp conversion faster, the "zip concatenation" strategy supported by posix zip
IOW, have Pex try zip -FF .. on concatenated zips (created in parallel) if zip is present on the system?
It might be good to see how well that performs since the integration story is so simple.
That's a great idea!! Especially since the biggest perf gain by miles wasn't parallelizing but rather the caching enabled by the merge operation!
Oh, I'm going to try that right now.
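For reference, here is a minimal shell sketch of that strategy with illustrative paths: build per-chunk zips (which could run in parallel), concatenate the raw bytes, then let zip -FF scan for entries and rebuild a single valid central directory. It is guarded so it's a no-op where Info-ZIP's zip/unzip are not installed.

```shell
set -e
command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1 \
  || { echo "zip/unzip not installed; skipping"; exit 0; }
work=$(mktemp -d)
mkdir -p "$work/a" "$work/b"
echo alpha > "$work/a/one.txt"
echo beta  > "$work/b/two.txt"
# 1. Zip each chunk independently; these jobs run in parallel.
(cd "$work/a" && zip -q -r "$work/a.zip" .) &
(cd "$work/b" && zip -q -r "$work/b.zip" .) &
wait
# 2. Concatenate the partial zips. The result is not yet a valid archive:
#    only the last central directory is reachable, and its offsets are wrong.
cat "$work/a.zip" "$work/b.zip" > "$work/combined.zip"
# 3. `zip -FF` re-scans for local entry headers and writes a fixed-up archive
#    containing the entries from both chunks. `yes` answers its prompts.
yes | zip -q -FF "$work/combined.zip" --out "$work/fixed.zip"
unzip -l "$work/fixed.zip"
```

The appeal here is that the per-chunk zips are cacheable and the fix-up pass never recompresses entry data, so the expensive deflate work is done at most once per dist.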
Essentially, the approach in #2175 gives --layout zipapp outputs the same cacheability as --layout packed, but across every single dist you ever download vs. just the single output directory of a packed pex.
For the attached lock.txt (superset) and subset.txt requirements, building a PEX using --pex-repository (on an AWS p2x instance inside of Docker) takes ~220s, primarily inside the "zipping" phase. The full PEX command to build the subset is: