AnthonyGiorgio opened this issue 11 months ago (status: Open).
Our pax.Z files are meant to work standalone as well. That's a great reduction, though. We could provide both flavours in our releases: an .xz package, consumed by zopen install (and by users who have xz installed), and a pax.Z package for those who have neither.
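For illustration, a client could pick between the two flavours like this: prefer the .xz archive when an xz binary is on the PATH, otherwise fall back to the standalone pax.Z. This is a minimal POSIX sh sketch. The function name pick_flavour and the exact suffixes are hypothetical, and this is not how zopen install is actually implemented.

```sh
#!/bin/sh
# pick_flavour: hypothetical helper (not part of zopen) that prints which
# release archive suffix to download. Prefer the smaller xz flavour when an
# xz binary can be found on PATH; otherwise fall back to the pax.Z flavour.
pick_flavour() {
    if command -v xz >/dev/null 2>&1; then
        echo "pax.xz"   # assumed suffix for the xz flavour
    else
        echo "pax.Z"    # standalone flavour, no xz needed
    fi
}

pick_flavour
```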
Traditionally, downloadable packages have been provided in multiple flavors. Very early versions of GNU tools came in .Z and .gz. When bzip2 came out, they provided .gz and .bz2. After xz was released, I saw all three versions supported for a bit. Nowadays it seems that most things are in .xz format.
I vote for a zstd future. It has various knobs to control the trade-off between size and compression/decompression speed and CPU usage.
While we're imagining things, it would be awesome to see hardware-accelerated zstd succeed hardware-accelerated zlib/DEFLATE.
This is really odd:
[ITODORO@ZOSCAN2B ~/projects]$ pax -w -x pax -f meta.pax meta
[ITODORO@ZOSCAN2B ~/projects]$ pax -w -z -x pax -f meta.pax.Z meta
[ITODORO@ZOSCAN2B ~/projects]$ du -k meta.pax*
77224 meta.pax
101184 meta.pax.Z
71936 meta.pax.zstd
How is meta.pax.Z larger in size than meta.pax?
Both xz and zstd (using -19 as the compression level) resulted in a 7.1 MB file. xz was a few hundred bytes smaller.
It's because the compression algorithm in compress isn't that great. It was fine for a PDP-11 in the 1980s, but we're well beyond that now. The compress utility is supposed to refuse to write output that is larger than its input, but there's a command-line option to suppress that behavior.
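The guard described above amounts to "write the compressed output only if it is strictly smaller than the input". Here is a minimal POSIX sh sketch of that check wrapped around any stdin-to-stdout compressor. compress_if_smaller is a hypothetical helper, not part of compress or pax. Exit status 2 mirrors what compress(1) returns when a file would not shrink; in the usual compress(1) implementations, -f forces the output to be written anyway.

```sh
#!/bin/sh
# compress_if_smaller FILE SUFFIX CMD [ARGS...]
# Hypothetical helper: pipe FILE through CMD (which must read stdin and write
# stdout, e.g. "gzip -9", "xz -9" or "compress") into FILE$SUFFIX, and keep
# the result only if it is strictly smaller than FILE.
# Returns 0 if kept, 1 if CMD failed, 2 if the output would not shrink.
compress_if_smaller() {
    cis_in=$1
    cis_out=$1$2
    shift 2
    if ! "$@" < "$cis_in" > "$cis_out"; then
        rm -f "$cis_out"
        return 1
    fi
    # tr strips the leading padding some wc implementations emit.
    cis_in_size=$(wc -c < "$cis_in" | tr -d ' ')
    cis_out_size=$(wc -c < "$cis_out" | tr -d ' ')
    if [ "$cis_out_size" -lt "$cis_in_size" ]; then
        echo "kept $cis_out: $cis_in_size -> $cis_out_size bytes"
        return 0
    fi
    rm -f "$cis_out"
    echo "rejected: output would grow ($cis_in_size -> $cis_out_size bytes)"
    return 2
}
```

For example, compress_if_smaller meta.pax .Z compress would have discarded the meta.pax.Z above, since it grew from 77224 KB to 101184 KB.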
With the compressed sizes being reasonably close, zstd is likely faster to decompress, isn't it? That's where we want speed. The build might take longer at maximum compression, but if that saves end users a bit of time at the expense of build time and a marginally bigger download, might that be a good thing?
Yeah, we can turn zstd's knobs to optimize either for size or for compression/decompression speed.
I hadn't heard of zstd before. Would that be easy to port?
Yep, I just tried it, and it seems to be a lot faster than xz on z/OS:
Decompression: zstd: 0.27s
$ zstd -d git-2.43.0.20231127_145951.zos.pax.zst
git-2.43.0.20231127_145951.zos.pax.zst: 111585708 bytes
real 0m0.276s
user 0m0.191s
sys 0m0.064s
vs. xz (1.3s):
$ time xz -d git-2.43.0.20231127_145951.zos.pax.xz
real 0m1.370s
user 0m1.005s
sys 0m0.335s
Compression is also faster (30s for xz vs. 22s for zstd at the highest compression level). However, the compression ratio was not quite as good: xz produced an 18 MB file and zstd a 20 MB file.
Decompression is the common case here, as the build server is the only one creating archives.
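To put rough numbers on that trade-off using the measurements in this thread, assume that the 18 MB and 20 MB sizes are for the same git archive whose decompression was timed, and that install time is simply download time plus decompression time at a link speed b:

$$
\Delta t_{\text{decomp}} = 1.370\,\text{s} - 0.276\,\text{s} = 1.094\,\text{s}, \qquad
\Delta S = 20\,\text{MB} - 18\,\text{MB} = 2\,\text{MB}
$$

$$
\frac{\Delta S}{b} < \Delta t_{\text{decomp}}
\iff b > \frac{2\,\text{MB}}{1.094\,\text{s}} \approx 1.83\,\text{MB/s}
$$

Under those assumptions, on any link faster than roughly 1.8 MB/s the slightly larger zstd download is repaid by faster decompression on each install, and in this measurement zstd was also quicker on the build server (22s vs. 30s).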
I would like us to pioneer zstd in the Z ecosystem :muscle:
EDIT: But the TS7700 gang beat us to it on the back-end side: https://www.ibm.com/support/pages/system/files/inline-files/TS7770_R_5.0.1_Performance_Version_1.3.pdf
The pax files containing the port releases are compressed using the compress algorithm (.Z). This is an ancient, inefficient compression algorithm that has long since been supplanted by better choices. I suggest that we use .Z only for the minimal bootstrapping packages and use .xz for everything else. This will significantly reduce both download time and the disk space used by the package cache. It has the downside of making xz a dependency of the toolchain, but I think that's a reasonable trade-off.