madler / pigz

A parallel implementation of gzip for modern multi-processor, multi-core machines.
http://zlib.net/pigz/

corrupted -- crc32 mismatch #123

Closed giuseppe closed 2 months ago

giuseppe commented 2 months ago

pigz fails with a CRC32 mismatch error when decompressing a blob file. The same file passed the integrity check when tested with gzip.

Steps to reproduce:

Pull the container image with skopeo:

$ skopeo copy docker://docker.io/ollama/ollama:0.3.8-rocm oci:/var/tmp/ollama

Attempt to decompress the layer 1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727:

$ pigz -d < /var/tmp/ollama/blobs/sha256/1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727 > /dev/null
pigz: skipping: <stdin>: corrupted -- crc32 mismatch

The same file passes the integrity test with gzip:

$ gzip --test /var/tmp/ollama/blobs/sha256/1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727 && echo ok
ok
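
When pigz and gzip disagree, a third end-to-end check can help. Below is a minimal Python sketch of one, not what either tool does internally. The function name `gzip_stream_ok` and the chunk size are made up for illustration. It relies on Python's gzip module raising an error when the stored CRC32 or length does not match the decompressed data. Python's zlib module is normally linked against the system zlib, so on a system where that library is zlib-ng it may exercise the same inflate code as pigz.

```python
import gzip
import zlib


def gzip_stream_ok(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses fully and
    passes the CRC32/length check performed by Python's gzip module."""
    try:
        with gzip.open(path, "rb") as f:
            # Read and discard the output in chunks so a multi-GB file
            # never has to fit in memory.
            while f.read(chunk_size):
                pass
    except (gzip.BadGzipFile, EOFError, zlib.error):
        # BadGzipFile: bad header or CRC32/length mismatch.
        # EOFError: stream truncated before the trailer.
        # zlib.error: invalid deflate data.
        return False
    return True
```

Calling `gzip_stream_ok("/var/tmp/ollama/blobs/sha256/1db71b...")` with the full blob path would then give a yes/no answer comparable to `gzip --test`.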

Originally reported here: https://github.com/containers/podman/issues/23822

I've tested both the version available on Fedora 40 (pigz-2.8-4.fc40.x86_64) and the development version (commit 74fa32d7c609200579bcc5c34fa9517a529834c2).

madler commented 2 months ago

Can you provide a link to this example file: 1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727 ? I have no idea what "skopeo" is.

giuseppe commented 2 months ago

I've not attached it here as the file is ~4GB.

It is part of the https://hub.docker.com/layers/ollama/ollama/0.3.8-rocm/images/sha256-77619298def296600dfa023e8c59de0433df8def5bb3c45c890a5536844fe997 container image. skopeo is a tool that can fetch the "raw" image from the registry without extracting it.

I've uploaded it to https://www.scrivano.org/static/1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727 in the hope that it helps.

madler commented 2 months ago

I cannot duplicate the problem. It tests fine with pigz. I get:

% pigz -ltv 1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727.txt
method    check    timestamp    compressed   original reduced  name
gzip 8  df9c7f66  ------ -----  4298394615 25785892864   83.3%  1db71b1d7c676...
%

Maybe try decompressing with -p 1 to see if that makes any difference.

Also please provide the output of pigz -vV.
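
The "check" and "original" columns in the listing above relate to the gzip trailer. Per RFC 1952, each gzip member ends with 8 bytes: the CRC32 of the uncompressed data, then the uncompressed size modulo 2^32, both little-endian. The following is a minimal sketch for reading those fields directly. The function name `read_gzip_trailer` is hypothetical, and the sketch assumes the file is a single gzip member with nothing appended after it.

```python
import os
import struct


def read_gzip_trailer(path):
    """Return (crc32, isize) from the last 8 bytes of a gzip file.

    Assumes a single-member gzip file with no trailing data.
    isize is the uncompressed length modulo 2**32.
    """
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        crc32, isize = struct.unpack("<II", f.read(8))
    return crc32, isize
```

Under that single-member assumption, the first field would be the df9c7f66 shown in the check column (format it with `"{:08x}".format(crc32)`). An original size of 25785892864 bytes does not fit in 32 bits, so the second field would hold 25785892864 mod 2^32 = 16089088.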

giuseppe commented 2 months ago

the file name is its sha256 digest

$ sha256sum 1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727 
1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727  1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727
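
The same check as the sha256sum call above, that the blob's content hashes to its own file name, can be sketched in Python. This is useful for ruling out a damaged download before suspecting the decompressor. The function name `digest_matches_name` and the chunk size are illustrative choices, not part of any tool mentioned here.

```python
import hashlib
import os


def digest_matches_name(path, chunk_size=1 << 20):
    """Return True if the SHA-256 hex digest of the file's content
    equals the file's base name (as with OCI blobs under blobs/sha256/)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so a multi-GB blob is never loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == os.path.basename(path)
```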

this is what I get locally:

$ pigz -vV
pigz 2.8
zlib 1.3.1.zlib-ng

$ pigz -ltv 1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727
pigz: skipping: 1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727: corrupted -- crc32 mismatch

after further investigation, I've found that it works with a zlib-ng version compiled with ./configure --without-optimizations --zlib-compat:

$ cd ~/src/zlib-ng
$ ./configure --without-optimizations --zlib-compat
$ make -j $(nproc)
$ cd ~/src/pigz
$ LD_LIBRARY_PATH=~/src/zlib-ng make -j $(nproc)
$ LD_LIBRARY_PATH=~/src/zlib-ng ./pigz -ltv /var/tmp/1db71b1d7c67607172266cd839e3e429bf523aab5b0df4761fc0d05ec55dc727
method    check    timestamp    compressed   original reduced  name
gzip 8  df9c7f66  ------ -----  4298394615 25785892864   83.3%  /var/tmp/1db71b1d7c676...

should I close the current issue and file a new one for zlib-ng?

giuseppe commented 2 months ago

it seems the difference is in the inflate_fast_avx2 function. If I comment it out and force another variant to be used, then it also works with ./configure --zlib-compat:

diff --git a/functable.c b/functable.c
index dd8f7731..10abb358 100644
--- a/functable.c
+++ b/functable.c
@@ -115,7 +115,7 @@ static void init_functable(void) {
         ft.adler32_fold_copy = &adler32_fold_copy_avx2;
         ft.chunkmemset_safe = &chunkmemset_safe_avx2;
         ft.chunksize = &chunksize_avx2;
-        ft.inflate_fast = &inflate_fast_avx2;
+        //ft.inflate_fast = &inflate_fast_avx2;
         ft.slide_hash = &slide_hash_avx2;
 #  ifdef HAVE_BUILTIN_CTZ
         ft.compare256 = &compare256_avx2;

madler commented 2 months ago

> should I close the current issue and file a new one for zlib-ng?

Yes, you should create a zlib-ng issue. I have closed this one.

giuseppe commented 2 months ago

thanks, opened here: https://github.com/zlib-ng/zlib-ng/issues/1772