linkedin / migz

Multithreaded, gzip-compatible compression and decompression, available as a platform-independent Java library and command-line utilities.
BSD 2-Clause "Simplified" License

MiGz fails to decompress GZIP-compressed data #6

Open jyf1997 opened 3 years ago

jyf1997 commented 3 years ago

The following two sets of binary data were compressed with GZIP and with MiGz, respectively:

GZIP1: 1f8b08000000000000003334323634320400ee129cde06000000
MiGz1: 1f8b080400000000020008004d5a0400080000003334323634320400ee129cde06000000
GZIP2: 1f8b08000000000000002b4b2c4b492c2b4b4c010071ec6fe909000000
MiGz2: 1f8b080400000000020008004d5a04000b0000002b4b2c4b492c2b4b4c010071ec6fe909000000

The MiGz-compressed data contains an extra 20008004d5a040008000 or 20008004d5a04000b000. Can these 10 bytes be removed?

jeffpasternack commented 3 years ago

Hi--unfortunately, I do not speak Chinese. If possible, could you please restate your issue in English?

jyf1997 commented 3 years ago

I compressed the same data with GZIP and with MiGz, and found that the MiGz output differs in several places. The data are as follows:

GZIP1: 1f8b08000000000000003334323634320400ee129cde06000000
MiGz1: 1f8b080400000000020008004d5a0400080000003334323634320400ee129cde06000000
GZIP2: 1f8b08000000000000002b4b2c4b492c2b4b4c010071ec6fe909000000
MiGz2: 1f8b080400000000020008004d5a04000b0000002b4b2c4b492c2b4b4c010071ec6fe909000000

How can I make the data compressed by MiGz match the data compressed by GZIP?

jeffpasternack commented 3 years ago

This is expected behavior--MiGz uses a different header than other GZip tools, and may or may not be using the same block compression. However, any GZip program will still be able to decompress a MiGz-compressed file.
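To see where the header differs, the samples posted above can be parsed against the gzip header layout in RFC 1952. The following is a minimal sketch, not part of MiGz: the class and method names are made up for illustration, and it only handles the fixed 10-byte header plus the optional FEXTRA field. On the two MiGz samples it reports a single subfield with ID bytes `M` `Z` and a 4-byte little-endian value (8 and 11), which happens to equal the length of the deflate payload that follows. That reading is an observation from these bytes, not a documented guarantee.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class GzipHeaderInspector {
    /** Converts a hex string such as "1f8b08" into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Describes the FEXTRA subfields of a gzip member, or reports that there are none. */
    static String describeExtra(byte[] gz) {
        ByteBuffer buf = ByteBuffer.wrap(gz).order(ByteOrder.LITTLE_ENDIAN);
        if (gz.length < 10 || (buf.get(0) & 0xff) != 0x1f || (buf.get(1) & 0xff) != 0x8b) {
            throw new IllegalArgumentException("not a gzip stream");
        }
        int flags = buf.get(3) & 0xff;          // FLG byte
        if ((flags & 0x04) == 0) {              // FEXTRA bit not set
            return "no extra field";
        }
        int xlen = buf.getShort(10) & 0xffff;   // XLEN follows the 10-byte fixed header
        int pos = 12;
        int end = 12 + xlen;
        StringBuilder sb = new StringBuilder();
        while (pos + 4 <= end) {
            char si1 = (char) (buf.get(pos) & 0xff);
            char si2 = (char) (buf.get(pos + 1) & 0xff);
            int len = buf.getShort(pos + 2) & 0xffff;
            if (pos + 4 + len > end) {
                break;                          // malformed subfield, stop parsing
            }
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(si1).append(si2).append(" len=").append(len);
            if (len == 4) {
                sb.append(" value=").append(buf.getInt(pos + 4));
            }
            pos += 4 + len;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String gzip1 = "1f8b08000000000000003334323634320400ee129cde06000000";
        String migz1 = "1f8b080400000000020008004d5a0400080000003334323634320400ee129cde06000000";
        System.out.println("GZIP1: " + describeExtra(fromHex(gzip1)));
        System.out.println("MiGz1: " + describeExtra(fromHex(migz1)));
    }
}
```

The extra bytes are therefore a standard, optional part of the gzip format rather than corruption, which is why other gzip readers skip over them.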

Incidentally, other GZip implementations are also free to produce different bytes, so there's no guarantee that two GZip utilities will produce the exact same data when compressing the same file.
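One way to check that the differing bytes still decode to the same content is to feed both samples from this thread to a standard gzip reader. The sketch below uses `java.util.zip.GZIPInputStream` from the JDK; the class name `GzipCompatCheck` and its helpers are illustrative only, and the hex strings are copied from the first pair of samples above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;

public class GzipCompatCheck {
    /** Converts a hex string such as "1f8b08" into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Decompresses a gzip stream held in memory using the JDK's standard reader. */
    static byte[] gunzip(byte[] gz) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fromGzip = gunzip(fromHex(
            "1f8b08000000000000003334323634320400ee129cde06000000"));
        byte[] fromMiGz = gunzip(fromHex(
            "1f8b080400000000020008004d5a0400080000003334323634320400ee129cde06000000"));
        // The compressed bytes differ, but the decompressed payloads should match.
        System.out.println("same content: " + Arrays.equals(fromGzip, fromMiGz));
    }
}
```

`GZIPInputStream` also validates the CRC-32 and length in the trailer, so a successful read means each stream is internally consistent.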

jyf1997 commented 3 years ago

We have some data that was compressed with gzip and want to switch to MiGz, but we are not allowed to decompress the existing data and recompress it with MiGz. Can we still use MiGz to decompress the existing gzip-compressed data? Thanks.

jeffpasternack commented 3 years ago

Unfortunately it's not (in general) possible to do multithreaded decompression of data GZipped by standard utilities, so using MiGz would have no benefit in this case, and MiGz consequently does not support decompression of non-MiGz-compressed data.

As an aside, if you have multiple GZipped files you need to read, you might consider unzipping each file in a separate thread to achieve a degree of parallelism.
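A rough sketch of that per-file approach, using only the JDK (`GZIPInputStream` plus a fixed thread pool), might look like the following. The class name `ParallelGunzip`, the method names, and the choice to hold each result in memory are assumptions made for illustration; for large files you would stream to disk instead of returning byte arrays.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;

public class ParallelGunzip {
    /** Decompresses one gzip file fully into memory (single-threaded per file). */
    static byte[] gunzip(Path file) throws IOException {
        try (InputStream in = new GZIPInputStream(
                 new BufferedInputStream(Files.newInputStream(file)));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Decompresses several files concurrently; results are in the same order as the input. */
    static List<byte[]> gunzipAll(List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (Path f : files) {
                futures.add(pool.submit(() -> gunzip(f)));  // one task per file
            }
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> future : futures) {
                results.add(future.get());                  // rethrows task failures
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        int threads = Math.max(1, Runtime.getRuntime().availableProcessors());
        List<byte[]> contents = gunzipAll(files, threads);
        for (int i = 0; i < files.size(); i++) {
            System.out.println(files.get(i) + ": " + contents.get(i).length + " bytes");
        }
    }
}
```

Each file is still decompressed by a single thread, so the speedup comes only from working on several files at once and is bounded by the number of files and by disk throughput.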