mcmilk / 7-Zip-zstd

7-Zip with support for Brotli, Fast-LZMA2, Lizard, LZ4, LZ5 and Zstandard
https://mcmilk.de/projects/7-Zip-zstd/
Other
4.93k stars 296 forks source link

ZStandard archive created with 17.01/v1.3.2 R1 can't be decompressed with 18.05/v1.3.7 R3 #63

Closed Hermholtz closed 5 years ago

Hermholtz commented 5 years ago

I've packed TPC-H files one year ago with v1.3.2 R1 and now while trying to decompress I'm having "Unspecified error".

image

Here you have a tiny test archive that exhibits the same behavior: https://1drv.ms/u/s!AuJRaFCgXIqdgYBpSEd6Y99D7CXAYQ

mcmilk commented 5 years ago

Yes, this is expected behavior - the reason is, that I changed the 7-Zip property sizes of LZ4, LZ5 and Zstandard from 5 to 3 ... should I change it back to 5 ?

All new version will handle both (3 and 5 bytes header), but older versions want to have only 5 bytes... Maybe I should change it back to 5 bytes for some years, and then switch to 3.

Hermholtz commented 5 years ago

The thing is that NEW version doesn't understand archives compressed with OLD binary - so there's no backward compatibility - a breaking change :-) It means I need to keep OLD 7-zip ZS for old archives... IMO that's bad :-) When people gradually upgrade 7-zip ZS they will experience the same issue.

Unfortunately I don't know how much it impacts your code, though...

mcmilk commented 5 years ago

Ooops... I thought it was the other way around ... then this seems to be an bug, I will search for it.

Hermholtz commented 5 years ago

Anyway imagine when one company has different versions on different machines - and they're trying to unpack one archive with different versions... Maybe there's a way to keep the compatibility? Maybe you can conditionally check for something and account for different property sizes?

Just wondering. Decision is up to you :-)

mcmilk commented 5 years ago

I will change it back, so compatibility is kept.

mcmilk commented 5 years ago

This commit fixes the decompression, it was not the property size ... bb06376057526d233663fd03320b4e4d3d8dc577

Hermholtz commented 5 years ago

I was able to pull the code and build 7-zip. Your commit fixes the problem - but not entirely. It's better than before, my tpc-h archive partially decompresses, but still gives "unspecified error" at some point. I will upload the archive and share it with you tomorrow, because don't know how to prepare other repro - and now I'm on limited mobile internet (the archive is ~3GB). Thank you very much for your work so far.

image

Hermholtz commented 5 years ago

I've uploaded the archive. It's 3 GB compressed. Try to open "nation.tbl" file. https://1drv.ms/f/s!AuJRaFCgXIqdgYBo-DJq0Q-8fe7_Hg

mcmilk commented 5 years ago

You can delete the uploaded 3GB file. I have found the second bug - zIn.size was not correctly set before the first loop... :/

Fixed in 1d1e92a9fbe38ced73ce07e9b1d96b0a94c40912 ... I will create a new release within that week I think. It's also bad, that multithreaded decompression isn't possible with the new zstd API - my own zstdmt got it ;)

Hermholtz commented 5 years ago

I've built this commit and I can confirm it's okay! Thank you for the fix! :)

mcmilk commented 5 years ago

Thank you for reporting these bugs :wink:

Piqlet commented 4 years ago

It's also bad, that multithreaded decompression isn't possible with the new zstd API - my own zstdmt got it ;)

Why don't you use both codecs (zstd +zstdmt) like lzma2 and flzma2 ? There is a difference in unpacking, but I think you can solve it.

(However, it may be possible to solve it in the original, see documentation: overlaplog: 1 means "no overlap", hence completely independent jobs. Overlaplog = 1 = no overlap = mt decoding (?))

mcmilk commented 4 years ago

Yann told me, that mostly one thread for decompression is fast enough ... which normally is true ;-)