I compressed a raw file with
snzip -t raw file
and when I run
snzip -t raw -d file.raw
I get the error message "uncompress failed".
Could you post more information?
It works for me.
$ ./snzip -t raw INSTALL
$ ./snzip -t raw -d INSTALL.raw
My environment is:
OS: Linux (Ubuntu 16.04 x86_64)
Test data: INSTALL
Hi, I did the test on Ubuntu 16.04 with an Intel(R) Core(TM) i7-7700 CPU. The test data is right here; it is part of a TPC-H dataset. Please check, thanks!
Thanks. The file was incorrectly compressed because the data is too big. The maximum size of raw uncompressed data is 4G according to this information.
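(The 4G limit comes from the raw format's preamble: the uncompressed length is stored as a little-endian varint capped at 32 bits. A minimal sketch of parsing that preamble; the function name is mine, not part of snappy's API:)

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// A raw snappy stream begins with its uncompressed length encoded as a
// little-endian base-128 varint. Because the value is capped at 32 bits,
// no raw stream can describe more than ~4G of uncompressed data.
std::optional<uint32_t> parse_length_preamble(const unsigned char* p, size_t n) {
    uint32_t len = 0;
    for (size_t i = 0; i < n && i < 5; ++i) {  // a 32-bit varint is at most 5 bytes
        len |= static_cast<uint32_t>(p[i] & 0x7f) << (7 * i);
        if ((p[i] & 0x80) == 0)                // high bit clear marks the last byte
            return len;
    }
    return std::nullopt;                       // truncated or malformed preamble
}
```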
There are two choices.
- Make snzip -t raw fail when the file size is over 4G.
- Split the file data into 4G chunks and create a compressed file containing the concatenated compressed chunks.
Got it, thanks.
The latter is impossible. I can create a file containing concatenated raw compressed data. However, I cannot decompress it, because snappy checks whether all input data have been consumed via decompressor->eof(). When two raw compressed streams are concatenated, there is no way to know the boundary.
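(This is easy to reproduce with snappy's public C++ API; a minimal sketch, with illustrative in-memory strings standing in for file data:)

```cpp
#include <cassert>
#include <string>
#include <snappy.h>

int main() {
    std::string a, b, out;
    snappy::Compress("first", 5, &a);
    snappy::Compress("second", 6, &b);

    // Two raw streams back to back: Uncompress rejects the buffer because
    // input is left over after the first stream ends, and nothing marks
    // where one stream stops and the next begins.
    std::string joined = a + b;
    assert(!snappy::Uncompress(joined.data(), joined.size(), &out));

    // Each stream decompresses fine on its own -- but only because we
    // still know the boundary here.
    assert(snappy::Uncompress(a.data(), a.size(), &out) && out == "first");
    return 0;
}
```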
I believe we would have to define a new file format that stores the length of each split of raw compressed data, so that a file over 4G can be split back into its parts when decompressing.
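(For illustration only, such a container could simply length-prefix each compressed split; this layout is hypothetical and is not anything snzip actually writes:)

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical layout: each split is stored as an 8-byte little-endian
// compressed length followed by the compressed bytes, so a reader can
// recover every stream boundary without any help from snappy itself.
void write_split(std::ofstream& out, const std::string& compressed) {
    uint64_t n = compressed.size();
    char hdr[8];
    for (int i = 0; i < 8; ++i)
        hdr[i] = static_cast<char>((n >> (8 * i)) & 0xff);
    out.write(hdr, sizeof hdr);
    out.write(compressed.data(), static_cast<std::streamsize>(n));
}
```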
What merit does the new file format have? I won't reinvent the wheel unless it has explicit merit.
Well, you're right. Let's not reinvent the wheel. It's just that I want to make sure we can recover the boundary of every split when we decompress the file. If something already exists for that, even better. For now, you may just make it fail when the file size is over 4G.
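(A minimal sketch of that size guard; the helper name and error message are illustrative, not snzip's actual code:)

```cpp
#include <cstdint>
#include <cstdio>
#include <sys/stat.h>

// Refuse to write a raw-format file when the input cannot be represented:
// the raw format's 32-bit length preamble caps uncompressed data at 4G.
bool check_raw_input_size(const char* path) {
    struct stat st;
    if (stat(path, &st) != 0) {
        std::perror(path);
        return false;
    }
    if (static_cast<uint64_t>(st.st_size) > UINT32_MAX) {
        std::fprintf(stderr, "%s: too large for '-t raw' (4G maximum)\n", path);
        return false;
    }
    return true;
}
```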