vasi / pixz

Parallel, indexed xz compressor
BSD 2-Clause "Simplified" License

Error decoding stream footer when trying to decompress a 3.1 TiB .tpxz file #102

Open alpha754293 opened 2 years ago

alpha754293 commented 2 years ago

I am trying to decompress a 3.1 TiB .tpxz file that was compressed using 1.0.6 and the error message that I am getting is:

"Error decoding stream footer"

What can/should I do as a workaround for this issue so that I can try and get my data back?

Your help is greatly appreciated.

Thank you.

vasi commented 2 years ago

pixz files are also valid xz files, so you can try using xz to decode and see if that helps

alpha754293 commented 2 years ago

Thank you.

Does the size of the file make any difference?

I'm not a programmer, so I don't really know or understand code, but I did see that in the .c file, at the place where it prints this error message, it uses the data type "uint32". Does the 32-bit unsigned integer (I'm just learning what a uint32 is) make a difference? Would it help to change it to a uint64 so that it could handle really large files like this, or not really?

(Again, full and fair disclosure: I'm not a programmer so I can be completely wrong in what I think the code is saying leading up to where it prints this error message.)

I'm trying to test my archive now with the command xz -tv "file name with spaces.tpxz", so hopefully that will tell me something about the .tpxz file that pixz produced.

If it fails, what should I look forward to or expect as the way to fix/resolve this or way to get my data back?

Your help is greatly appreciated.

vasi commented 2 years ago

pixz definitely works on large files, I use it that way all the time. The uint32_t in the code there refers to part of the structure of an xz file, it's different.

I honestly have no idea what happened to your file, it sounds like it might be truncated/corrupted. Maybe it was interrupted partway through creation?

Anyhow, pixz/xz are streaming formats, so you technically don't need a footer at all. You could try doing streaming decoding like cat myfile | pixz -d or cat myfile | xz -cd, and seeing if that produces something reasonable.
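
The streaming approach above can be sketched as a small shell helper. This is only an illustration, assuming xz is installed; `stream_decode` is a hypothetical name, not part of pixz or xz. Because the decoder writes to stdout, the shell redirect decides where the decoded data ends up.

```shell
# stream_decode ARCHIVE OUTPUT
# Hypothetical helper: decode ARCHIVE as a stream and write the result to OUTPUT.
stream_decode() {
    archive=$1
    output=$2
    # xz -dc reads the compressed stream from stdin and writes decoded
    # bytes to stdout; the redirect picks the output file.
    if xz -dc < "$archive" > "$output"; then
        status=0
    else
        status=$?
    fi
    # On corrupt or truncated input xz exits non-zero, but anything it
    # decoded before the error is left in OUTPUT for inspection.
    echo "xz exit status: $status, bytes written: $(wc -c < "$output")" >&2
    return "$status"
}
```

For example, `stream_decode myfile recovered.tar` would leave whatever could be decoded in `recovered.tar` (both file names are placeholders).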

alpha754293 commented 2 years ago

Thank you for your help and support.

My system has approximately 20 hours to go before xz is done testing the file.

For the command: cat myfile | xz -cd, do I need to specify an output filename when it decompresses the file and sends it to stdout?

If/when you are working on multi-TiB .tpxz files like this, is there anything "special" that I should do to improve the probability of success, or to reduce the risk of something like this occurring again in the future?

"I honestly have no idea what happened to your file, it sounds like it might be truncated/corrupted." Do I or should I always run xz -tv against the resulting compressed archive that's produced by pixz to ensure that the compression and creation of the .tpxz file didn't get corrupted somewhere along the way?

"Maybe it was interrupted partway through creation?" Not to the best of my knowledge.

It was created using the command: tar -I pixz -cvf "file name with spaces.tpxz" "path to files/"

There weren't any errors that were reported when the file was being created.

Thank you.

Your help is greatly appreciated.
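
The post-creation check asked about above can be sketched as a small wrapper around `xz -t`. This is a minimal illustration, assuming xz is installed; `verify_archive` is a hypothetical helper name, and the paths in the usage comment are placeholders.

```shell
# verify_archive ARCHIVE
# Hypothetical helper: test an xz-compatible archive without extracting it.
verify_archive() {
    archive=$1
    # xz -t decodes the whole file and checks its integrity,
    # discarding the decoded data instead of writing it anywhere.
    if xz -t "$archive"; then
        echo "OK: $archive"
    else
        echo "FAILED: $archive" >&2
        return 1
    fi
}

# Usage sketch: create the archive, then verify it right away.
# tar -I pixz -cf "backup.tpxz" "path to files/" && verify_archive "backup.tpxz"
```

Chaining the two steps with `&&` means the verification only runs when tar itself reported success.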

vasi commented 2 years ago

Sure, if it's a tar-file then try cat myfile | xz -cd | tar -t to see a list of contents. If that works, you can do cat myfile | xz -cd | tar -x to extract it.

There's nothing special you should have to do when working on large files.
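
The list-then-extract sequence can be sketched as one helper. This is an illustration only, assuming xz and tar are installed; `list_then_extract` and the destination directory argument are hypothetical additions, not something pixz provides.

```shell
# list_then_extract ARCHIVE DESTDIR
# Hypothetical helper: list the tar members first, extract only if that works.
list_then_extract() {
    archive=$1
    dest=$2
    # First pass: stream-decode and print the member names. The pipeline's
    # exit status is tar's; any xz error still appears on stderr.
    if ! xz -dc < "$archive" | tar -tf -; then
        echo "listing failed, not extracting" >&2
        return 1
    fi
    # Second pass: decode again and extract into DESTDIR.
    mkdir -p "$dest"
    xz -dc < "$archive" | tar -xf - -C "$dest"
}
```

Listing first costs a second full decompression pass, which is slow on a multi-TiB file, but it avoids filling a disk with output from an archive that tar cannot read at all.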

alpha754293 commented 2 years ago

Thank you.

Your help is greatly appreciated.

I'll try that.

(It's going to take a while due to the size of my file that I have to test and/or retest.)

alpha754293 commented 2 years ago

@vasi Sorry, can I ask you an unrelated, stupid question?

In the examples, it says: tar -I pixz -cvf filename.tpxz /path/to/files/

How do I use a higher level of compression with that command?

Do I have to separate the creation of the tarball file from the compression step if I want to use a higher level of compression than the default?

Your help is greatly appreciated.

Thank you.
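
One way to pick a compression level without a separate tarball step is to have tar write to stdout and pipe that into the compressor. The sketch below assumes pixz accepts xz-style level flags such as -9 (worth confirming against the pixz man page for the installed version); `tar_pipe_compress` is a hypothetical helper name.

```shell
# tar_pipe_compress OUTPUT SRC COMPRESSOR [ARGS...]
# Hypothetical helper: tar SRC to stdout and compress it with the given command.
tar_pipe_compress() {
    output=$1
    src=$2
    shift 2
    # After the shift, "$@" holds the compressor command line, e.g. pixz -9.
    # The compressor reads the tar stream on stdin and writes to OUTPUT.
    tar -cf - "$src" | "$@" > "$output"
}

# Usage sketch:
# tar_pipe_compress "filename.tpxz" "/path/to/files/" pixz -9
```

Recent GNU tar versions may also accept a quoted command line, as in tar -I 'pixz -9' -cvf filename.tpxz /path/to/files/, but that depends on the tar version in use.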

alpha754293 commented 2 years ago

So at least one of the two archives has already reported back after running xz -tv: "Compressed data is corrupt."

(The second file said: "Unexpected end of input".)

Given that, how would pixz know that when it was creating the compressed file?

(I'm not sure how the compressed data could have been corrupt, but more importantly at this stage: 1) Trying to get the data back, whatever data that can be salvaged, and 2) how to prevent this from occurring in the future.)

Your help is greatly appreciated.

Thanks.
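
On the prevention side, one common precaution is to record a checksum immediately after an archive has been created and tested, so that a later mismatch shows the file changed on disk or in transit. This is a generic sketch, not something pixz does itself; it assumes the sha256sum tool from GNU coreutils, and both helper names are hypothetical.

```shell
# record_checksum ARCHIVE
# Hypothetical helper: store the archive's SHA-256 next to it.
record_checksum() {
    sha256sum "$1" > "$1.sha256"
}

# check_checksum ARCHIVE
# Hypothetical helper: recompute the hash and compare with the stored one.
# Exits non-zero if the file no longer matches.
check_checksum() {
    sha256sum -c "$1.sha256"
}
```

The checksum file stores the archive name exactly as it was passed in, so `check_checksum` should be run from the same directory, with the same path, as `record_checksum`.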