Closed codecnotsupported closed 2 years ago
yes, same problem :) I will fix it this saturday, thank you for reporting it.
EDIT: not the same problem.
The standard does require these variables to be uint32_t but anyway this is not related to the error message you are seeing. These are the size of individual zstd frames, not the whole file. There is an improvement I will make here but not related to your problem.
That error means that the header of one of the tar section has an invalid checksum. Either the tar file is corrupter or there is a bug in the checksum
or isTarHeader
function.
I'm not able to reproduce the problem. I just compressed a 500GB .tar archive of random files with -s 50K
without problems.
How was the .tar archive generated? Can you try again with -v
and send me the output? It contains file names so if it's something private send me the last ten lines replacing the names.
You can still compress the file with -r
if you want, that will not parse the tar header and compress any kind of file.
If I'm correct it's generated with tar cf tarfile.tar directory
.
Ok. Trying again, with -v
...
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/LIBRARY/images.jpg (14658)
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/bin/ (0)
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/bin/SymDepend.cache (125)
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/bin/M 2 1515441116.dat (5217361)
# END OF BLOCK (5235200)
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/bin/M 2 1516398940.dat (65868)
# END OF BLOCK (66560)
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/bin/M 4 1516399059.dat (14674)
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/bin/M 3 1516398986.dat (58244)
# END OF BLOCK (74240)
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/DOMDocument.xml (35973633)
# END OF BLOCK (35974144)
ERROR: Invalid tar header. If this is not a tar archive use raw mode (-r)
Tested different minimum block size. Same output, ends with:
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/DOMDocument.xml (35973633)
# END OF BLOCK (35974144)
ERROR: Invalid tar header. If this is not a tar archive use raw mode (-r)
I've a feeling it's related to file names length and/or unicode. If all else fails I'll just give you a link to the files, and see if you can reproduce. Not that it's very private. But the hosting platform may put a timeout on downloading the file because of bandwidth limitations.
I modified the code to add a verbose error message in case of checksum mismatch. Can you compile from source and try it? I want to see the wrong checksum.
Ok.
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/bin/M 4 1516399059.dat (14674)
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/bin/M 3 1516398986.dat (58244)
# END OF BLOCK (74240)
+ 3 - Backgrounds/901_L04_Ext_Grogar_Lair/DOMDocument.xml (35973633)
# END OF BLOCK (35974144)
ERROR: Mismatching checksum. Expected 0x0000013e but found 0x00000000.
ERROR: Invalid tar header. If this is not a tar archive use raw mode (-r)
Is DOMDocument.xml
the last file of the last block succeeded the checksum?
Or is it in the block that fails the checksum?
It's the last that succeeded.
Try tar -tvf archive.tar
. It will list all the files in the archive in the order they appear. Look at the first file that appear after DOMDocument.xml and try to create an archive with just that file. Does that work?
This may take a moment...
-rwxr-xr-x user/docker 65868 2022-07-05 11:42 3 - Backgrounds/901_L04_Ext_Grogar_Lair/bin/M 2 1516398940.dat
-rwxr-xr-x user/docker 14674 2022-07-05 11:42 3 - Backgrounds/901_L04_Ext_Grogar_Lair/bin/M 4 1516399059.dat
-rwxr-xr-x user/docker 58244 2022-07-05 11:42 3 - Backgrounds/901_L04_Ext_Grogar_Lair/bin/M 3 1516398986.dat
-rwxr-xr-x user/docker 35973633 2022-07-05 11:42 3 - Backgrounds/901_L04_Ext_Grogar_Lair/DOMDocument.xml
hrwxr-xr-x user/docker 0 2022-07-05 11:05 3 - Backgrounds/901_L04_Ext_Grogar_Lair/design.xfl link to 1 - Stills/Props/001_p39_spotlight_with_fireflies/001_p39_spotlight_with_fireflies.xf
-rwxr-xr-x user/docker 15018 2022-07-05 11:42 3 - Backgrounds/901_L04_Ext_Grogar_Lair/PublishSettings.xml
drwxr-xr-x user/docker 0 2022-07-05 11:42 3 - Backgrounds/901_L04_Ext_Grogar_Lair/META-INF/
-rwxr-xr-x user/docker 0 2022-07-05 11:42 3 - Backgrounds/901_L04_Ext_Grogar_Lair/META-INF/metadata.xml
I created a file directory structure and a file with the same name but nothing. I was hoping it had to do with the fact that in tar header the filename can be only 100 character long but it's handled correctly.
I'm trying but I'm running into memory issues trying to extract the archive.
For now, here's the archive (compressed with regular zstd) split into 3. cat xfl* > archive.tar.zst
to put them together.
https://drive.google.com/drive/folders/1-46h4j_6nsq0TQtXqhm4jb_DvdVKgJ3G
I'm downloading it but it will take a while
Take your time. I'm debugging with GDB & Trying to recreate the bug with a smaller archive.
I successfully reproduced the problem with your archive and reduced it to a 40MB .tar that still shows the problem. It's late at night here, I will resume tomorrow.
Was this the issue?
Just saw the fix. Yup.
Yes. The size of the tar block of DOMDocument.xml was 35973633 bytes. The previous code had a rounding error that caused an offset of 512 bytes.
The test code
size_t size = 35973633;
printf("%f\n", ceil((double)size/512.0)*512+512);
printf("%f\n", ceil((float)size/512.0)*512+512);
printf("%f\n", ceil(size/512.0)*512+512);
printf("%f\n", ceil((double)size/512.0f)*512+512);
printf("%f\n", ceil((float)size/512.0f)*512+512);
printf("%f\n", ceil(size/512.0f)*512+512);
outputs
35974656.000000
35974144.000000
35974656.000000
35974656.000000
35974144.000000
35974144.000000
I got rid of the floating point and replaced it with integer math.
https://github.com/martinellimarco/t2sz/blob/main/src/t2sz.c#L311
Should this be strtoul
instead of strol
because size_t is unsigned?
Thank you for providing the test archive and for reporting the bug. I would have never find this otherwise. Feel free to remove the link to drive, I removed the files on my side.
https://github.com/martinellimarco/t2sz/blob/main/src/t2sz.c#L311 Should this be
strtoul
instead ofstrol
because size_t is unsigned?
I think you are right but I'll check it tomorrow. Thanks again.
Thank you for providing the test archive and for reporting the bug. I would have never find this otherwise. Feel free to remove the link to drive, I removed the files on my side.
The pleasure was all mine, thanks for the fix and your time.
I was compressing a 250GB tar archive with
-s 50K
and it oddly enough it seems to have stopped at 4GB:ERROR: Invalid tar header. If this is not a tar archive use raw mode (-r)
Maybe the same problem as with https://github.com/martinellimarco/libzstd-seek/issues/3?https://github.com/martinellimarco/t2sz/blob/84dab3bbbafa04ea215d9994fee636f01e07cfd6/src/t2sz.c#L74-L75 https://github.com/martinellimarco/t2sz/blob/84dab3bbbafa04ea215d9994fee636f01e07cfd6/src/t2sz.c#L104