kubo / snzip

Snzip, a compression/decompression tool based on snappy

Support for concatenated snappy-in-java files #23

Open gwittel opened 6 years ago

gwittel commented 6 years ago

When dealing with some legacy format files, I noticed that snzip fails to read snappy-in-java format files that are concatenated together. When it encounters the 2nd file, it reads the 's' (0x73) from the header as a compressed flag and aborts, since that is not a recognized flag value.

The simple workaround is to skip the next 6 bytes (nappy\0), similar to how the framing2 format implicitly skips its header: it reads 0xff 0x06 0x00 0x00 as a chunk of length 6, then skips those 6 bytes (sNaPpY) with the fseek.
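To make the implicit skip concrete, here is a minimal sketch of how a framing-format chunk header decodes to a length of 6. It assumes the usual layout of one chunk type byte followed by a 3-byte little-endian length; `chunk_data_length` is a hypothetical helper name for illustration, not a function from snzip.

```c
#include <stddef.h>

/* Hypothetical helper (not part of snzip): decode the 3-byte
 * little-endian length that follows the chunk type byte.
 * hdr[0] is the chunk type, hdr[1..3] hold the length. */
static size_t chunk_data_length(const unsigned char hdr[4])
{
  return (size_t)hdr[1] | ((size_t)hdr[2] << 8) | ((size_t)hdr[3] << 16);
}
```

For the stream identifier bytes 0xff 0x06 0x00 0x00 this yields 6, which is exactly the length of the identifier string that follows, so a reader that skips unknown chunks by length steps over a repeated header without inspecting it.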

Before I send a real PR, I wanted to get some feedback. My quick and dirty workaround does not validate that the 2nd header is actually a valid snappy header. However, framing2 doesn't do this either (it relies on the implicit skipping defined by the header format itself).

Creating test file:

$ echo 'hello' | ./snzip -t snappy-in-java > one.snappy
$ echo 'world' | ./snzip -t snappy-in-java > two.snappy
$ cat one.snappy two.snappy > three.snappy

Original version:

$ ./snzip -d -c three.snappy
hello
Unknown compressed flag 0x73

Patched:

$ ./snzip -d -c three.snappy
hello
world

Thoughts/preferences on patch approach?

Hacky version diff:

diff --git a/snappy-in-java-format.c b/snappy-in-java-format.c
index 0f95e1a..2b2579a 100644
--- a/snappy-in-java-format.c
+++ b/snappy-in-java-format.c
@@ -195,6 +195,16 @@ static int snappy_in_java_uncompress(FILE *infp, FILE *outfp, int skip_magic)
     case UNCOMPRESSED_FLAG:
       /* pass */
       break;
+       case 's':
+         /* s== 0x73 Possible concatenated block.
+          * Note that other framing formats like frame2 see 0xff and just skip
+          * the rest of the header due to the header being: 0xff 0x06 0x00 0x00 snappy
+          * (it reads the 3-byte chunk header length resulting in a block length of
+          * 6 bytes, and skips 6 bytes which happens to be == snappy)
+          */
+         /* Likely concatenated snappy file.  We read first byte, skip rest */
+         fseek(infp, SNAPPY_IN_JAVA_MAGIC_LEN - 1, SEEK_CUR); /* TODO strict check? */
+         continue;
     default:
       print_error("Unknown compressed flag 0x%02x\n", compressed_flag);
       goto cleanup;

kubo commented 6 years ago

Thanks for opening the issue, and sorry for not replying for such a long time. If you are still willing, could you make a pull request?

Could you validate the file headers? The original implementation does so (here). Could you also fix the indentation width? This file uses two spaces for indentation.
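One possible shape for the stricter check, as a rough sketch only: after the leading 's' has been consumed as the flag byte, read the remaining 6 bytes and compare them against the rest of the magic instead of seeking past them. The names `MAGIC`, `MAGIC_LEN`, `is_magic_tail` and `skip_concatenated_header` are hypothetical and chosen for illustration; the magic value "snappy\0" is assumed from the description in this issue, and the actual patch would presumably reuse the constants already defined in snappy-in-java-format.c.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: the header magic is the 7 bytes "snappy\0". */
#define MAGIC "snappy\0"
#define MAGIC_LEN 7

/* Returns nonzero if buf holds exactly the magic minus its first byte. */
static int is_magic_tail(const unsigned char *buf, size_t len)
{
  return len == MAGIC_LEN - 1 && memcmp(buf, MAGIC + 1, MAGIC_LEN - 1) == 0;
}

/* Call after the leading 's' was read as the flag byte.
 * Returns 0 if the rest of the header is valid, -1 on a short
 * read or a mismatch. */
static int skip_concatenated_header(FILE *infp)
{
  unsigned char buf[MAGIC_LEN - 1];
  size_t n = fread(buf, 1, sizeof(buf), infp);

  return is_magic_tail(buf, n) ? 0 : -1;
}
```

In the `case 's':` branch, a nonzero return could then fall through to the same error path as an unknown flag rather than silently continuing.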