h2non / filetype

Fast, dependency-free Go package to infer binary file types based on the magic numbers header signature
https://pkg.go.dev/github.com/h2non/filetype?tab=doc
MIT License

Enhance Zstd support #100

Closed bkda closed 3 years ago

bkda commented 3 years ago

Zstandard compressed data is made of one or more frames. There are two frame formats defined by Zstandard: Zstandard frames and Skippable frames.

For more details, see https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-00.html#rfc.section.2

The structure of a single Zstandard frame is shown below. The magic number of a Zstandard frame is 0xFD2FB528.

  +--------------------+------------+
  |    Magic_Number    | 4 bytes    |
  +--------------------+------------+
  |    Frame_Header    | 2-14 bytes |
  +--------------------+------------+
  |     Data_Block     | n bytes    |
  +--------------------+------------+
  | [More Data Blocks] |            |
  +--------------------+------------+
  | [Content Checksum] | 0-4 bytes  |
  +--------------------+------------+
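As a minimal sketch of what checking this magic number looks like (the names `zstdMagic` and `hasZstdMagic` are made up for illustration and are not part of this library's API): the 4-byte Magic_Number is stored little-endian, so a file that starts with a Zstandard frame begins with the bytes `28 B5 2F FD`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// zstdMagic is the Zstandard frame magic number from the description above.
// Hypothetical name, used only in this sketch.
const zstdMagic uint32 = 0xFD2FB528

// hasZstdMagic reports whether buf starts with the Zstandard frame magic
// number, read as a little-endian 32-bit value.
func hasZstdMagic(buf []byte) bool {
	return len(buf) >= 4 && binary.LittleEndian.Uint32(buf) == zstdMagic
}

func main() {
	fmt.Println(hasZstdMagic([]byte{0x28, 0xB5, 0x2F, 0xFD, 0x00})) // true
	fmt.Println(hasZstdMagic([]byte{0xFD, 0x2F, 0xB5, 0x28}))       // false: wrong byte order
}
```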

Skippable Frames

  +--------------+------------+-----------+
  | Magic_Number | Frame_Size | User_Data |
  +--------------+------------+-----------+
  |    4 bytes   |   4 bytes  |  n bytes  |
  +--------------+------------+-----------+

Magic_Number: 0x184D2A5?, i.e. any value from 0x184D2A50 to 0x184D2A5F.
Frame_Size: the size `n`, in bytes, of the following User_Data, stored as a 4-byte little-endian unsigned 32-bit integer.
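The two fields above can be read roughly as follows. This is only a sketch under the layout described here; `isSkippableMagic` and `skippableFrameLen` are hypothetical helper names, not functions from this library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// isSkippableMagic reports whether m lies in 0x184D2A50..0x184D2A5F,
// i.e. only the low 4 bits may vary.
func isSkippableMagic(m uint32) bool {
	return m&0xFFFFFFF0 == 0x184D2A50
}

// skippableFrameLen returns the total length of the skippable frame at the
// start of buf: 4 bytes Magic_Number + 4 bytes Frame_Size + n bytes User_Data.
// ok is false if buf does not start with a skippable frame header.
func skippableFrameLen(buf []byte) (total uint64, ok bool) {
	if len(buf) < 8 || !isSkippableMagic(binary.LittleEndian.Uint32(buf[:4])) {
		return 0, false
	}
	n := uint64(binary.LittleEndian.Uint32(buf[4:8])) // Frame_Size, little-endian
	return 8 + n, true
}

func main() {
	frame := []byte{
		0x50, 0x2A, 0x4D, 0x18, // Magic_Number 0x184D2A50, little-endian
		0x03, 0x00, 0x00, 0x00, // Frame_Size = 3
		'a', 'b', 'c', // User_Data
	}
	total, ok := skippableFrameLen(frame)
	fmt.Println(total, ok) // 11 true
}
```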

This library currently cannot detect a zstd file that begins with a skippable frame; this PR fixes that. For example:

[image: example zstd file that begins with a skippable frame]

In this situation, a skippable frame with the magic number 0x184D2A50 precedes the Zstandard frame magic number 0xFD2FB528. So we should parse the skippable frame, skip its user data, and then check for the magic number 0xFD2FB528.
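The approach described above can be sketched like this. It is an illustration under stated assumptions, not necessarily the code in this PR: the name `isZstd` and the constants are hypothetical, and the loop accepts any number of leading skippable frames, which is an assumption on my part rather than something the PR description states.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	zstdMagic      uint32 = 0xFD2FB528 // Zstandard frame magic number
	skippableMagic uint32 = 0x184D2A50 // lowest skippable frame magic number
	skippableMask  uint32 = 0xFFFFFFF0 // ignores the low 4 bits
)

// isZstd skips any leading skippable frames in buf and reports whether a
// Zstandard frame magic number follows them.
func isZstd(buf []byte) bool {
	for len(buf) >= 4 {
		magic := binary.LittleEndian.Uint32(buf)
		if magic == zstdMagic {
			return true
		}
		// Anything else must be a complete skippable frame header.
		if magic&skippableMask != skippableMagic || len(buf) < 8 {
			return false
		}
		size := uint64(binary.LittleEndian.Uint32(buf[4:8])) // Frame_Size
		if uint64(len(buf)) < 8+size {
			return false // user data runs past the available bytes
		}
		buf = buf[8+size:] // skip header and User_Data
	}
	return false
}

func main() {
	plain := []byte{0x28, 0xB5, 0x2F, 0xFD}
	skippable := []byte{0x50, 0x2A, 0x4D, 0x18, 0x03, 0x00, 0x00, 0x00, 'a', 'b', 'c'}
	withSkippable := append(append([]byte{}, skippable...), plain...)

	fmt.Println(isZstd(plain))         // true
	fmt.Println(isZstd(withSkippable)) // true
	fmt.Println(isZstd(skippable))     // false: no Zstandard frame follows
}
```

Note that a check like this can only succeed if the buffer passed in is long enough to contain the whole skippable frame plus the next 4 bytes.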

By the way, I can't find an elegant way to write another test for zstd, so I just wrote a test under the for loop.

h2non commented 3 years ago

Thank you!

> By the way, I can't find an elegant way to write another test for zstd, so I just wrote a test under the for loop.

Ideally, you can simply replace the existing sample.zst with your fixture file, that way the existing for loop will cover it too.

bkda commented 3 years ago

Thank you!

> > By the way, I can't find an elegant way to write another test for zstd, so I just wrote a test under the for loop.
>
> Ideally, you can simply replace the existing sample.zst with your fixture file, that way the existing for loop will cover it too.

I want to test both of them, so I added a new sample file.

cfergeau commented 3 years ago

Looks good to me, thanks for the multiple iterations :)