kaitai-io / kaitai_struct_formats

Kaitai Struct: library of binary file formats (.ksy)
http://formats.kaitai.io

Add 7-zip (.7z) file format #140

Open davidhicks opened 7 years ago

davidhicks commented 7 years ago

Documentation on the file format: see /DOC/7zFormat.txt within the latest SDK available for download at http://www.7-zip.org/sdk.html

Branch containing draft specification: https://github.com/davidhicks/kaitai_struct_formats/tree/7z

davidhicks commented 7 years ago

I am stuck implementing this file format because it requires parsing optional fields.

For example, the 7z format specification has:

PackInfo
~~~~~~~~~~~~
  BYTE NID::kPackInfo  (0x06)
  UINT64 PackPos
  UINT64 NumPackStreams

  []
  BYTE NID::kSize    (0x09)
  UINT64 PackSizes[NumPackStreams]
  []

  []
  BYTE NID::kCRC      (0x0A)
  PackStreamDigests[NumPackStreams]
  []

  BYTE NID::kEnd

Each pair of [] markers delimits an optional data structure in the sequence.

Therefore, the valid structures for PackInfo are:

  BYTE NID::kPackInfo  (0x06)
  UINT64 PackPos
  UINT64 NumPackStreams
  BYTE NID::kEnd

or:

  BYTE NID::kPackInfo  (0x06)
  UINT64 PackPos
  UINT64 NumPackStreams
  BYTE NID::kSize    (0x09)
  UINT64 PackSizes[NumPackStreams]
  BYTE NID::kEnd

or:

  BYTE NID::kPackInfo  (0x06)
  UINT64 PackPos
  UINT64 NumPackStreams
  BYTE NID::kCRC      (0x0A)
  PackStreamDigests[NumPackStreams]
  BYTE NID::kEnd

or:

  BYTE NID::kPackInfo  (0x06)
  UINT64 PackPos
  UINT64 NumPackStreams
  BYTE NID::kSize    (0x09)
  UINT64 PackSizes[NumPackStreams]
  BYTE NID::kCRC      (0x0A)
  PackStreamDigests[NumPackStreams]
  BYTE NID::kEnd
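
For comparison, the four layouts above can be handled in hand-written code by reading the next ID byte and branching on it. The following Python sketch is illustrative only and is not part of the draft .ksy. It makes two assumptions that are not stated in this issue: that UINT64 here means the variable-length number encoding described in 7zFormat.txt (the count of leading 1 bits in the first byte gives the number of extra little-endian bytes), and that the digests block starts with an AllAreDefined byte followed by 4-byte little-endian CRCs. The function and key names are hypothetical.

```python
import io
import struct

# Property IDs taken from the spec excerpt above.
K_END, K_PACK_INFO, K_SIZE, K_CRC = 0x00, 0x06, 0x09, 0x0A


def read_exact(stream, n):
    """Read exactly n bytes or fail."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of data")
    return data


def read_number(stream):
    """Read a 7z variable-length UINT64 (ASSUMED encoding, see lead-in)."""
    first = read_exact(stream, 1)[0]
    mask, extra = 0x80, 0
    # Count leading 1 bits of the first byte: that many extra bytes follow.
    while extra < 8 and first & mask:
        extra += 1
        mask >>= 1
    low = int.from_bytes(read_exact(stream, extra), "little")
    # Remaining low bits of the first byte form the high part of the value.
    high = first & (mask - 1) if extra < 8 else 0
    return (high << (8 * extra)) + low


def parse_pack_info(stream):
    """Parse PackInfo, treating the kSize and kCRC blocks as optional."""
    if read_exact(stream, 1)[0] != K_PACK_INFO:
        raise ValueError("expected kPackInfo (0x06)")
    pack_pos = read_number(stream)
    num_streams = read_number(stream)
    info = {"pack_pos": pack_pos, "num_pack_streams": num_streams,
            "pack_sizes": None, "digests": None}

    nid = read_exact(stream, 1)[0]
    if nid == K_SIZE:  # optional block 1
        info["pack_sizes"] = [read_number(stream) for _ in range(num_streams)]
        nid = read_exact(stream, 1)[0]
    if nid == K_CRC:  # optional block 2
        # ASSUMPTION: only the "all digests defined" form is handled here.
        if read_exact(stream, 1)[0] == 0:
            raise NotImplementedError("per-stream 'defined' bits not handled")
        info["digests"] = [struct.unpack("<I", read_exact(stream, 4))[0]
                           for _ in range(num_streams)]
        nid = read_exact(stream, 1)[0]
    if nid != K_END:
        raise ValueError("expected kEnd (0x00), got 0x%02x" % nid)
    return info


# Layout 2 from the list above: kSize present, kCRC absent.
example = bytes([0x06, 0x20, 0x02, 0x09, 0x05, 0x81, 0x02, 0x00])
print(parse_pack_info(io.BytesIO(example)))
```

The point of the sketch is that each optional block is recognised by its leading ID byte, so the parser has to look one byte ahead before deciding which field comes next. That lookahead is what a plain seq of fixed fields cannot express directly.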

Ideally we'd have a simple way to mark an item in a sequence as optional, for example with an optional: true key. See the proposal below:

  pack_info:
    seq:
      - id: signature_byte
        contents: [0x06]
      - id: pack_position
        type: uint64
      - id: number_of_packed_streams
        type: uint64
      - id: pack_sizes
        type: pack_sizes_object
        optional: true
      - id: pack_stream_digests
        type: pack_stream_digests_object
        optional: true
      - id: end_byte_signature
        contents: [0x00]

For more information on how this may be implemented and what complications may arise, see #156.