kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

Is there a way to get array substream? #679

Open ghost opened 4 years ago

ghost commented 4 years ago

The file contains a number of fixed-size blocks. A data entry may consist of 1..n blocks; the last block of an entry is marked with a magic signature.

The objective is:
step 1: Use an array with repeat-until to locate the data entries.
step 2: Create an instance and, if necessary, a substream to handle the variable-size data.

Is it possible to get the io object of the whole array? Or is there another way to handle this structure (maybe pos)?

seq:
  - id: entries
    type: entry_t
    repeat: eos

types:
  entry_t:
    seq:
      - id: blocks
        type: block_simplified
        size: 1024
        repeat: until
        repeat-until: _.sign == 'EndMarker____'

    instances:
      data_processed:
        io: blocks._io   # this would work if the io of the whole array were accessible
        type: entry_ext_t
        size: 1024 * blocks.size # specify size to create substream

  block_simplified: # simple block description just to find end marker
    seq:
      - id: filler
        size: 1024 - (13)
      - id: sign
        type: str
        size: 13

  entry_ext_t: # ext blocks description to handle data
    seq:
      # ....... struct fields - data will be handled here
generalmimon commented 4 years ago

A substream can be created only on a single item of a user-defined type. If you add a repetition to a field (which yields an array), the size still applies to the individual items and never to the whole array.

But you can create a substream only if you know its final size, as you can't resize it later. A resizable substream would defeat the main reason for creating a substream, i.e. limiting a structure in size when you know that size in advance.
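To make the per-item behaviour concrete, here is a minimal sketch in plain Python (standard library only, not generated Kaitai code). BLOCK_SIZE and END_MARK are small stand-ins assumed for illustration in place of 1024 and 'EndMarker____', and read_blocks is a hypothetical helper. Each block ends up in its own fixed-size buffer, and no buffer covering the whole array is ever created:

```python
import io

BLOCK_SIZE = 16      # assumed stand-in for 1024, kept small for readability
END_MARK = b"END!"   # assumed stand-in for the 13-byte 'EndMarker____'


def read_blocks(stream):
    """Read fixed-size blocks until one ends with END_MARK.

    Every block is wrapped in its own BytesIO, which loosely mirrors how
    `size` combined with `repeat` gives one substream per item.
    """
    blocks = []
    while True:
        raw = stream.read(BLOCK_SIZE)
        if len(raw) < BLOCK_SIZE:
            raise EOFError("stream ended before an end marker was found")
        blocks.append(io.BytesIO(raw))  # independent buffer for this item only
        if raw.endswith(END_MARK):
            return blocks


# Two entries: the first spans two blocks, the second spans one.
data = b"A" * 16 + b"B" * 12 + END_MARK + b"C" * 12 + END_MARK
stream = io.BytesIO(data)
first = read_blocks(stream)
second = read_blocks(stream)
```

The number of blocks in an entry is only known once the end marker has been read, which is why a single fixed-size buffer for the whole array cannot be set up beforehand.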

You don't need to do that, so you don't need a substream. You just need to remember the offset where the blocks start in the stream. I suggest using the fact that value instances are lazy and memoized after their first evaluation. So you create a value instance, let's call it blocks_offs, and invoke it just before parsing the blocks. Something like this:

types:
  entry_t:
    seq:
      - id: invoke_blocks_offs
        size: 0
        if: blocks_offs >= 0 # it doesn't really matter what this evaluates to, we just need to invoke our instance to remember the current stream position
      - id: blocks
        type: block_simplified
        size: 1024
        repeat: until
        repeat-until: _.sign == 'EndMarker____'
    instances:
      blocks_offs:
        value: _io.pos
      data_processed:
        pos: blocks_offs
        type: entry_ext_t
        size: 1024 * blocks.size # specify size to create substream
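The same idea can be sketched in plain Python to show why the order of evaluation matters. This is an illustration only, not the code the compiler emits: the Entry class, BLOCK_SIZE and END_MARK are assumptions made for the example. The lazy blocks_offs property records the stream position on first access and returns the cached value afterwards, so touching it before the blocks are read pins the start offset:

```python
import io

BLOCK_SIZE = 16      # assumed stand-in for 1024
END_MARK = b"END!"   # assumed stand-in for 'EndMarker____'


class Entry:
    """Hypothetical hand-written analogue of entry_t."""

    def __init__(self, stream):
        self._io = stream
        self._blocks_offs = None
        self._data = None
        # Analogue of invoke_blocks_offs: touch the lazy value before any
        # block is read so that it captures the current position.
        _ = self.blocks_offs
        self.blocks = []
        while True:
            raw = self._io.read(BLOCK_SIZE)
            if len(raw) < BLOCK_SIZE:
                raise EOFError("stream ended before an end marker was found")
            self.blocks.append(raw)
            if raw.endswith(END_MARK):
                break

    @property
    def blocks_offs(self):
        # Lazy and memoized: computed once, then reused.
        if self._blocks_offs is None:
            self._blocks_offs = self._io.tell()
        return self._blocks_offs

    @property
    def data_processed(self):
        # Jump back to the remembered offset, read the whole entry as one
        # contiguous span, then restore the previous position.
        if self._data is None:
            saved = self._io.tell()
            self._io.seek(self.blocks_offs)
            self._data = self._io.read(BLOCK_SIZE * len(self.blocks))
            self._io.seek(saved)
        return self._data


data = b"A" * 16 + b"B" * 12 + END_MARK + b"C" * 12 + END_MARK
stream = io.BytesIO(data)
entries = []
while stream.tell() < len(data):
    entries.append(Entry(stream))
```

If blocks_offs were first touched only after the loop, it would record the position at the end of the entry instead of its start, which is the reason for the dummy zero-size field in the snippet above.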
KOLANICH commented 4 years ago

You may want to look at https://github.com/KOLANICH/kaitai_struct_formats/blob/qsp/game/qsp.ksy#L35L75