kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

How to get the size of the read record and its source buffer for crc validation? #1046

Open tonal opened 1 year ago

tonal commented 1 year ago

How to get the size of the read record and its source buffer for crc validation?

For example:

types:
  rec_type_pack:
    seq:
      - id: mark
        contents: [0xc0]
      - id: type_id
        type: u1
        enum: ids_rec_type
      - id: body
        type:
          switch-on: type_id
          cases:
            'ids_rec_type::id_state_power': # 0x07:
              rec_state_power
            'ids_rec_type::id_all_settings': # 0x12:
              rec_all_settings
      - id: crc16
        type: u2
        doc: GetCRC16(this.currentData, sizeRecord - 2)

with KaitaiStream(open(filename, 'rb')) as _io:
  rec = RecTypePack(_io)
  rec._read()
  if rec.crc16 != crc16(rec._raw_buffer[:-2]): # How to get _raw_buffer?
    print("invalid record:", rec)

GreyCat commented 1 year ago

In an ideal world, you'd want your packet (the part you'll be calculating the CRC for) in a separate user type, with its own substream backing it, for example:

rec_type_pack_and_crc:
  seq:
    - id: pack
      type: rec_type_pack
      size: 123 # or somehow limit the stream otherwise
    - id: crc16
      type: u2

Then in your app code, you could access _raw_pack or pack._io, or something along these lines.
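As a rough illustration of why the substream helps, here is a stdlib-only sketch that mimics that layout: read a fixed-size slice into a raw buffer, then read the CRC that follows it. The helper name split_pack_and_crc, the little-endian CRC field, and the use of binascii.crc_hqx (CRC-CCITT/XMODEM) as a stand-in for GetCRC16 are all assumptions for the sake of the example, not something confirmed in this thread.

```python
import binascii
import io
import struct


def split_pack_and_crc(stream, pack_size):
    """Hypothetical helper: read `pack_size` raw bytes, then a 2-byte CRC.

    Mirrors the layout of rec_type_pack_and_crc: the raw bytes play the
    role of the buffer backing the `pack` substream.
    """
    raw_pack = stream.read(pack_size)
    if len(raw_pack) != pack_size:
        raise EOFError("stream ended inside the packet")
    crc_bytes = stream.read(2)
    if len(crc_bytes) != 2:
        raise EOFError("stream ended inside the CRC")
    # Assumption: crc16 is stored little-endian.
    (stored_crc,) = struct.unpack("<H", crc_bytes)
    return raw_pack, stored_crc


def crc16(data):
    # Assumption: CRC-CCITT (XMODEM) stands in for the real GetCRC16.
    return binascii.crc_hqx(data, 0)


# Build a sample record: mark 0xc0, type_id 0x07, two body bytes, then the CRC.
payload = bytes([0xC0, 0x07, 0x01, 0x02])
record = payload + struct.pack("<H", crc16(payload))

raw_pack, stored_crc = split_pack_and_crc(io.BytesIO(record), len(payload))
is_valid = stored_crc == crc16(raw_pack)
print("valid record" if is_valid else "invalid record")
```

With a real generated parser, the raw buffer would come from the parsed object rather than from a hand-written helper; the point is only that once the packet has its own size-limited buffer, the CRC check is a plain comparison.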

However, as you've pointed out, this requires designating that substream (with size or term) beforehand, and you don't seem to have that knowledge at that stage.

An alternative (old-school) approach is to memorize _io.pos values and do the arithmetic yourself:

types:
  rec_type_pack:
    seq:
      - id: mark
        contents: [0xc0]
        if: save_pos1 != -1 # always true, so doesn't really make `mark` conditional
      - id: type_id
        type: u1
        enum: ids_rec_type
      - id: body
        type:
          switch-on: type_id
          cases:
            'ids_rec_type::id_state_power': # 0x07:
              rec_state_power
            'ids_rec_type::id_all_settings': # 0x12:
              rec_all_settings
      - id: crc16
        type: u2
        if: save_pos2 != -1 # always true, so doesn't really make `crc16` conditional
    instances:
      save_pos1:
        value: _io.pos
      save_pos2:
        value: _io.pos

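This way you can ensure that save_pos1 and save_pos2 are evaluated at the right time in your parser: save_pos1 just before mark is read, and save_pos2 just before crc16 is read. Value instances are cached after their first evaluation, so they keep the stream positions of the beginning of your packet and the end of its CRC-covered part. The length of your packet will be save_pos2 - save_pos1. You can read it raw again by doing something like: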

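length = rec.save_pos2 - rec.save_pos1  # size of the CRC-covered bytes
rec._io.seek(rec.save_pos1)             # jump back to the start of the packet
raw_pack = rec._io.read_bytes(length)   # KaitaiStream reads raw bytes with read_bytes()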
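To make the position arithmetic concrete without a generated parser, here is a minimal stdlib-only sketch that walks a BytesIO by hand. The helper read_span, the sample bytes, the little-endian CRC field, and binascii.crc_hqx (CRC-CCITT/XMODEM) standing in for GetCRC16 are all assumptions made for illustration.

```python
import binascii
import io
import struct


def crc16(data):
    # Assumption: CRC-CCITT (XMODEM) stands in for the real GetCRC16.
    return binascii.crc_hqx(data, 0)


def read_span(stream, start, end):
    """Hypothetical helper: re-read bytes [start, end) and restore the position."""
    saved = stream.tell()
    stream.seek(start)
    try:
        return stream.read(end - start)
    finally:
        stream.seek(saved)


# Sample stream: one unrelated byte, then mark 0xc0, type_id 0x07,
# a 3-byte body, and the CRC of everything from mark to the end of body.
payload = bytes([0xC0, 0x07, 0x0A, 0x0B, 0x0C])
stream = io.BytesIO(b"\xff" + payload + struct.pack("<H", crc16(payload)))
stream.seek(1)  # pretend an outer parser already consumed the first byte

save_pos1 = stream.tell()  # taken just before `mark`
stream.read(2)             # mark + type_id
stream.read(3)             # body; its length is only known after parsing
save_pos2 = stream.tell()  # taken just before `crc16`
(stored_crc,) = struct.unpack("<H", stream.read(2))  # assumed little-endian

raw_pack = read_span(stream, save_pos1, save_pos2)
is_valid = stored_crc == crc16(raw_pack)
print("valid record" if is_valid else "invalid record")
```

Restoring the position after the re-read is a design choice of this sketch: it lets parsing continue with the next record as if the extra read had never happened.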

In the future, we plan to add built-in operations to track these sizes/addresses (see #84), and we also plan to have checksumming built in (see #81).