kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

Parsing a list of variable-size sections without explicit section-size value #992

Open nomadbyte opened 2 years ago

nomadbyte commented 2 years ago

This is more of a question than an issue. I'm trying to write a ksy spec that parses a binary list of variable-size sections. Each section holds fixed-format data records between a delimiting section header and section footer, but nothing states the section's size or the number of records it contains.

Here's a sample binary dump in hex:

00000000  5b 5b 5b 00 00 00 00 00  00 00 00 00 00 00 00 00  |[[[.............|
00000010  44 41 54 41 00 00 00 00  00 00 00 00 00 00 00 00  |DATA............|
00000020  5d 5d 5d 00 00 00 00 00  00 00 00 00 00 00 00 00  |]]].............|
00000030  5b 5b 5b 00 00 31 00 00  00 00 00 00 00 00 00 00  |[[[..1..........|
00000040  44 41 54 41 31 00 31 00  00 00 00 00 00 00 00 00  |DATA1.1.........|
00000050  44 41 54 41 32 00 31 00  00 00 00 00 00 00 00 00  |DATA2.1.........|
00000060  44 41 54 41 33 00 31 00  00 00 00 00 00 00 00 00  |DATA3.1.........|
00000070  5d 5d 5d 00 00 31 00 00  00 00 00 00 00 00 00 00  |]]]..1..........|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The sections are delimited by [[[ and ]]] header/footer records, and each section has an id field (0x0 and 0x31 in this sample). All records in this sample are the same size: 16 bytes. The total file size is fixed, and the sections are followed by unused zero-filled records of that same size.

I was able to parse this format into a single array of records of varying type (header, footer, data), like this:

rec[0]:{type:header,
  header:{id: 0}
},
rec[1]:{type:data,
  data:{DATA, secid:0}
},
rec[2]:{type:footer,
  footer:{id: 0}
},
...
rec[8]:{type:unused}
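
For reference, the flat classification above can be reproduced in plain Python without a Kaitai runtime. This is only a sketch: it assumes each record is identified by its first three bytes and that an all-zero record is unused. The names `classify`, `split_records` and `pad` are illustrative, and `SAMPLE` rebuilds the hex dump shown earlier.

```python
RECORD_SIZE = 16  # every record in the sample dump is 16 bytes


def classify(rec: bytes) -> str:
    """Return 'header', 'footer', 'unused' or 'data' for one record."""
    if rec.startswith(b"[[["):
        return "header"
    if rec.startswith(b"]]]"):
        return "footer"
    if not any(rec):  # assumption: an all-zero record is an unused slot
        return "unused"
    return "data"


def split_records(buf: bytes) -> list:
    """Cut the buffer into fixed-size records and tag each with its kind."""
    if len(buf) % RECORD_SIZE:
        raise ValueError("buffer length is not a multiple of the record size")
    return [
        (classify(buf[i:i + RECORD_SIZE]), buf[i:i + RECORD_SIZE])
        for i in range(0, len(buf), RECORD_SIZE)
    ]


def pad(prefix: bytes) -> bytes:
    """Zero-pad a record prefix to the full record size."""
    return prefix.ljust(RECORD_SIZE, b"\x00")


# The nine records of the hex dump, rebuilt from their non-zero prefixes.
SAMPLE = b"".join([
    pad(b"[[[\x00\x00\x00"),
    pad(b"DATA"),
    pad(b"]]]\x00\x00\x00"),
    pad(b"[[[\x00\x001"),
    pad(b"DATA1\x001"),
    pad(b"DATA2\x001"),
    pad(b"DATA3\x001"),
    pad(b"]]]\x00\x001"),
    pad(b""),
])

kinds = [kind for kind, _ in split_records(SAMPLE)]
```

Here `kinds` holds one tag per record, in file order, matching the rec[0] to rec[8] listing.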

I wonder whether this format could be defined in ksy so that it parses into a list of sections, something like:

section[0]:{id:0,
  data:{DATA, secid:0}
},
section[1]:{id:0x31,
  data:{DATA1, secid:0x31},
  data:{DATA2, secid:0x31},
  data:{DATA3, secid:0x31}
},
...
unused:{
}

Is it possible to parse this into such a form with ksy?

GreyCat commented 2 years ago

@nomadbyte It looks like this won't be straightforward to implement with the functionality currently available in KS.

The closest of the current proposals is probably #538, which describes defining sections by scanning the data stream for certain patterns and then building substreams from those scans, e.g. something like:

seq:
  - id: recs
    scan-start: kaitai.bytes([0x5b, 0x5b, 0x5b])
    scan-end: kaitai.bytes([0x5d, 0x5d, 0x5d])
    type: record
    repeat: eos
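
As a stopgap that stays within what KS can do today, the flat record array can be regrouped into sections in application code after parsing. The following is a minimal sketch in Python, assuming the flat parse has already been reduced to (kind, value) pairs, where value is the section id for header/footer records and the payload for data records. `Section` and `group_sections` are hypothetical names, not part of any Kaitai runtime.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    id: int
    data: list = field(default_factory=list)


def group_sections(recs):
    """Fold a flat (kind, value) stream into (sections, unused)."""
    sections, unused, current = [], [], None
    for kind, value in recs:
        if kind == "header":
            if current is not None:
                raise ValueError("header inside an open section")
            current = Section(id=value)
        elif kind == "data":
            if current is None:
                raise ValueError("data record outside any section")
            current.data.append(value)
        elif kind == "footer":
            if current is None or current.id != value:
                raise ValueError("footer does not match an open section")
            sections.append(current)
            current = None
        elif kind == "unused":
            if current is not None:
                raise ValueError("unused record inside an open section")
            unused.append(value)
        else:
            raise ValueError(f"unknown record kind: {kind!r}")
    if current is not None:
        raise ValueError("last section has no footer")
    return sections, unused


# Flat records as they would come out of the sample file (values assumed).
flat = [
    ("header", 0x00), ("data", "DATA"), ("footer", 0x00),
    ("header", 0x31), ("data", "DATA1"), ("data", "DATA2"),
    ("data", "DATA3"), ("footer", 0x31),
    ("unused", None),
]
sections, unused = group_sections(flat)
```

The loop is a two-state machine (inside or outside a section), so malformed input such as a missing footer raises an error instead of silently producing a partial section.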