kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

Support iteration in value instances #420

Open KOLANICH opened 6 years ago

KOLANICH commented 6 years ago
meta:
  id: test
  file-extension: test
seq:
  - id: bytes
    type: u1
    repeat: expr
    repeat-expr: 10
instances:
  sums:
    value: '_index == 0 ? bytes[0] : sums[_index-1] + bytes[_index]'
    repeat: expr
    repeat-expr: 4

invalid key found in value instance, allowed: doc, doc-ref, enum, if, value
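
For readers unfamiliar with the proposed syntax, the intent of the sums instance above can be sketched in Python: each element is the previous sum plus the next byte, i.e. a running (prefix) sum over the first few parsed bytes. This is only an illustration of the requested semantics; the function name running_sums is hypothetical and not part of Kaitai Struct.

```python
from itertools import accumulate
from typing import List


def running_sums(data: bytes, count: int) -> List[int]:
    """Return the first `count` prefix sums of `data`.

    Mirrors the recurrence from the issue:
    sums[0] = bytes[0], sums[i] = sums[i - 1] + bytes[i].
    """
    # Iterating over a bytes object yields ints, so accumulate() adds them up.
    return list(accumulate(data[:count]))


if __name__ == "__main__":
    parsed = bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # stands in for the 10 u1 values
    print(running_sums(parsed, 4))  # [1, 3, 6, 10]
```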

GreyCat commented 6 years ago

Unlikely to happen. There are a lot of questions about this one and, to be frank, I don't see any use cases so far.

KOLANICH commented 6 years ago

> There are a lot of questions about this one

Could you name them?

> I don't see any use cases so far.

Yes, it can be replaced by the tricks used in other examples. I just dislike the hack with pos: 0 and size: 0, because it causes I/O operations to be compiled in and obscures the meaning.

natronium commented 6 years ago

> I don't see any use cases so far.

The vlq types seem like an excellent example of a use-case for something like this. Currently, that implementation only "supports serialized values to up 8 bytes long", which doesn't sound especially variable.

GreyCat commented 6 years ago

If something can be implemented using recursion, it doesn't mean that it has to be implemented using recursion. For something simple like summing numbers, we'd be much better off with a generic sum, fold, or inject operation.
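
As a rough illustration of the fold idea, the following Python sketch decodes a little-endian base-128 variable-length quantity by folding over its byte groups, which places no fixed limit on the number of groups. It assumes the groups have already been delimited by their continuation bits; decode_vlq_le is a hypothetical name, and nothing here reflects an existing Kaitai Struct operation.

```python
from functools import reduce


def decode_vlq_le(groups: bytes) -> int:
    """Fold little-endian base-128 groups into a single integer.

    Each group contributes its low 7 bits, shifted by 7 * its index.
    The high (continuation) bit of every group is masked off.
    """
    def step(acc: int, item: tuple) -> int:
        index, byte = item
        return acc | ((byte & 0x7F) << (7 * index))

    return reduce(step, enumerate(groups), 0)


if __name__ == "__main__":
    # 0xE5 0x8E 0x26 is the usual LEB128 example encoding of 624485.
    print(decode_vlq_le(bytes([0xE5, 0x8E, 0x26])))  # 624485
    # Plain summation is the simplest fold of all.
    print(sum(bytes([1, 2, 3, 4])))  # 10
```

Because Python integers are arbitrary precision, the same fold handles any number of groups; a generated parser in another target language would still be bounded by its integer width.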

dgelessus commented 4 years ago

Another (perhaps simpler) use case for this feature would be performing some sort of calculation/transformation on all elements of an array (basically like map in functional languages). This doesn't involve any recursion (just regular iteration), and as far as I can tell there's no way to express this using existing KSY features. Simple example:

seq:
  - id: num_addresses
    type: u2
  - id: addresses_raw
    type: variable_length_integer
    repeat: expr
    repeat-expr: num_addresses
instances:
  addresses:
    value: addresses_raw[_index].value
    repeat: expr
    repeat-expr: addresses_raw.length

My actual use case for this is a little more complex. In the format I'm speccing, the addresses_raw array can consist either of u1s or variable_length_integers, depending on the value of another field. This means that to access a value stored in addresses_raw, I need to use either addresses_raw[i] or addresses_raw[i].value, depending on the array's element type. I would like to handle this in a value instance, so that I can always use addresses[i] (in my KSY and application code), regardless of what the array's underlying type is.
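
In application code, the mapping described above might look like the following Python sketch. The VariableLengthInteger class here is a hypothetical stand-in for whatever the generated parser would produce for the variable_length_integer type; the only assumption is that it exposes a value attribute, while u1 elements arrive as plain integers.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class VariableLengthInteger:
    """Hypothetical stand-in for the parsed variable_length_integer type."""
    value: int


def addresses(addresses_raw: List[Union[int, VariableLengthInteger]]) -> List[int]:
    """Map every raw element to a plain integer, whatever its underlying type."""
    return [
        element.value if isinstance(element, VariableLengthInteger) else element
        for element in addresses_raw
    ]


if __name__ == "__main__":
    print(addresses([1, 2, 3]))  # [1, 2, 3]
    print(addresses([VariableLengthInteger(300), VariableLengthInteger(5)]))  # [300, 5]
```

Doing this once per parsed object is what the requested value instance would avoid: callers could index addresses directly without knowing the element type.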

KOLANICH commented 4 years ago

In the CBOR format, I used an array plus induction: https://github.com/KOLANICH/kaitai_struct_formats/blob/cbor/serialization/cbor.ksy#L308L326

dgelessus commented 4 years ago

Just for information: #700 is somewhat related to this, and the suggested solution there (perform the calculation in the array element type, rather than on the entire array afterwards) can be used to work around this issue sometimes.