kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

[Feature] Hide redundant wrapper types #1005

Closed Omar-Abdul-Azeez closed 1 year ago

Omar-Abdul-Azeez commented 1 year ago

The Issue

Let's say I have a wrapper for an array so that I can apply size to the whole array and get a substream:

seq:
  - id: some_list1
    type: wrapper_type(count)
    size: size_of_array
types:
  wrapper_type:
    params:
      - id: count
    seq:
      - id: some_list2
        repeat: expr
        repeat-expr: count

To access the real data I have to write some_list1.some_list2.data. But this is redundant; ideally I should be able to just write some_list1.data.

Suggested Implementation

seq:
  - id: some_list1
    type: wrapper_type(count).some_list2
    size: size_of_array
types:
  wrapper_type:
    params:
      - id: count
    seq:
      - id: some_list2
        repeat: expr
        repeat-expr: count

This way, whenever I refer to some_list1 I actually get the some_list2 object. This needs some slight modifications, however. Currently the array some_list2 doesn't have a substream of its own, so some_list2._io either doesn't work or gives me the original stream (please correct me on this so I can edit it in). If it doesn't work, then some_list2._io needs to refer to some_list1._io instead. This should work for whatever property we create a wrapper type for. In the case of creating a substream, the ability to access that substream shouldn't go away when we hide the wrapper object; the child object should inherit it instead.

Closely related idea that requires a discussion

Another closely related thought: maybe we should add a hide: false|true key for elements that we don't really need after parsing. Offsets, for example, shouldn't really matter once the data has been parsed. This might also help serialization: hidden elements inside seq: are required to write the data, but the user can't access them because they are hidden, which means the writer must be able to infer them. The compiler might then be able to detect when hidden data can't be inferred from visible data and throw an error saying it can't generate a serializer. We could set a flag to ignore this error when we don't need a serializer. Conversely, if there is visible data that isn't directly set by the user at write time, at the very least a warning might be given. Maybe it should throw an error at first, until serialization becomes officially supported and we are sure it can infer correctly.

KOLANICH commented 1 year ago
  1. #88
  2. elements without an id are hidden
  3. about serialization: the mechanism of serialization with automatic inference of values should track which fields were modified and automatically detect inconsistencies.
  4. private and protected members are really a pain to deal with (if you have to modify them, there are usually 2 options: find a way to circumvent the language and rely on the circumvention working and not breaking (compilers expect it to work...), or fork the lib and maintain the fork, with zero chance of it being admitted into distros), and I vote for not having such a language feature.
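
A minimal KSY sketch of point 2, assuming the compiler accepts a seq entry with no id and gives it an internal auto-generated name instead of a user-facing one. The spec name unnamed_demo and the fields payload_ofs-style padding and payload_len are hypothetical, chosen only for illustration.

```yaml
meta:
  id: unnamed_demo        # hypothetical spec name
  endian: le
seq:
  # No `id` here: the 4 bytes are still consumed from the stream,
  # but no descriptive field name is exposed for them.
  - size: 4
  - id: payload_len
    type: u4
  - id: payload
    size: payload_len
```

The unnamed bytes are still parsed in order, so payload_len and payload land at the same offsets as they would with a named padding field.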
Omar-Abdul-Azeez commented 1 year ago
  1. #88

Seems to be what I'm suggesting but I have one question, is some_list2._io the same as some_list1._io? i.e. refer to the substream with size size_of_array? If so then that concludes this suggestion as a duplicate.

  2. elements without an id are hidden

  4. private and protected members are really a pain to deal with...

I was referring to number 2. I definitely agree we shouldn't create private and protected members, as there's no point at either the KSY level or the converted-language level: you can always hack your way around them. Either way, I thought that all elements must have an id (with the exception of instances, as they have a different syntax). Good to know this "feature" exists. Add that to the heap of undocumented features...

  3. the mechanism of serialization with automatic inference of values should track which fields were modified and automatically detect inconsistencies.

I was mainly using the hidden/visible distinction as a way to tell the compiler, at compile time, that it should be able to infer hidden elements from visible ones, and to generate a serializer in which the user sets the visible elements and the rest is inferred. If it can't infer the hidden elements, it should give a warning/error telling the user that the visible elements are not enough for a serializer and that they'll have to set hidden elements, which, while possible, should be avoided.

KOLANICH commented 1 year ago

is some_list2._io the same as some_list1._io?

  1. It should be, since some_list2 has no size. Currently an _io (a KaitaiStream object) is created when one uses size.
  2. You could have checked that yourself by generating the code and examining it.

And, BTW, if it feels too inconvenient, you can create an instance passing some_list2 directly into _root. #88 should really shine in cases like https://github.com/kaitai-io/kaitai_struct_formats/pull/635 , where IEEE 754 floating-point numbers implemented entirely in KS can emerge in arrays.
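
The instance workaround mentioned above could look roughly like the following sketch. The spec name wrapper_demo, the instance name items, the u1 element type, and the u4 types for count and size_of_array are all assumptions added to make the fragment complete; the original post does not specify them.

```yaml
meta:
  id: wrapper_demo          # hypothetical spec name
  endian: le
seq:
  - id: count
    type: u4                # assumed type
  - id: size_of_array
    type: u4                # assumed type
  - id: some_list1
    type: wrapper_type(count)
    size: size_of_array     # `size` is what gives the wrapper its own substream
instances:
  items:                    # hypothetical name: forwards the inner array to the root
    value: some_list1.some_list2
types:
  wrapper_type:
    params:
      - id: count
        type: u4
    seq:
      - id: some_list2
        type: u1            # assumed element type
        repeat: expr
        repeat-expr: count
```

With this, callers can read items on the root object instead of some_list1.some_list2, while the size-limited substream stays reachable through some_list1._io.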

generalmimon commented 1 year ago

Duplicate of #88