kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io
4.04k stars 199 forks source link

The `if` expression in value instances is not type-checked #1129

Open Mingun opened 2 months ago

Mingun commented 2 months ago

Currently I only check <type>.instances.<instance>.if, but I have feeling that the same problem exists for other fields where expression is allowed.

Below the comparison between parse instance and value instance. There are many other problems with variables, but in this issue lets focus on missing checks of expression type:

Legend for the OK? column:

<type>.instances.<instance>.if (parse instance)

Parsed by ParseUtils.getOptValueExpression.

Test KSY:

meta:
  id: test_type
  endian:
    switch-on: 1
    cases:
      1: be
      2: le
params:
  - id: param
    type: u1
seq:
  - id: seq
    type: u1
instances:
  me:
    size: 1
    if: {} # Substitution place for the variable
  instance:
    value: 1
OK? Variable Current status Comment
:white_check_mark: param[^5] Allowed
:question: seq[^2] Allowed If seq depends from me, then should be denied (currently it is allowed and we get error in runtime)
:question: instance[^3] Allowed Analogous to seq
:x: me[^6] Allowed Check is performed before the field will be read. Access to the self getter is generated which leads to infinity recursion
:white_check_mark: unknown[^4] Denied Error reported: unable to access 'unknown' in <type name> context. In is better to enclose <type name> in quotes
:question: _ Denied Error reported: java.util.NoSuchElementException: None.get
:white_check_mark: _buf Denied Error reported: invalid ID: '\_buf', expected /^[a-z][a-z0-9_]*$/
:x: _index Allowed Access to the undefined loop variable is generated (i in JS). Must be denied, because condition is checked before the loop, so even with repeat using this variable has no sense
:white_check_mark: _io Allowed
:question: _is_le Denied Error reported: invalid ID: '\_is_le', expected /^[a-z][a-z0-9_]*$/. If <type> supports variable byte order, should be allowed
:question: _on Denied Error reported: java.util.NoSuchElementException: None.get
:white_check_mark: _parent Allowed
:white_check_mark: _root Allowed
:white_check_mark: _sizeof Allowed Expanded to the compile-time constant of the <type>'s size

<type>.instances.<instance>.if (value instance)

Parsed by ParseUtils.getOptValueExpression.

Test KSY:

meta:
  id: test_type
  endian:
    switch-on: 1
    cases:
      1: be
      2: le
params:
  - id: param
    type: u1
seq:
  - id: seq
    type: u1
instances:
  me:
    value: 1
    if: {} # Substitution place for the variable
  instance:
    value: 1
OK? Variable Current status Comment
:question: param[^5] Allowed Expression type is not checked
:question: seq[^2] Allowed Expression type is not checked. Field dependencies is not checked. Should be denied if seq depends of me
:question: instance[^3] Allowed Expression type is not checked. Field dependencies is not checked. Should be denied if instance depends of me
:x: me[^6] Allowed Check is performed before the field will be read. Access to the self getter is generated which leads to infinity recursion
:x: unknown[^4] Allowed Access to the undefined field is generated (this.unknown in JS)
:x: _ Allowed Access to the undefined local variable is generated (_ in JS)
:x: _buf Allowed Access to the undefined field is generated (this._buf in JS)
:x: _index Allowed Access to the undefined loop variable is generated (i in JS). Must be denied, because condition is checked before the loop, so even with repeat using this variable has no sense
:question: _io Allowed Expression type is not checked
:x: _is_le Allowed Access to the undefined field is generated (this._isLe in JS). If <type> supports variable byte order, should be translated to check of the right field (this._is_le in JS)
:x: _on Allowed Access to the undefined local variable is generated (on in JS)
:question: _parent Allowed Expression type is not checked
:question: _root Allowed Expression type is not checked
:question: _sizeof Allowed Expression type is not checked. Expanded to the compile-time constant of the <type>'s size

[^2]: attribute of the enclosing type in seq section [^3]: instance of the enclosing type in instances section [^4]: non-existent attribute or instance in the enclosing type [^5]: parameter of the enclosing type [^6]: reference to self

generalmimon commented 2 months ago

To be honest, I don't see much value in these elaborate tables. In summary, type checking:

So the second "<type>.instances.<instance>.if (value instance)" table can be completely disregarded, because this is not how things usually work and there's no point in overanalyzing it. The only takeaway is that type checks are not run there. All it takes to make it look like exactly like the first table is to run the forgotten type checking on the if key of value instances as well (which is literally one line of code in KSC).

Other remarks in the first table are just repeating what has been reported before in other issues:

seq Allowed If seq depends from me, then should be denied (currently it is allowed and we get error in runtime)
me Allowed Check is performed before the field will be read. Access to the self getter is generated which leads to infinity recursion

See https://github.com/kaitai-io/kaitai_struct/issues/685

_, _index, _is_le, _on

See https://github.com/kaitai-io/kaitai_struct/issues/404


I don't really see anything new in what you wrote, except for the simple fact that "<type>.instances.<instance>.if (value instance)" is missing type checks (which seems to be only demonstrated by the expr_field_unknown_if_inst_value test that I added just 2 days ago, so I don't blame you for not noticing and thanks for reporting it anyway).

I thought you would already know about the 2 existing issues because you've encountered them in the past:

generalmimon commented 2 months ago

Or let's clearly indicate that this issue only reports missing type checks on if in value instances so that it can be closed once the expr_field_unknown_if_inst_value test passes.

Mingun commented 2 months ago

Or let's clearly indicate that this issue only reports missing type checks on if in value instances so that it can be closed once the expr_field_unknown_if_inst_value test passes.

Yes, the intention was just about this missing check. Those tables just an extract from my work-in-progress of documenting in which contexts each special variable:

That is why it contains more information than needed just for this issue.

The idea that with each property with expression an unique context should be assigned which determines which special variables can be used and how they should be translated. To achive that I firstly want to create test cases, check how things work currently and establish a correct behavior, because it seems that this work was never done before. Some issues created here and there but without overall picture.

generalmimon commented 2 months ago

Yes, the intention was just about this missing check. Those tables just an extract from my work-in-progress of documenting

If you want to report one bug, open a simple issue reporting that one bug. If you want to create a far-reaching issue that attempts to solve world hunger, please do so in another issue.

Some issues created here and there but without overall picture.

Please don't tell me that you knew about the existing issues but didn't even bother to look them up and refer to them.

Mingun commented 2 months ago

Huh?

Below the comparison between parse instance and value instance. There are many other problems with variables, but in this issue lets focus on missing checks of expression type:

following by the reproduction case and description of each variant.

Please don't tell me that you knew about the existing issues but didn't even bother to look them up and refer to them.

No, I don't. Thanks that you found related issues. The most obvious "type check" request found nothing similar to this issue.

Mingun commented 2 months ago

Actually, if in value instances should be prohibited. I do not see any real usage for it. For the

meta:
  id: issue1129
instances:
  me:
    value: 1
    if: _

...the following JS code is generated:

  Object.defineProperty(TestType.prototype, 'me', {
    get: function() {
      if (this._m_me !== undefined)
        return this._m_me;
      if (_) {
        this._debug._m_me = {  };
        this._m_me = 1;
      }
      return this._m_me;
    }
  });

As it can be obviously seen, the if here is useless. It is allowed by this code https://github.com/kaitai-io/kaitai_struct_compiler/blob/8610e1d7a796aad7addb64a5be2f9da48289bffc/shared/src/main/scala/io/kaitai/struct/format/InstanceSpec.scala#L42-L49 that was added in https://github.com/kaitai-io/kaitai_struct_compiler/commit/67e5a3cc477b13c8204446544721c2538f0dc53d

and if expressions was allowed in https://github.com/kaitai-io/kaitai_struct_compiler/commit/d6b4fbaac71f2b5d8436095a67317c5db9462f27

@GreyCat, why you added if to value instances?

generalmimon commented 2 months ago

@Mingun:

Actually, if in value instances should be prohibited.

No, it shouldn't. This sounds like you don't have much experience writing or perhaps also reading .ksy files (I recommend doing it more, I've always enjoyed it). Ideally, if a .ksy specification is written properly and you're parsing a binary file that follows the format that the .ksy spec is for, you should be able to invoke all value instances and not get any errors or values that are meaningless because they're not applicable in the particular situation (so not only would they be calculated unnecessarily, but mainly their values ​​could be misleading). Such errors or meaningless values would just confuse the user that inspects the object tree in visualizers, and especially nonsensical values would encourage misinterpretations when a user processes the Kaitai Struct object in their application.

What do I mean by "errors"? Using if in a value instance allows you to avoid "null pointer exceptions" when you want to use a conditional field in the value expression, e.g.:

seq:
  - id: has_foo
    type: b1
  - id: foo
    type: b7
    if: has_foo
instances:
  foo_as_signed:
    value: (foo ^ 0x40) - 0x40
    if: has_foo
    doc-ref: https://graphics.stanford.edu/~seander/bithacks.html#VariableSignExtend

Thanks to the if: has_foo on foo_as_signed, even if has_foo is false and we invoke the foo_as_signed instance, we won't get a NPE. This makes sense: foo_as_signed is derived from foo, so if foo is null, foo_as_signed should be null as well.

But it's not only about NPEs. Sometimes you want to clearly mark the instance as "not applicable" even if a value could be calculated without errors. As one example of many, take a look at zip.ksy:129-145. The specification would work even without these ifs, but then the user would have to be extra careful which values are actually meaningful and which are nonsensical - it would certainly be confusing.