kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

Can Kaitai DSL describe an exception to be thrown on unexpected input? #650

Closed lyubomyr-shaydariv closed 4 years ago

lyubomyr-shaydariv commented 4 years ago

Hi,

I recently came across Kaitai and I'm evaluating it so that I can get rid of manual parsing. My custom binary format is extremely simple:

meta:
  id: crypto_container_parser
  endian: le
seq:
  - id: magic
    type: magic
  - id: meta
    type: meta
  - id: reverse_magic
    type: reverse_magic
types:
  magic:
    seq:
      - id: magic
        contents: 'CRPT'
  reverse_magic:
    seq:
      - id: magic
        contents: 'TPRC'
  meta:
    seq:
      - id: algorithm
        type: algorithm
      - id: key
        type: key
      - id: parameters
        type: parameters
  algorithm:
    seq:
      - id: self
        type: none_or_byte_array_1
  key:
    seq:
      - id: self
        type: none_or_byte_array_2
  parameters:
    seq:
      - id: self
        type: none_or_byte_array_2
  none_or_byte_array_1:
    seq:
      - id: tag
        type: u1
        enum: type_tag
      - id: optional
        type:
          switch-on: tag
          cases:
            'type_tag::none': none
            'type_tag::byte_array_1': byte_array_1
  none_or_byte_array_2:
    seq:
      - id: tag
        type: u1
        enum: type_tag
      - id: optional
        type:
          switch-on: tag
          cases:
            'type_tag::none': none
            'type_tag::byte_array_2': byte_array_2
  none:
    seq: []
  byte_array_1:
    seq:
      - id: length
        type: u1
      - id: elements
        size: length
  byte_array_2:
    seq:
      - id: length
        type: u2le
      - id: elements
        size: length
enums:
  type_tag:
    0: none
    1: byte_array_1
    2: byte_array_2

The byte arrays above are prefixed with a simple type tag: 0x00 for null values, 0x01 for arrays with a 1-byte length, and 0x02 for arrays with a 2-byte length. In short, this grammar can process data such as:

'C' 'R' 'P' 'T'
'\x01' '\x03' 'f' 'o' 'o'
'\x00'
'\x00'
'T' 'P' 'R' 'C'
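For readers who want to see the byte layout spelled out, here is a minimal hand-written sketch of the same reading in Java. It is not the code Kaitai generates. It assumes one tagged value for each of the three meta fields (algorithm, key, parameters), and the class and method names are made up for illustration. Unlike the grammar, it rejects an unexpected tag outright, which is an assumption about the desired behaviour.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class SampleLayoutSketch {
    /** Consumes magic.length() bytes and checks them against the expected ASCII magic. */
    public static void expectMagic(ByteBuffer buf, String magic) {
        byte[] actual = new byte[magic.length()];
        buf.get(actual);
        if (!magic.equals(new String(actual, StandardCharsets.US_ASCII))) {
            throw new IllegalArgumentException("expected magic " + magic);
        }
    }

    /**
     * Reads one tagged value. Tag 0x00 means "none" (returned as null);
     * arrayTag (1 or 2) selects a u1 or u2le length prefix.
     */
    public static byte[] readTagged(ByteBuffer buf, int arrayTag) {
        int tag = buf.get() & 0xFF;
        if (tag == 0) {
            return null;
        }
        if (tag != arrayTag) {
            // Assumption: fail loudly on any other tag instead of continuing.
            throw new IllegalArgumentException(String.format("unexpected type tag 0x%02X", tag));
        }
        int length = (arrayTag == 1) ? (buf.get() & 0xFF) : (buf.getShort() & 0xFFFF);
        byte[] elements = new byte[length];
        buf.get(elements);
        return elements;
    }

    /** Returns {algorithm, key, parameters}; a null entry stands for "none". */
    public static byte[][] parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        expectMagic(buf, "CRPT");
        byte[] algorithm = readTagged(buf, 1);
        byte[] key = readTagged(buf, 2);
        byte[] parameters = readTagged(buf, 2);
        expectMagic(buf, "TPRC");
        return new byte[][] { algorithm, key, parameters };
    }

    public static void main(String[] args) {
        byte[] data = {
            'C', 'R', 'P', 'T',
            0x01, 0x03, 'f', 'o', 'o', // algorithm: 1-byte length, then "foo"
            0x00,                      // key: none
            0x00,                      // parameters: none
            'T', 'P', 'R', 'C'
        };
        byte[][] meta = parse(data);
        System.out.println(new String(meta[0], StandardCharsets.US_ASCII));
    }
}
```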

Now suppose the type tag is broken for whatever reason and indicates a type that is not registered in the type_tag enum above.

'C' 'R' 'P' 'T'
'\xDD' '\x03' 'f' 'o' 'o'
'\x00'
'\x00'
'T' 'P' 'R' 'C'

Let's say the actual type tag value is 0xDD, which doesn't map to any type. Trying to parse such data throws the following exception:

java.lang.NullPointerException
    at CryptoContainerParser$NoneOrByteArray1._read(CryptoContainerParser.java:166)
    ...

which points to:

this.tag = CryptoContainerParser.TypeTag.byId(this._io.readU1());
switch (tag()) {

The NPE occurs in the switch expression because public static TypeTag byId(long id) { return byId.get(id); } returns null for the unrecognized type tag 0xDD. If I understand the Scala code correctly, the compiler currently provides no way of throwing an "unrecognized" exception in JavaCompiler.scala. My question is: is there a way to tell the generated code to throw a custom exception when it encounters an illegal type tag? I assumed I could add a new case for _ to the switch-on clauses above, but I didn't find anything in the documentation on how to throw an exception in place of the obscure NullPointerException.
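The failure mode described above can be reproduced without Kaitai at all. The following is a minimal sketch, assuming an enum with a map-backed byId lookup shaped like the generated one. The class EnumSwitchSketch, the describe methods, and the explicit null check are hypothetical and are not something the compiler emits.

```java
import java.util.HashMap;
import java.util.Map;

public class EnumSwitchSketch {
    /** Shaped like the generated enum: ids are resolved through a map. */
    public enum TypeTag {
        NONE(0), BYTE_ARRAY_1(1), BYTE_ARRAY_2(2);

        private final long id;
        private static final Map<Long, TypeTag> BY_ID = new HashMap<>();
        static {
            for (TypeTag t : values()) {
                BY_ID.put(t.id, t);
            }
        }

        TypeTag(long id) {
            this.id = id;
        }

        /** Returns null for an id with no enum member, as the generated byId does. */
        public static TypeTag byId(long id) {
            return BY_ID.get(id);
        }
    }

    /** Switching on a null enum reference throws NullPointerException, even with a default branch. */
    public static String describeUnguarded(long rawTag) {
        TypeTag tag = TypeTag.byId(rawTag);
        switch (tag) {
            case NONE: return "none";
            case BYTE_ARRAY_1: return "byte_array_1";
            case BYTE_ARRAY_2: return "byte_array_2";
            default: return "unknown";
        }
    }

    /** Hypothetical guard: check for null first and report the raw value. */
    public static String describeGuarded(long rawTag) {
        if (TypeTag.byId(rawTag) == null) {
            throw new IllegalStateException(String.format("unrecognized type tag 0x%02X", rawTag));
        }
        return describeUnguarded(rawTag);
    }

    public static void main(String[] args) {
        try {
            describeUnguarded(0xDD);
        } catch (NullPointerException e) {
            System.out.println("unguarded switch: NullPointerException");
        }
        try {
            describeGuarded(0xDD);
        } catch (IllegalStateException e) {
            System.out.println("guarded switch: " + e.getMessage());
        }
    }
}
```

The guard only changes which exception surfaces and what it says. Whether such a check belongs in generated code or in the caller is the question the rest of this thread discusses.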

GreyCat commented 4 years ago

The error you're hitting is likely the same as the one described in #568. It's Java-specific; most other languages won't encounter this problem.

For the original ask, you'd probably be interested in the work implemented in #435, which would allow you to specify certain assertions/constraints that your data must adhere to.

lyubomyr-shaydariv commented 4 years ago

@GreyCat Thank you for the quick reply! The linked issues shed some light, and just to make it clear to me since I'm really new to Kaitai:

Is my understanding correct?

GreyCat commented 4 years ago

Currently, the latest version, v0.8, does not provide a way to assert explicitly, but it's able to throw parsing exceptions that are inferred from other rules.

In 0.8, there is no validation functionality at all (and no "other rules" really), so all you can get from 0.8.

0.8 is not the absolute latest; it's the latest stable release. The unstable 0.9+ offers at least some of that functionality.

Assertion and validation are going to be implemented in a more "deep" way, since "just throw an exception", as I supposed it to be, is probably an insufficient way of doing this.

Can you clarify what exactly you mean by a "deep" way? The current idea of asserting/validating is exactly about throwing an exception, although with a clear indication of which part of the specification fails validation (i.e. not just some confusing NullPointerException in the middle of nowhere).

The generated Java code will return null values as it does now, and the null value is expected to be handled at a higher level, as I understood it in #435.

That is correct. I don't actually see many problems with Java code using null for unresolved enums; it seems to be a relatively straightforward and widely accepted way to use enums. The only problem is specifically with the switch implementation in Java. No other language is affected by this.
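To make the idea of "an exception with a clear indication of which part of the specification fails" concrete, here is a rough Java sketch. Everything in it is hypothetical: UnknownEnumValueError, resolve, and the path string are invented for illustration and do not describe the actual Kaitai runtime or the design in #435.

```java
import java.util.Map;

public class ValidationSketch {
    /** Hypothetical exception type that records where in the spec the check failed. */
    public static class UnknownEnumValueError extends RuntimeException {
        public final String srcPath;
        public final long rawValue;

        public UnknownEnumValueError(String srcPath, long rawValue) {
            super(String.format("%s: value 0x%02X is not a member of the enum", srcPath, rawValue));
            this.srcPath = srcPath;
            this.rawValue = rawValue;
        }
    }

    public enum TypeTag { NONE, BYTE_ARRAY_1, BYTE_ARRAY_2 }

    private static final Map<Long, TypeTag> BY_ID = Map.of(
            0L, TypeTag.NONE,
            1L, TypeTag.BYTE_ARRAY_1,
            2L, TypeTag.BYTE_ARRAY_2);

    /** Resolves a raw value, failing with the spec path instead of returning null. */
    public static TypeTag resolve(long rawValue, String srcPath) {
        TypeTag tag = BY_ID.get(rawValue);
        if (tag == null) {
            throw new UnknownEnumValueError(srcPath, rawValue);
        }
        return tag;
    }

    public static void main(String[] args) {
        // The path below is an illustrative label, not a guaranteed format.
        String path = "/types/none_or_byte_array_1/seq/0";
        System.out.println(resolve(1, path));
        try {
            resolve(0xDD, path);
        } catch (UnknownEnumValueError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A caller that prefers the null-returning behaviour can still handle the unresolved case at a higher level; the sketch only shows what a more descriptive failure could look like.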

lyubomyr-shaydariv commented 4 years ago

0.8 is not the absolute latest; it's the latest stable release. The unstable 0.9+ offers at least some of that functionality.

Yup, I just picked the latest artifact from Maven Central, and I'll probably check the 0.9 version later.

Can you clarify what exactly you mean by a "deep" way?

I meant that I expected it to fail somewhere at a higher level, with an error explaining that a mapping is missing or an enum value is unexpected, rather than failing just because it gets a null from the generated hash map.

Thanks!