kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

Difficulty computing the size when only the end offset has been given #376

Open hakanai opened 6 years ago

hakanai commented 6 years ago

I'm dealing with a bit of an odd structure (FBX format again) where each record stores only its end offset instead of its size:

seq:
  - id: header
    type: header
  - id: records
    type: node_record
    repeat: until
    repeat-until: _.end_offset == 0

types:

  node_record:
    seq:
      - id: end_offset
        type: u4
      - id: property_count
        type: u4
      - id: property_list_length
        type: u4
      - id: name_length
        type: u1
      - id: name
        type: str
        size: name_length
      - id: properties
        type: property
        repeat: expr
        repeat-expr: property_count
        size: property_list_length
      - id: children
        size: end_offset - _root._io.pos
        type: node_record
        repeat: until
        repeat-until: _.end_offset == 0
        if: not _io.eof

The code above represents what I've tried to do. It seems like end_offset - _root._io.pos should be the correct size for the total list of child nodes plus the 13-byte zero terminator that follows them. The 13-byte terminator is essentially a node with no name, properties or children, so I'm terminating that on _.end_offset == 0 as a weak check.

But I get an error:

Call stack: undefined RangeError: Invalid typed array length: -1795

without any context as to where it's having the problem. I do know that it isn't in the property parsing area, because I can comment out the type/repeat for the properties and get the same error. So it's something about parsing nodes, but I'm unfortunately not being given enough context in the error to figure out why it doesn't work.
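For readers unfamiliar with this error, here is a minimal sketch of how a negative size can surface as a RangeError in JavaScript. It assumes the reader ends up constructing a typed array with the computed size; `readBytesSketch` is a hypothetical helper, not the Kaitai runtime function.

```javascript
// Hypothetical helper: mimics a reader that builds a typed array view
// of the requested length, as a JavaScript binary parser might.
function readBytesSketch(buffer, pos, length) {
  // A negative length makes the Uint8Array constructor throw a RangeError.
  return new Uint8Array(buffer, pos, length);
}

const buffer = new ArrayBuffer(16);
let caught = null;
try {
  readBytesSketch(buffer, 0, -1795); // same length as in the reported error
} catch (e) {
  caught = e;
}
console.log(caught instanceof RangeError, caught.message);
```

The error itself carries only the bad length, which is why it gives no hint about which field of the spec produced it.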

I dumped the full code here in case it's more useful than this fairly small extract.

GreyCat commented 6 years ago

I still haven't fully worked out what the problem is, but here are a few ideas so far:

      - id: properties
        type: property
        repeat: expr
        repeat-expr: property_count
        size: property_list_length

That might not do what you intend. In this case, size: sets the size of each individual property in the list, not the size of the whole list.
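The difference can be modeled outside Kaitai Struct. The sketch below is only an illustration of the per-element reading described above; `readRepeatedPerElement` is a hypothetical function, and the 10-byte list is made-up data.

```javascript
// Hypothetical model of "repeat: expr" combined with "size":
// every element gets its own `size`-byte chunk.
function readRepeatedPerElement(bytes, start, count, size) {
  const chunks = [];
  let pos = start;
  for (let i = 0; i < count; i++) {
    if (pos + size > bytes.length) {
      throw new RangeError(`element ${i} needs ${size} bytes at offset ${pos}`);
    }
    chunks.push(bytes.slice(pos, pos + size));
    pos += size;
  }
  return { chunks, consumed: pos - start };
}

// Two properties of 5 bytes each, so the whole list is 10 bytes long.
const listBytes = new Uint8Array(10);
let perElementError = null;
try {
  // Passing the list length (10) as the per-element size asks for 20 bytes.
  readRepeatedPerElement(listBytes, 0, 2, 10);
} catch (e) {
  perElementError = e;
}
// With a single property the two interpretations happen to agree.
const single = readRepeatedPerElement(listBytes, 0, 1, 10);
console.log(perElementError !== null, single.consumed);
```

With exactly one property the mistake is invisible, which may be why it does not show up on small nodes.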

Call stack: undefined RangeError: Invalid typed array length: -1795

Given that the only repeat: expr in your code is the fragment above, it probably comes from property_count being negative. You can try doing something akin to:

        repeat: expr
        repeat-expr: 'property_count >= 0 ? property_count : 0'

to mitigate the issue, let it parse, and see what's happening there. Judging from the error message, you're using the Web IDE, so maybe that's a Web IDE issue as well? Cc @koczkatamas?

koczkatamas commented 6 years ago

I think the best method to figure out issues like this is to debug the code.

@trejkaz could you open your browser console (F12 or Ctrl+Shift+I on Windows or Cmd+Shift+I on macOS), select the JS code (debug) tab and put the following line:

if (this.endOffset < this._root._io.pos) debugger;

above this line:

var _buf = this._io.readBytes((this.endOffset - this._root._io.pos));

and press Ctrl+Enter in the JS code (debug) editor?

This should break at the point where the parsing fails.

Here is a short video showing what should happen and how it should look: https://github.com/kaitai-io/kaitai_struct_webide/wiki/Features#debugging

hakanai commented 6 years ago

property_count is a u4 so I'd hope it isn't negative. :)
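A quick way to check this in the browser console, assuming the field is decoded as an unsigned little-endian 32-bit integer (the byte order matches the dump later in this thread):

```javascript
// Four 0xff bytes: the largest value a 32-bit field can hold.
const view = new DataView(new Uint8Array([0xff, 0xff, 0xff, 0xff]).buffer);
const asU4 = view.getUint32(0, true); // unsigned, little-endian
const asS4 = view.getInt32(0, true);  // signed, for comparison only
console.log(asU4, asS4); // 4294967295 -1
```

Only the signed reading can go negative, so a negative length has to come from arithmetic on the fields rather than from the field itself.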

I'll check it out in the debugger the next time I'm back at the computer with the data I'm parsing.

hakanai commented 6 years ago

In the debugger:

So essentially it became negative because _root._io.pos is far higher than the positions we have read in the file so far.

Manual breakdown:

header:
  0x00 : magic 4b 61 79 64 61 72 61 20 46 42 58 20 42 69 6e 61 72 79 20 20 00
  0x15 : more magic 1a 00
  0x17 : version (7400) e8 1c 00 00

node record:
  0x1B : end offset (far ahead) 5f 07 00 00
  0x1F : property count (0) 00 00 00 00
  0x23 : property list length (0) 00 00 00 00
  0x27 : name length (18) 12
  0x28 : name ("FBXHeaderExtension") 46 42 58 48 65 61 64 65 72 45 78 74 65 6e 73 69 6f 6e
  0 bytes of property info we'll skip
  there's space until the end of the record so children follow

  node record:
    0x3A : end offset (92) 5c 00 00 00
    0x3E : property count (1) 01 00 00 00
    0x42 : property list length (5) 05 00 00 00
    0x46 : name length (16) 10
    0x47 : name ("FBXHeaderVersion") 46 42 58 48 65 61 64 65 72 56 65 72 73 69 6f 6e
    0x57 : property data (skipping) 49 eb 03 00 00 
    ...
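The breakdown of the inner record can be reproduced with a few lines of JavaScript. This is only a sketch: the bytes are copied from the dump above, while `parseNodeHeader` and its field names are hypothetical and not part of any generated parser.

```javascript
// Bytes copied from the breakdown above, starting at file offset 0x3A.
const RECORD_START = 0x3a;
const recordBytes = new Uint8Array([
  0x5c, 0x00, 0x00, 0x00,                         // end offset
  0x01, 0x00, 0x00, 0x00,                         // property count
  0x05, 0x00, 0x00, 0x00,                         // property list length
  0x10,                                           // name length
  0x46, 0x42, 0x58, 0x48, 0x65, 0x61, 0x64, 0x65,
  0x72, 0x56, 0x65, 0x72, 0x73, 0x69, 0x6f, 0x6e, // name
  0x49, 0xeb, 0x03, 0x00, 0x00,                   // property data
]);

function parseNodeHeader(bytes, absoluteStart) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const endOffset = view.getUint32(0, true);
  const propertyCount = view.getUint32(4, true);
  const propertyListLength = view.getUint32(8, true);
  const nameLength = view.getUint8(12);
  const name = String.fromCharCode(...bytes.subarray(13, 13 + nameLength));
  // Absolute file offset just past the property list.
  const afterProperties = absoluteStart + 13 + nameLength + propertyListLength;
  return {
    endOffset, propertyCount, propertyListLength, name,
    afterProperties,
    childrenSize: endOffset - afterProperties,
  };
}

const node = parseNodeHeader(recordBytes, RECORD_START);
console.log(node.name, node.endOffset, node.afterProperties, node.childrenSize);
// FBXHeaderVersion 92 92 0
```

The size of the child list only comes out right because the function is told the record's absolute start offset (0x3A), which is exactly the piece of information that is hard to get from inside a sub-stream.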

At this point in the stream comes the next level of child nodes. I'm supposed to derive the correct size (in this case it happens to be 0) while only having an absolute offset (92).

Ideally I would be able to specify that the end of the structure is "absolute offset 92", since that's exactly what the format has told me, and is thus the declarative way to do it, but this syntax is unavailable.

The next preference would be the ability to compute it by being told the absolute offset of the current structure.

this._io.byteOffset gives me an offset, but it's relative to the parent io object, not the root one, so it doesn't work once you're in the next level of nodes.
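To make the wish concrete, here is a sketch of a stream that carries its absolute file offset down through nested sub-streams. `TrackedStream` and all of its members are hypothetical and are not the Kaitai runtime API; the zero-filled 1887-byte buffer is a stand-in for the real file.

```javascript
// Hypothetical stream type, not the Kaitai runtime API.
class TrackedStream {
  constructor(bytes, absoluteBase = 0) {
    this.bytes = bytes;
    this.absoluteBase = absoluteBase; // file offset of this stream's byte 0
    this.pos = 0;
  }
  get absolutePos() {
    return this.absoluteBase + this.pos;
  }
  skip(n) {
    this.pos += n;
  }
  // Carve out `size` bytes as a child stream that remembers where it
  // starts in the file, then advance this stream past them.
  substream(size) {
    const child = new TrackedStream(
      this.bytes.subarray(this.pos, this.pos + size),
      this.absolutePos
    );
    this.pos += size;
    return child;
  }
}

const file = new TrackedStream(new Uint8Array(1887));
file.skip(0x3a); // header plus the outer node's fields and name
const children = file.substream(1887 - file.absolutePos);
children.skip(13 + 16 + 5); // inner node's fields, name and property data
console.log(children.absolutePos, 92 - children.absolutePos); // 92 0
```

Because each sub-stream adds its parent's absolute position to its own, the value stays correct at any nesting depth, unlike an offset that is only relative to the immediate parent.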

this._root._io.pos gives me 1887, which seems to mean that it has already read past the entire sub-stream. This is probably some kind of implementation detail, but I had expected it to be the current offset. If this variable had returned the expected value, another option might have been to repeat-until: _root._io.pos == endOffset.
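The numbers are consistent with the root stream having already consumed the outer node's whole child block before the inner nodes are parsed. A small sketch of that sequence, where `SketchStream` is a hypothetical stand-in for the runtime stream and the zero-filled buffer stands in for the file:

```javascript
// Hypothetical minimal stream; readBytes advances pos like a typical reader.
class SketchStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }
  readBytes(n) {
    const out = this.bytes.subarray(this.pos, this.pos + n);
    this.pos += n;
    return out;
  }
}

const root = new SketchStream(new Uint8Array(1887));
root.pos = 0x3a; // just past the outer node's name (it has no properties)

// Outer node: size = end_offset - _root._io.pos, read from the root stream.
const outerEndOffset = 1887;
const childrenBuf = root.readBytes(outerEndOffset - root.pos);

// Inner node, parsed from childrenBuf, evaluates the same expression,
// but the root stream now sits at the end of the outer node.
const childEndOffset = 92;
const childSize = childEndOffset - root.pos;
console.log(childrenBuf.length, root.pos, childSize); // 1829 1887 -1795
```

92 - 1887 = -1795, which matches the length in the reported RangeError.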