kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

`process` doesn't work on strings or with repetition #706

Open Hedzin opened 4 years ago

Hedzin commented 4 years ago

I'm not sure whether this is a bug or a feature, but it looks like process can only be used in the top-level "seq" block. Details below:

I need process: ror(1) to decode an old game's .pak file. The whole file is encoded except for the last 4 bytes. My initial ksy looked like this:

meta:
  id: autothief_pak
  file-extension: pak
  application: CarJacker game
  endian: le
instances:
  toc_count:
    pos: _io.size - 4
    type: s4
  toc:
    pos: _io.size - 4 + toc_count * 144
    type: toc_record
    repeat: expr
    repeat-expr: -toc_count
types:
  toc_record:
    seq:
      - id: name
        type: strz
        encoding: UTF-8
        size: 128
      - id: ofs_body
        type: u4
      - id: len_body
        type: u4
      - id: unk1
        type: u4
      - id: unk2
        type: u4
    instances:
      file_content:
        pos: ofs_body
        size: len_body

I tried inserting process: ror(1) in the following places, but every attempt failed:

1) Inside the top-level instances: [screenshot]

The file was accepted by ksv, but I got an error when opening a second-level object: [screenshot]

2) Inside types: [screenshot]

The file was rejected by ksv during processing: [screenshot]

After checking all the examples in the format folder of kaitai-struct-compiler, I found that every one of them uses process only in the top-level seq. So I rewrote my ksy as follows:

meta:
  id: autothief_pak
  file-extension: pak
  application: CarJacker game
  endian: le
seq:
  - id: body
    size: _io.size-4
    process: ror(1)
    type: pak_body
  - id: toc_count
    type: s4
types:
  pak_body:
    instances:
      toc:
        pos: _io.size + _root.toc_count * 144
        type: toc_record
        repeat: expr
        repeat-expr: -_root.toc_count
  toc_record:
    seq:
      - id: name
        type: strz
        encoding: UTF-8
        size: 128
      - id: ofs_body
        type: u4
      - id: len_body
        type: u4
      - id: unk1
        type: u4
      - id: unk2
        type: u4
    instances:
      file_content:
        pos: ofs_body
        size: len_body

It works as expected, but the ksy file has become much less simple and elegant, because I had to add unnecessary seq fields at the top level just to apply the processing.

So the question is: did I miss something, or is it really impossible to add processing in the places I mentioned above?

generalmimon commented 4 years ago

There are two independent bugs with process in the compiler that you're encountering.

  1. Using process with repeat doesn't work in any language. Fortunately, I finished the fix today and I'll push it ASAP.
  2. process is not implemented on the string type at all. I don't think that's intentional; it's just another bug.

Both issues are easy to work around, though: you can wrap the processed field in a new user type.
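A minimal sketch of what such a wrapper might look like for the original ksy, assuming each TOC record is exactly 144 bytes (128 + 4 × 4, matching the `toc_count * 144` offset above). The type name `toc_record_decoded` is hypothetical. The `io: _root._io` and `process: ror(1)` lines on `file_content` are also assumptions: once `toc_record` is parsed from a decoded substream, `pos: ofs_body` would otherwise seek inside that 144-byte substream instead of the file, and the file bodies are presumably encoded like the rest of the file.

```yaml
meta:
  id: autothief_pak
  file-extension: pak
  application: CarJacker game
  endian: le
instances:
  toc_count:
    pos: _io.size - 4          # last 4 bytes are not encoded
    type: s4
  toc:
    pos: _io.size - 4 + toc_count * 144
    type: toc_record_decoded   # repeat the wrapper, not the processed field
    repeat: expr
    repeat-expr: -toc_count
types:
  toc_record_decoded:          # hypothetical wrapper type
    seq:
      - id: record
        size: 144              # assumed fixed record size
        process: ror(1)
        type: toc_record
  toc_record:
    seq:
      - id: name
        type: strz
        encoding: UTF-8
        size: 128
      - id: ofs_body
        type: u4
      - id: len_body
        type: u4
      - id: unk1
        type: u4
      - id: unk2
        type: u4
    instances:
      file_content:
        io: _root._io          # assumption: offsets refer to the whole file
        pos: ofs_body
        size: len_body
        process: ror(1)        # assumption: bodies are encoded too
```

Here the repetition sits on the wrapper type, so the field that carries `process` is neither repeated nor a string; the string `name` is read from the already-decoded substream.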

I'm quite sure that it doesn't matter where you use the process, it should work on any level.

webbnh commented 4 years ago

> I'm quite sure that it doesn't matter where you use the process, it should work on any level.

In our code, our top-level definition references a user type, which references another user type, which uses a switch-on to reference several user types, one of which uses process. (It's actually somewhat more complicated than that, as each of the first two references is made with an if: false conditional, so that they can be fetched iteratively while still letting KSC generate code with the proper type hierarchy and parent relationships.)

KOLANICH commented 4 years ago

I propose to rename the issue.