kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

Allow `contents` key to be just one byte #665

Open Mingun opened 4 years ago

Mingun commented 4 years ago

In the protocol I am currently working on, delimiter bytes such as FS (0x1C), GS (0x1D), and so on are used frequently. Right now I'm forced to describe them as

seq:
  - contents: [0x1C]

which translates to slightly suboptimal code that involves creating an array:

// Java example
this._unnamed1 = this._io.ensureFixedContents(new byte[] { 28 });

I think such delimiter bytes are widely used in many protocols, so it would be great if I could write

seq:
  - contents: 0x1C

and get

// Java example
this._io.ensureFixedContents(28);

(the example also includes the fix for #664 applied)
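
To make the difference concrete, here is a minimal, self-contained Java sketch of a one-byte check that needs no array. `SingleByteCheck` and `ensureFixedByte` are hypothetical names used only for illustration; they are not part of the Kaitai Struct runtime, and a plain `ByteBuffer` stands in for the stream.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; NOT the Kaitai Struct runtime API.
final class SingleByteCheck {
    /** Reads one byte and fails if it differs from the expected value. */
    static void ensureFixedByte(ByteBuffer buf, int expected) {
        // get() throws BufferUnderflowException if no data is left.
        int actual = buf.get() & 0xFF;
        if (actual != (expected & 0xFF)) {
            throw new IllegalStateException(
                String.format("expected 0x%02X, got 0x%02X", expected & 0xFF, actual));
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] { 0x1C, 0x30 });
        ensureFixedByte(buf, 0x1C);            // passes: the first byte is FS
        System.out.println(buf.remaining());   // one byte is left after the check
    }
}
```

The comparison works on a single `int`, so nothing is allocated per check, which is the saving the proposal is after.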

GreyCat commented 4 years ago
KOLANICH commented 4 years ago

> frequently used delimiter bytes

`contents` is not for delimiters; it is for signatures. `terminator` and `repeat-until` are for delimiters. Delimiters mark the end of something. Signatures both indicate that something is likely of a certain type and mark the beginning of something. KS currently implements only the first role of signatures.

Mingun commented 4 years ago

`terminator` can be used only for strings and byte arrays, so it won't help in my case, where I must parse a string like 11<FS>000<FS>…, which means:

| Symbol | Description |
|---|---|
| 1 | Message Class - Unsolicited |
| 1 | Message Sub-Class - Transaction Request |
| 0x1C (FS) | Delimiter |
| 000 | LUNO |
| 0x1C (FS) | Delimiter |
| ... | Other message fields |
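
For illustration, the layout above can be read by hand roughly like this. It is a minimal Java sketch, not Kaitai-generated code: the `FsFields` class is hypothetical, decoding each field as ASCII text is an assumption, and the trailing `X` in the sample is only a placeholder for the other message fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class FsFields {
    static final byte FS = 0x1C;

    /** Splits a message into the fields between FS bytes; the FS bytes are dropped. */
    static List<String> split(byte[] message) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < message.length; i++) {
            if (message[i] == FS) {
                fields.add(new String(message, start, i - start, StandardCharsets.US_ASCII));
                start = i + 1;
            }
        }
        // Whatever follows the last FS is the final field (possibly empty).
        fields.add(new String(message, start, message.length - start, StandardCharsets.US_ASCII));
        return fields;
    }

    public static void main(String[] args) {
        byte[] sample = { '1', '1', FS, '0', '0', '0', FS, 'X' };
        List<String> fields = split(sample);
        char messageClass = fields.get(0).charAt(0);     // first symbol of the first field
        char messageSubClass = fields.get(0).charAt(1);  // second symbol of the first field
        String luno = fields.get(1);                     // field between the two FS bytes
        System.out.println(messageClass + " " + messageSubClass + " " + luno);
    }
}
```

Here the FS bytes act purely as separators between fields, which is the role the issue asks to be able to declare with a single expected byte.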

Or I must parse structures that start with a delimiter byte, where part of the message looks like: <FS><id><data><FS><id><data><FS><id><data>...

In the language, there is no simpler way to give a clear declarative description that a certain byte is expected here (which is what `contents` is intended for).

`repeat-until` always includes the delimiter in the data and, again, requires that the data end with it.
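
The behavior described here can be sketched as a do-while loop. This is a minimal Java illustration of the semantics as stated in this comment, not the code the compiler generates; `RepeatUntilDemo` and `repeatUntil` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper for illustration only.
final class RepeatUntilDemo {
    /** Reads bytes until the predicate matches; the matching byte is kept in the result. */
    static List<Integer> repeatUntil(ByteBuffer buf, IntPredicate stop) {
        List<Integer> items = new ArrayList<>();
        int b;
        do {
            // get() throws BufferUnderflowException if the data ends
            // before any byte satisfies the stop condition.
            b = buf.get() & 0xFF;
            items.add(b);
        } while (!stop.test(b));
        return items;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] { '1', '1', 0x1C, '0' });
        List<Integer> items = repeatUntil(buf, b -> b == 0x1C);
        System.out.println(items.size()); // the FS byte is counted as part of the data
    }
}
```

Both limitations are visible: the delimiter ends up inside the parsed list, and input that never contains the delimiter fails instead of ending cleanly.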

@GreyCat says that version 0.9 should introduce a new `valid` key, but right now it is not documented anywhere, and besides, I do not see any 0.9 version in the Maven repositories. Is it available somewhere?

GreyCat commented 4 years ago

@Mingun "0.9" is whatever we have in the compiler repository at the moment. It's not released yet, and unfortunately there are still some problems to solve before we can release it :(

Documenting all the new stuff is actually one of these problems. So far, the only documentation regarding `valid` we have is the discussion in #435.

KOLANICH commented 4 years ago

> Or I must parse structures that start with a delimiter byte, where part of the message looks like

If it starts something, it is not a delimiter. It is a marker or a signature, and `contents` is fine for that.

When writing a KS spec, please remember that it is more docs than code.

> `terminator` can be used only for strings and byte arrays

I wonder if that should be fixed, if that is the case. In fact, it is possible to express this with some hacks using `_io`, but the hacks will likely break when serialization arrives, have severe overhead, and don't express the intention.

KOLANICH commented 4 years ago

@Mingun, here are the hacks I was talking about; you may find them useful: https://github.com/KOLANICH/kaitai_struct_formats/blob/qsp/game/qsp.ksy#L35L75

Though its memory consumption is terrible: a 10 MiB save file made it eat ~500 MiB.