kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

Radical .ksy format changes proposal #55

Open GreyCat opened 7 years ago

GreyCat commented 7 years ago

With some time passing and some experience gained, I see that we could review some of the original design decisions for the .ksy format. In this topic, I'll try to gather some pretty radical ideas (i.e. ones that break the current format spec) and explain the rationale I see behind each of them.

Given that the transition might be pretty painful (if it happens at all), we should at the very least bundle such changes into one shot (probably while going from v0 => v1). We should also provide automated converters, "ksy level" specs and switches, backward-compatibility support in the compiler, etc.

I'll post each proposal as a separate comment, so that each one can be referenced, discussed, voted for/against, etc.

GreyCat commented 7 years ago

Get rid of meta/id

Instead of the usual boilerplate:

meta:
  id: foo
seq:
  - id: one
    type: u1

... let's just start the file with a map of (type name => type spec), i.e. the same kind of map that currently goes inside the types tag. This would give us:

foo:
  seq:
    - id: one
      type: u1

Pros

Cons

GreyCat commented 7 years ago

u4 → uint32, etc.

I don't personally like the idea, but after explaining KS to a few dozen people already, I must admit that, since the majority of them come from a C background, they find names like uint8 or int32 much easier to understand.
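To make the proposal concrete, here is a hedged side-by-side sketch of the same fields in both notations. The field names are hypothetical. The uint*/int* names are only what this proposal suggests and are not accepted by the current compiler. Endianness is omitted for brevity, so assume meta/endian is set.

```yaml
# Current notation: u/s + width in BYTES
seq:
  - id: version
    type: u1
  - id: num_entries
    type: u4
  - id: offset
    type: s8
---
# Proposed notation (hypothetical): C-style names with width in BITS
seq:
  - id: version
    type: uint8
  - id: num_entries
    type: uint32
  - id: offset
    type: int64
```

The `---` separator keeps the two variants as separate YAML documents, so the repeated seq key does not clash.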

Pros

Cons

athre0z commented 7 years ago

Personally, the only thing about the types I found confusing was that they use byte granularity. How about keeping the short type prefixes but switching the number from bytes to bits? Types like u32 and s32 should be well known to most C programmers (e.g. the Linux kernel uses them), and they are also the standard types in some modern languages like Rust.

This would break compatibility, but as the project is pretty fresh, I guess that's not too big of a deal.

GreyCat commented 7 years ago

Types like u32 and s32 should be well known to most C programmers

That's definitely not a good idea: we'd get a bad clash, with u8 meaning either 8 bits or 8 bytes (both readings are valid). That would wreak total chaos instead of producing a clean error message. If we go that way, U8-U16-U32-U64 is probably the way, or something like that.
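A minimal illustration of the clash, using a hypothetical field name. The same line is valid under both schemes but describes fields of different sizes. An existing spec would therefore still compile, yet parse wrongly, instead of failing with an error.

```yaml
seq:
  # Current meaning: 8-byte (64-bit) unsigned integer.
  # Under the suggested bit-based notation: 8-bit unsigned integer (today's u1).
  - id: timestamp
    type: u8
```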

well known to most C programmers (e.g. the Linux kernel uses them) and also are the standard types in some modern languages like Rust.

True enough, but the u1-u2-u4 notation didn't come out of thin air either; it is in established use. For example, Java uses it in its specs, Adobe uses it for Flash (although inconsistently), Open Watcom uses it, as do various emulators like zsnes, etc.

It's hard to make solid estimates, but my guess is that u8-u16-u32 is just about as confusing as u1-u2-u4. Still, it's probably a good idea to do some sort of general review of what modern programming languages use. For example:

koczkatamas commented 7 years ago

Added ksy modularization proposal for discussion here: https://github.com/kaitai-io/kaitai_struct/issues/71

Mingun commented 4 years ago

Get rid of ks-opaque-types

Instead, all opaque types must be imported explicitly, for example under the key meta/external-types or /external-types:

meta:
  external-types:
    - my_opaque_type1
    - my_opaque_type2
    - ...
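A hedged before/after sketch, assuming the meta/external-types variant of the proposal. The type and field names are hypothetical. In the current format, setting ks-opaque-types: true in meta makes every unresolved type name opaque. Under the proposal, only the listed names would be treated as opaque, so a typo in a type name would still be reported as an error.

```yaml
# Current: any unknown type name becomes opaque once the flag is on
meta:
  id: container
  ks-opaque-types: true
seq:
  - id: payload
    type: my_opaque_type1
---
# Proposed (hypothetical): each opaque type is declared explicitly
meta:
  id: container
  external-types:
    - my_opaque_type1
seq:
  - id: payload
    type: my_opaque_type1
```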

Pros

Cons

Mingun commented 4 years ago

Rework repeat*, size* and terminator family keys

Proposal:

- id: field
  # shortcut:
  # repeat: <int-expression>
  repeat:
    # Only one key allowed
    count: <int-expression>
    until: <bool-expression>
    while: <bool-expression>
    to-eos: true
  # shortcut:
  # size: <int-expression>
  size:
    # Only one of two following keys allowed
    value: <int-expression>
    to-eos: true
    # shortcut:
    # terminator: <byte-value>
    terminator:
      value: <byte-value>
      consume: true
      include: false
      mandatory: true # replaces the poorly named `eos-error`
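For comparison, here is a sketch of how some existing attributes might map onto the proposed keys. The left-hand side uses current keys (repeat: expr + repeat-expr, repeat: until + repeat-until, size-eos). The right-hand side is one reading of the proposal and is not valid in the current format. Field and type names, including is_last, are hypothetical.

```yaml
# Current format
seq:
  - id: entries
    type: entry
    repeat: expr
    repeat-expr: num_entries
  - id: records
    type: record
    repeat: until
    repeat-until: _.is_last
  - id: body
    size-eos: true
---
# Proposed format (hypothetical)
seq:
  - id: entries
    type: entry
    repeat: num_entries   # shortcut for repeat/count
  - id: records
    type: record
    repeat:
      until: _.is_last
  - id: body
    size:
      to-eos: true
```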

Pros

Cons

Mingun commented 4 years ago

Make the _parent variable a tuple/array of all parents

So getting the parent of the parent would be _parents.1 instead of _parent._parent, and the regular _parent becomes _parents.0. Alternatively, if it is preferable, array indexing style can be chosen: _parents[0], _parents[1], etc.
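A hedged sketch of what this would look like in a nested type. The type and field names (chunk, chunk_body, header_len) are hypothetical. It assumes chunk is used directly from the top-level type, so _parent._parent inside chunk_body is the top-level object, which is _parents[1] under the proposal.

```yaml
types:
  chunk:
    seq:
      - id: body
        type: chunk_body
  chunk_body:
    instances:
      # Current: climb one level at a time (chunk_body -> chunk -> top-level type)
      header_len:
        value: _parent._parent.header_len
      # Proposed (hypothetical): index 0 is chunk, index 1 is the top-level type
      # (equivalently written as _parents.1)
      header_len_proposed:
        value: _parents[1].header_len
```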

Pros

Cons

KOLANICH commented 4 years ago

Instead all opaque types must be imported explicitly, for example, under key meta/external-types, or /external-types:
