Open GreyCat opened 6 years ago
Bytes and FixedSized(GreedyBytes) are not interchangeble, they have different build semantics. Bytes takes data of exact length, the other takes whataever is shorter and adds padding. I suggest we make FixedSized default and use -construct-render: Bytes as option.
Then it should be exactly Bytes
, as size: X
does not do any padding and is expected to fail when reading past end of stream.
FixedSized(10) would fail if given a 5 byte stream. The difference is in how it builds (which Kaitai doesnt do, right?). If size: N, type: struct1
means a substream passed to Struct then size: N
should mean a substream passed to GreedyBytes. You are the guy who designed it so I wont pretend to know better than you what those tags are supposed to mean, but my impression/understanding is, size is supposed to mean (1) constrain parsing to a chunk (2) consume entire chunk even if type needs less bytes. This would at least imply that building would also have (2) semantics, build that many bytes even if type produced less. Meaning, add padding.
When building:
size: 5
the byte array or string is expected to be exactly 5 bytes long. Everything else would result in a pre-build validation error. A slightly more complicated case:
- id: len
type: u4
- id: buf
type: str
size: len
Here, buf
is expected to match len
exactly. If it doesn't, it's a validation error. User either need to fix len
or fix buf
before building. If user's just submitting parse result into building, they're guaranteed to match.
For a user type invocation with a substream:
size: 5
type: foo
that foo
type is expected to be 5 bytes or less. If it is more than that, it's validation error. If it's less than that, then padding would be applied to ensure that it's exactly 5 bytes in the output.
The point here is read-write-read consistency: one is expected to build cleanly whatever's was just parsed from a valid input binary stream, and, vice versa, pase exactly the same stuff that was built.
OTOH, you're absolutely right, these build semantics are something that could be implementation-dependent, so there's no point in enforcing a specific behavior.
I think that KSY syntax has inherently some abiguity, it doesnt express intentions very clearly. For example size
is often used as Padding as much as it is used as Bytes. Also u1
and b1
are often used as Flag and not an integer. Which leads me to an idea, would you consider adding a u1bool
type? That of course being separate from -construct-render: Flag
option.
Bytes(n)
instead of FixedSized(n, GreedyBytes)
Actually I just commited some optimisation that makes FixedSized detect that subcon is GreedyBytes, in which case it doesnt use a substream but just returns the data bytes. Performance of these two is identical. Build semantics however are not. And there is also readability aspect.
Computed(...)
sometimes can be used without lambdaI will need to run some benchmark to confirm this, but I believe that lambda is actually faster than a this expression. The latter was concieved as a tool for convenience, something that is easier to write, but its not by any measure intended to be top performance. Therefore I would actually consider leaving current form as the default.
This issue will accumulate and track various optimizations and enhancements for Construct target (#377).
CString
where applicable instead ofNullTerminated(GreedyString)
Bytes(n)
instead ofFixedSized(n, GreedyBytes)
StringEncoded(FixedSized(GreedyBytes))
instead ofFixedSized(GreedyString)
include
,consume
,term
in NullTerminatedLazyBound(lambda: type_name)
Computed(...)
sometimes can be used without lambda-construct-render:
options for Padding Flag inlineStruct inlineEnum