Open Mingun opened 4 years ago
Allowing size to built-in sized types also more naturally handle such situations:
meta:
id: size_problems
endian: be
encoding: utf-8
seq:
- id: field
size: 5
type:
switch-on: 2
cases:
1: u4
2: str
Depending on switch-on
result size
member is taken into account or not. I think that is not good. size
should define size of field, not the size of type.
In the example posted in your follow-up reply above, your two cases are a five-character string and a four-byte integer presumably with a padding byte: it would be best to handle them as separate, explicit "types", rather than trying to claim that the field is five-bytes thereby creating a new problem in which we try to figure out how to extract an integer from an over-sized field (i.e., how is KSC supposed to know which four of the five bytes to use for the integer (the answer might change with endianness...) and what is it supposed to do with the extra byte?).
My instinct here is that the size
attribute is implicit when applied to a field with a built-in type. I don't see any reason why one shouldn't be allowed to specify it explicitly (other than product implementation, documentation, and support costs....), but, if one does specify it, it must match, and so your example of a size: 5
field containing a u4
should result in an error.
This is just a minimal example showing the problem, assume that it will look like this:
meta:
id: size_problems
endian: be
encoding: utf-8
seq:
- id: field
size: 5
type:
switch-on: 2
cases:
1: U4
2: str
types:
U4:
seq:
- id: field
size: 4
Now it no longer raises these questions, because everything works as described in the documentation, right?
creating a new problem in which we try to figure out how to extract an integer from an over-sized field
This problem never existed, you don't complain about which bytes to use for type U4
in my last example? Why you should complain about u4
? Just because it is built-in type?
(i.e., how is KSC supposed to know which four of the five bytes to use for the integer (the answer might change with endianness...) and what is it supposed to do with the extra byte?)
Endianness do not matter here. First 4 bytes will be used for decoding type, last byte is garbage
My instinct here is that the size attribute is implicit when applied to a field with a built-in type. I don't see any reason why one shouldn't be allowed to specify it explicitly (other than product implementation, documentation, and support costs....), but, if one does specify it, it must match, and so your example of a size: 5 field containing a u4 should result in an error.
If absence of size
for naturally sized types will just take natural size of type, then switch-on
constructions with types of different sizes will not work. For example, that breaks almost any T(L)V implementation.
rather than trying to claim that the field is five-bytes
Just on the contrary, it is most obvious to say that size refers to the size of the field, and not to the size of the type. For user types, this is the case.
Summary
Trying to formalize the way how size-related properties should model a Kaitai Struct type, I found some inconsistences. In the spoilers below you can see a diagrams that shown how each part of a binary stream is used by an attribute depending on which how that attribute is defined:
`size-eos: true`
The size of an attribute is defined by a `size-eos: true` key: * ```yaml # /==\ - type # Stream: [____ ] # 0 eos # \_______available for type_____/ # \_______consumed bytes_______/ # \_____________stream size__/ size-eos: true ``` * ```yaml # /==\ - type # Stream: [____ x ] # 0 terminator eos # \_available for type_/ / # \_______consumed bytes_______/ # \_____________stream size__/ size-eos: true terminator: x ``` * ```yaml # /==\ - type /========\ - padding # Stream: [____ ~~~~~~~~~~] # 0 eos # \_available for type_/ / # \_______consumed bytes_______/ # \_____________stream size__/ size-eos: true pad-right: ~ # terminator: ~ - implicit ``` * ```yaml # /==\ - type /=======\ - padding # Stream: [____ x~~~~~~~~~] # 0 eos # \_available for type_/ / # \_______consumed bytes_______/ # \_____________stream size__/ size-eos: true terminator: x pad-right: ~ ````size`
The size of an attribute is defined by a `size` key: * ```yaml # /==\ - type # Stream: [____ ] | # 0 eos # \____available for type___/ / # \_____consumed bytes____/ / # \_____________stream size__/ size: s ``` * ```yaml # /==\ - type # Stream: [____ x ] | # 0 terminator eos # \_available for type_/ / / # \_____consumed bytes____/ / # \_____________stream size__/ size: s terminator: x ``` * ```yaml # /==\ - type /===\ - padding # Stream: [____ ~~~~~] | # 0 eos # \_available for type_/ / / # \_____consumed bytes____/ / # \_____________stream size__/ size: s pad-right: ~ # terminator: ~ - implicit ``` * ```yaml # /==\ - type /==\ - padding # Stream: [____ x~~~~] | # 0 terminator eos # \_available for type_/ / / # \_____consumed bytes____/ / # \_____________stream size__/ size: s terminator: x pad-right: ~ ```No `size` or `size-eos: true`
The size of an attribute is defined implicitly by absence of any size-related keys: * ```yaml # /==\ - type # Stream: [____ ] | # 0 terminator eos # \_available for type_/ / # \__consumed bytes__/ / # \_____________stream size__/ terminator: x ``` * `pad-right` implicitly defines `terminator` but this is lead to an error: ```yaml # (main): /seq/0: error: 'size', 'size-eos' or 'terminator' must be specified # /==\ - type /===\ - padding # Stream: [____ ~~~~~] | # 0 eos # \_available for type_/ / / # \_____consumed bytes____/ / # \_____________stream size__/ pad-right: ~ # terminator: ~ - implicit ``` * `pad-right` is useless, but this is not forbidden: ```yaml # /==\ - type # Stream: [____ ] | # 0 terminator eos # \_available for type_/ / # \__consumed bytes__/ / # \_____________stream size__/ terminator: x pad-right: ~ ```Problem 1 --
size
andsize-eos
handled inconsistently depending on the type kindI found, that
size
andsize-eos
keys won't work with built-in types with known size (u1
, etc.):At the same time you can specify
terminator
key, but it seams it has no meaning for that case:Inconsistency 2 --
pad-right
can not be used aloneWhen
pad-right: x
used together with thesize
orsize-eos: true
, it is implicitly defines aterminator: x
if the value of the terminator is not defined explicitly or implicitly bystrz
.Also, it is possible to define only a
terminator
withoutsize
orsize-eos: true
:At the same time defining only the
pad-right
is an error:Solution
I think, it will more consistent, if size-related keys will be independent from types. Each kaitai stream must have 4 properties:
pos
keysAt start root stream have:
Parsing will run as follows (
current_offset
-- global offset in parsed data):start_offset = current_offset
end_offset = parent_stream.end_offset
local_current_offset = parent_stream.local_current_offset
consumed_size = undefined
size
specified,consumed_size = <size>, end_offset = start_offset + <size>, local_current_offset = 0
size-eos: true
specified,local_current_offset = 0
terminator
specified, find firstterminator
in sub-stream, and limit stream availible sizelocal_current_offset = 0
eos-error: true
raise error, stopconsumed_size
is not defined,consumed_size = end_offset - start_offset
end_offset = <position of terminator>
include: true
specified,end_offset = end_offset + 1
consumed_size
is not definedconsumed_size = <position of terminator> - start_offset
consume: true
specified,consumed_size = consumed_size + 1
pad-right
specified, find firstpad-right
in sub-stream, and limit stream availible sizeend_offset = <position of start padding>
[start_offset, end_offset)
slice as source of datastr
,strz
andbytes
) use entire buffer as resultconsumed_size
is not defined, setconsumed_size = <readed size>
current_offset = start_offset + consumed_size
That algorithm also fixes #783