Open deefha opened 6 years ago
Well, first of all, I don't think that you need to use "_index"
and parametric types here at all. This would probably do:
seq:
  - id: header
    type: t_header
  - id: count
    type: u4
  - id: fonts
    type: font_offset
    repeat: expr
    repeat-expr: 63
types:
  font_offset:
    seq:
      - id: offset
        type: u4
    instances:
      body:
        pos: offset
        type: font_body
        if: offset != 0
  font_body:
    seq:
      # real parsing of a font body happens here
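To make the behavior of that spec concrete, here is a hand-written Python sketch of the same idea. It is not ksc output. It assumes little-endian integers, skips the header and count fields, and uses a hypothetical fixed body size, since the real font_body layout is not shown in this thread.

```python
import io
import struct

FONT_COUNT = 63  # repeat-expr from the spec above


def read_font_offsets(stream):
    """Read 63 u4 offsets sequentially, like the `fonts` seq attribute."""
    raw = stream.read(4 * FONT_COUNT)
    # Little-endian is an assumption; the thread does not state the endianness.
    return list(struct.unpack("<%dI" % FONT_COUNT, raw))


def read_body(stream, offset, size):
    """Mimic the `body` instance: absent when offset == 0, else read at `offset`."""
    if offset == 0:
        return None
    saved = stream.tell()
    stream.seek(offset)
    data = stream.read(size)  # `size` is hypothetical; a real font_body has its own layout
    stream.seek(saved)
    return data


# Tiny test file: 63 offsets (only entry 1 is non-zero), followed by a 4-byte body.
offsets = [0] * FONT_COUNT
offsets[1] = 4 * FONT_COUNT
blob = struct.pack("<%dI" % FONT_COUNT, *offsets) + b"FONT"

stream = io.BytesIO(blob)
parsed = read_font_offsets(stream)
bodies = [read_body(stream, off, 4) for off in parsed]
```

Each element decides for itself whether it has a body, so no loop index is needed anywhere.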
Anyway, what you're talking about here is somewhat important stuff. There is actually a problem, but probably not the one you've had in mind.
"if" + "repeat" working together: that's really how it is by design, i.e. "if" is supposed to be outside of the loop, calculated only once. This is important for two reasons. First, "if" makes it clear that the whole attribute might exist or not exist. If we treat every individual array element as "nullable", it would be a pure nightmare for some languages (like C++/STL, which lack nullable primitive types like integers). Second, if we treat "if" as a per-element condition, then we have no other way to do a per-whole-attribute check.
Given all that, it's unlikely that we'll change the "if" + "repeat" behavior.
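The distinction between the two readings of "if" can be sketched in plain Python. Both function names are hypothetical and neither is generated code; the first mirrors the current design, the second the per-element behavior that is being ruled out.

```python
def parse_whole_attribute(values, condition):
    """Current design: the condition is checked once, outside the loop."""
    if not condition:
        return None  # the whole attribute is absent
    return list(values)


def parse_per_element(values, predicate):
    """Hypothetical per-element check: every slot becomes nullable."""
    return [v if predicate(i) else None for i, v in enumerate(values)]


absent = parse_whole_attribute([1, 2, 3], False)
present = parse_whole_attribute([1, 2, 3], True)
sparse = parse_per_element([1, 2, 3], lambda i: i != 1)
```

In the first form the condition cannot refer to a loop index, because no loop has started when it is evaluated. In the second form the result is a list with holes, which is the part that is hard to express in languages without nullable primitives.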
"pos" + "repeat" working together is slightly different, but some of the same principles still apply. Right now, "pos" is applied once, before the loop, and it's somewhat hard to change. If we start doing it on every iteration, then traditional stuff like this would be ruined:
instances:
  items: # hypothetical instance name, added so the snippet is valid
    pos: 0x1000
    type: foo
    repeat: expr
    repeat-expr: 0x10
Right now it reads 0x10 consecutive items of foo, starting at offset 0x1000, but if we make "pos" work inside the loop, we'll be getting 0x10 copies of the same data, every item starting at offset 0x1000. I understand that it's tempting to use pos: something[_index], but it would be pretty awkward to make it incompatible with the previous behavior of "pos" and/or to introduce some sort of autodetection on where to put that position setting: inside or outside of the loop.
To summarize, "_index" is still a somewhat experimental feature. As mentioned in #147, it doesn't do lots of the checks it should do. Namely, we should add checks, as outlined in this comment. I'm not 100% sure about "pos" + "_index", but I'm currently inclined to ban it too, and to recommend doing this via an extra type layer.
OK, thank you for the explanation, finally I understand.
I have no problem with the proposed solution (i.e. refactoring my struct); it was actually the very first version of the struct I created. For some reason, I wanted to reflect the logical organization of the file (header, FAT, data) in the generated source, to be able to access these parts independently. But that is not absolutely necessary, so I can use the simpler (and as a matter of fact cleaner and much more elegant) approach.
Let's say I just wanted to test the new "_index" feature, which per se I consider really useful :-)
Well, as I said, just for the sake of experimenting, you can use "_index"
with an intermediate type, albeit it's kinda awkward:
seq:
  - id: header
    type: t_header
  - id: count
    type: u4
  - id: offsets
    type: u4
    repeat: expr
    repeat-expr: 63
  - id: fonts
    type: font_offset(_index)
    repeat: expr
    repeat-expr: 63
types:
  font_offset:
    params:
      - id: idx
        type: u4
    instances:
      body:
        pos: _parent.offsets[idx]
        type: font_body
        if: _parent.offsets[idx] != 0
  font_body:
    seq:
      # real parsing of a font body happens here
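The intermediate-type trick can be mirrored in plain Python as a rough sketch: each element object receives its own index as a constructor argument and looks up its offset in the parent. The class names, the little-endian layout, the shortened count of 3 and the 4-byte body are all assumptions for illustration; this is not what ksc generates.

```python
import io
import struct


class FontOffset:
    """Stands in for font_offset(_index): knows its index and its parent."""

    def __init__(self, idx, parent, stream):
        self.idx = idx
        self._parent = parent
        self._stream = stream
        self._body = None
        self._body_loaded = False

    @property
    def body(self):
        # Lazy, like a Kaitai instance: evaluated on first access only.
        if not self._body_loaded:
            self._body_loaded = True
            offset = self._parent.offsets[self.idx]
            if offset != 0:
                saved = self._stream.tell()
                self._stream.seek(offset)
                self._body = self._stream.read(4)  # hypothetical 4-byte body
                self._stream.seek(saved)
        return self._body


class FontFile:
    COUNT = 3  # 63 in the spec; shortened to keep the example small

    def __init__(self, stream):
        raw = stream.read(4 * self.COUNT)
        self.offsets = list(struct.unpack("<%dI" % self.COUNT, raw))
        self.fonts = [FontOffset(i, self, stream) for i in range(self.COUNT)]


blob = struct.pack("<3I", 0, 12, 16) + b"AAAA" + b"BBBB"
font_file = FontFile(io.BytesIO(blob))
```

Here the index is used only inside the element type, so the position lookup and the zero check both happen per element without changing how "pos" or "if" behave on the repeated attribute itself.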
I see, using "_index" as type parameter - that's smart! Thank you again :-) It solves almost everything for me.
I have a binary file with the following structure:
Any FAT offset can contain zero. Such an offset is not valid and has no corresponding content in the data section, so reading the data section can be skipped for that offset. I hope I explained it correctly, although it is not so important again.
So I have the following Kaitai Struct (excerpt):
As you can see, in the "fonts" instance I'm using simple repetition and absolute positioning by the previously gathered FAT offsets. There is also an "if" condition for skipping zero offsets. I'm using the parametric type "t_font", because why not (my ksc version is 0.8-SNAPSHOT).
The problem is in generated source. For "fonts" instance mentioned above I get this Python code (excerpt):
This is obviously incorrect and causes an error (use of variable "i" before it exists). Correct code should be slightly rearranged (condition inside the repetition):
Same situation occurs e.g. in generated PHP source (excerpt), so I think this problem is global indeed:
I know that my examples are not general, but I cannot describe the problem better. I just expected different generated source code than I got. I'd like to help with further exploration.