Open hello-adam opened 3 years ago
@hello-adam First of all, the reason why you don't see values of instances or their _debug info is simply that they haven't been parsed at all. Unlike seq fields, which all get parsed just by calling the _read method (which is done automatically by default), instances are lazy; this is one of their fundamental properties. See https://doc.kaitai.io/user_guide.html#_instances_data_beyond_the_sequence:
Another very important difference between the seq attribute and the instances attribute is that instances are lazy by default. What does that mean? Unless someone would call that body getter method programmatically, no actual parsing of body would be done.
So if you want to read values of all of them, you need to eventually invoke them all. I guess the easiest method for Python is to use reflection on the generated parser classes to get the instance names for each subtype so that you can access them afterwards. I suppose that this is going to be quite easy to do, just find how to use reflection in Python - probably there is some single function that gives you all property names when you call it with the struct object as an argument. You will need to read all instances recursively, though - start with the top-level object and, while you iterate over the properties, check whether the value of each one is a nested KaitaiStruct object (i.e. something like isinstance(struct[property], kaitaistruct.KaitaiStruct)), and if it is, you need to recurse into it and do the same for the properties of this nested object.
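A minimal sketch of that reflection walk, assuming the generated classes expose instances as Python properties and store seq fields as plain attributes. KaitaiStruct below is a stand-in for kaitaistruct.KaitaiStruct so that the snippet runs on its own; Header, Entry, property_names, walk and read_all are hypothetical names used only for illustration:

```python
class KaitaiStruct:
    """Stand-in for kaitaistruct.KaitaiStruct (assumption, keeps the demo self-contained)."""


def property_names(obj):
    """Names of properties defined directly on obj's class (inherited ones are skipped)."""
    return [name for name, value in vars(type(obj)).items()
            if isinstance(value, property)]


def walk(value):
    """Recurse into nested structs and lists of structs; return other values unchanged."""
    if isinstance(value, KaitaiStruct):
        return read_all(value)
    if isinstance(value, list):
        return [walk(item) for item in value]
    return value


def read_all(obj):
    """Read every seq field and invoke every lazy instance on obj."""
    # Snapshot the names first: invoking a property may add a cached
    # '_m_...' attribute to the object while we iterate.
    names = [n for n in vars(obj) if not n.startswith('_')]
    names += property_names(obj)
    return {name: walk(getattr(obj, name)) for name in names}


class Entry(KaitaiStruct):
    def __init__(self, size):
        self.size = size  # plays the role of a seq field


class Header(KaitaiStruct):
    def __init__(self):
        self.magic = b'\x7fELF'  # plays the role of a seq field

    @property
    def entries(self):  # plays the role of a lazy instance
        if not hasattr(self, '_m_entries'):
            self._m_entries = [Entry(1), Entry(2)]
        return self._m_entries


tree = read_all(Header())
print(tree)
```

Using vars(type(obj)) rather than dir(obj) is a design choice here: it only lists what the generated class itself defines, so builtin and inherited members never show up and need no extra filtering.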
Second, if you want to get _debug info for instances, check the generated code to see how the instance program_headers is parsed and what _debug info it stores - I marked the lines saving info to the _debug map with an asterisk * (https://github.com/Mahlet-Inc/hobbits/blob/c51f39f/src/hobbits-plugins/analyzers/KaitaiStruct/ksy_py/executable/elf.py#L1572-L1610):
  @property
  def program_headers(self):
      if hasattr(self, '_m_program_headers'):
          return self._m_program_headers if hasattr(self, '_m_program_headers') else None
      _pos = self._io.pos()
      self._io.seek(self.program_header_offset)
*     self._debug['_m_program_headers']['start'] = self._io.pos()
      if self._is_le:
          self._raw__m_program_headers = [None] * (self.qty_program_header)
          self._m_program_headers = [None] * (self.qty_program_header)
          for i in range(self.qty_program_header):
*             if not 'arr' in self._debug['_m_program_headers']:
*                 self._debug['_m_program_headers']['arr'] = []
*             self._debug['_m_program_headers']['arr'].append({'start': self._io.pos()})
              self._raw__m_program_headers[i] = self._io.read_bytes(self.program_header_entry_size)
              _io__raw__m_program_headers = KaitaiStream(BytesIO(self._raw__m_program_headers[i]))
              _t__m_program_headers = Elf.EndianElf.ProgramHeader(_io__raw__m_program_headers, self, self._root, self._is_le)
              _t__m_program_headers._read()
              self._m_program_headers[i] = _t__m_program_headers
*             self._debug['_m_program_headers']['arr'][i]['end'] = self._io.pos()
      else:
          # duplicate code from `if self._is_le` branch - I know the compiler
          # could do a better job of eliminating this, but it's not anywhere
          # high on our priorities I'd say, as long as the code works
*     self._debug['_m_program_headers']['end'] = self._io.pos()
      self._io.seek(_pos)
      return self._m_program_headers if hasattr(self, '_m_program_headers') else None
The notable change from seq fields is that the key in the _debug map has the _m_ prefix, which is something you'll need to adapt your code to.
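One way to adapt to that key difference is a small lookup helper that tries the plain name first and the _m_-prefixed name second. This is only a sketch under the behavior described above; debug_entry and FakeStruct are hypothetical names, and the offsets in the fake _debug map are made up for the demo:

```python
def debug_entry(struct, name):
    """Look up the _debug record for a seq field or an instance; None if absent."""
    debug = getattr(struct, '_debug', {})
    # seq fields are keyed by their plain name, instances by '_m_' + name.
    for key in (name, '_m_' + name):
        # Membership test instead of indexing, so that a defaultdict-like
        # map is not populated with empty entries as a side effect.
        if key in debug:
            return debug[key]
    return None


class FakeStruct:
    """Hypothetical stand-in for a parser object generated in debug mode."""

    def __init__(self):
        self._debug = {
            'magic': {'start': 0, 'end': 4},
            '_m_program_headers': {'start': 64, 'end': 120},
        }


header = FakeStruct()
print(debug_entry(header, 'magic'))
print(debug_entry(header, 'program_headers'))
```

Trying both keys also means the helper would keep working if the compiler were later changed to drop the prefix.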
I suppose that this is going to be quite easy to do, just find how to use reflection in Python - probably there is some single function that gives you all property names when you call it with the struct object as an argument.
Not quite - one also has to filter out all the builtin and inherited methods. It would be nice to generate an explicit tuple of all instances and also a method invoking parsing of them all.
You will need to read all instances recursively, though - start with the top-level object and, while you iterate over the properties, check whether the value of each one is a nested KaitaiStruct object (i.e. something like isinstance(struct[property], kaitaistruct.KaitaiStruct)), and if it is, you need to recurse into it and do the same for the properties of this nested object.
And it is possible to get into infinite recursion when there are 2 types having instances referring to each other.
The notable change from seq fields is that the key in the _debug map has the _m_ prefix, which is something you'll need to adapt your code to.
I guess it may be better to fix the compiler.
@KOLANICH:
And it is possible to get into infinite recursion when there are 2 types having instances referring to each other.
Um, you mean like
meta:
  id: test
seq:
  - id: top
    type: foo
types:
  foo:
    instances:
      recursive_ref:
        type: bar
  bar:
    instances:
      recursive_ref:
        type: foo
...? I'd say that this is recursion by design, since it can be a perfectly legitimate thing to do (see the example in https://doc.kaitai.io/user_guide.html#_replacing_parent with a type node that references itself), and the onus is on the KSY author to add some ifs to ensure that it won't run indefinitely when you try to read all instances recursively. I don't know what you are getting at. I can't think of a special measure that would need to be taken on the side of application code - for each KaitaiStruct object, you request the property names, filter out the builtin and inherited symbols as you've correctly pointed out so that you end up only with seq fields and instances, check whether the value of each one is an instance of another KaitaiStruct object, and recurse into it if so. If a recursive descendant finally decides to end the chain, it will end up with if: false on the instance that usually holds the nested struct, so that getting its value will yield None, which is not an instance of KaitaiStruct, so the recursion will also stop here.
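The termination argument can be sketched with a self-referencing type whose instance is guarded by a condition. This is an illustration only: KaitaiStruct is a stand-in base class, Node and chain_depths are hypothetical, and the `depth < 2` check plays the role of an if on the instance in the .ksy file:

```python
class KaitaiStruct:
    """Stand-in for kaitaistruct.KaitaiStruct (assumption, keeps the demo self-contained)."""


class Node(KaitaiStruct):
    def __init__(self, depth):
        self.depth = depth

    @property
    def child(self):
        # Lazy instance: only evaluated on first access, then cached.
        if not hasattr(self, '_m_child'):
            # When the guard is false, the getter yields None instead of
            # another Node, which is what ends the chain.
            self._m_child = Node(self.depth + 1) if self.depth < 2 else None
        return self._m_child


def chain_depths(node):
    """Follow the recursive instance until it stops being a KaitaiStruct."""
    depths = []
    while isinstance(node, KaitaiStruct):
        depths.append(node.depth)
        node = node.child
    return depths


print(chain_depths(Node(0)))
```

If the .ksy has no such guard, the walk never reaches a None, and a depth limit in the application code would be the only way to stop it.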
The notable change from seq fields is that the key in the _debug map has the _m_ prefix, which is something you'll need to adapt your code to.
I guess it may be better to fix the compiler.
I agree - I've noticed this for the first time and it probably isn't intentional, as I can't think of any logical reason for this. The actual _m_-prefixed properties are internal and are not meant to be exposed (they should be private or protected in languages that support access modifiers), so I can't see why the _debug key should be this internal property name. I was just describing the current behavior as I found it, because people tend to be more interested in the present than the possible future.
This will be a potential breaking change for users making use of the _debug map, though...
wow, thanks for all of the info - I'll probably be able to get something working with this. I'll comment again if it's solved or if I get stuck.
and the onus is on the KSY author to add some ifs to ensure that it won't run indefinitely
In fact to generate the positions we need only pos-instances, but we have to use pos-instances as a workaround for the unavailability of typed value-instances ...
So, again, for me it looks like that for proper solution
pos and value instances separately
I think I got this working better. I'm still getting some weird errors when I access those properties that do the lazy parsing, though. One example:
Parsing <elf.Elf.EndianElf object at 0x7fff6d85c070> at 'header'
Parsing <elf.Elf.EndianElf.ProgramHeader object at 0x7fff6d85c160> at 'header._m_program_headers[0]'
Failed when getting property flags_obj: Traceback (most recent call last):
  File "/tmp/HobbitsPythonUffbRt/thescript.py", line 96, in parse_struct
    getattr(struct, attr)
  File "/tmp/hobbits-wolAVX/elf.py", line 1033, in flags_obj
    self._m_flags_obj = Elf.PhdrTypeFlags((self.flags64 | self.flags32), self._io, self, self._root)
AttributeError: 'ProgramHeader' object has no attribute 'flags64'
The elf.py there is the same one referenced previously.
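One way to keep the walk going when a single instance fails like this is to catch the exception per property and record it. The following is a minimal sketch, not the code from thescript.py: safe_get and Broken are hypothetical, and the failure is simulated by a getter that reads an attribute which was never set, which is what the AttributeError above reports for flags64:

```python
import traceback


def safe_get(struct, name, errors):
    """Return getattr(struct, name), or None while recording the traceback."""
    try:
        return getattr(struct, name)
    except Exception:
        # Keep the full traceback text so the failure can be shown to the
        # user next to the property that caused it.
        errors[name] = traceback.format_exc()
        return None


class Broken:
    """Hypothetical object whose lazy instance depends on missing fields."""

    def __init__(self):
        self.flags32 = 5  # flags64 is deliberately never set

    @property
    def flags_obj(self):
        return self.flags64 | self.flags32


errors = {}
value = safe_get(Broken(), 'flags_obj', errors)
print(value)
print(sorted(errors))
```

Collecting the errors in a dict instead of printing them right away lets the caller decide whether a failed instance should be shown, skipped, or treated as fatal.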
I'm using the Kaitai Python runtime in https://github.com/Mahlet-Inc/hobbits to make a Kaitai runner and viewer plugin. A user pointed out that when running kaitai's executable/elf.ksy on something like libc.so, my viewer is missing the programHeaders, sectionHeaders, and strings parts of the header that show up in the kaitai web IDE.
My issue is that I can't seem to find those fields anywhere in the _debug metadata produced by the parser:
The root parsed object:
The header object:
Am I missing something somewhere?