kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

How to determine all fields to descend into? #531

Open lwerdna opened 5 years ago

lwerdna commented 5 years ago

I'm experimenting with a visualizer using parsers compiled to python but I don't understand how to reliably and generically determine which fields were added during the parse (and which fields my visualizer should look into).

Let me use elf.ksy as an example. Starting at the top level object myElf (object of type Elf), I can use keys from the myElf._debug dictionary or the myElf.SEQ_FIELDS list to know to look into 'magic', 'abi', 'endian', and so on.

The problem is that sometimes critical fields are not named in the ._debug or .SEQ_FIELDS attributes. For example myElf.header._debug does not have keys 'program_headers' and 'section_headers' but myElf.program_headers and myElf.section_headers exist and are important to be seen in the viewer.

The web IDE presents these fields, so where am I going wrong?

KOLANICH commented 5 years ago

program_headers and section_headers are instances, not fields. It may be a bug in the compiler itself: we definitely want them to be supported in _debug.

@lwerdna you can try isinstance checks against the relevant types, as I do in kaitai_struct_fs.

@GreyCat, should this bug-tracker be closed and all the issues moved to the unified bug-tracker?

lwerdna commented 5 years ago

@KOLANICH Thanks for your reply! I'm trying your advice now: combining the keys from ._debug, the attribute names from .SEQ_FIELDS, and the attribute names from vars() or dir() filtered through isinstance(curObj, KaitaiStruct). Lists are special-cased: I also descend into them if they contain elements that are instances of KaitaiStruct.
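A minimal sketch of the combined approach described above. To stay runnable without the runtime installed, it uses a stand-in `KaitaiStruct` base class and two hand-written classes that only imitate generated code; `child_names`, `is_struct_like`, and `walk` are hypothetical helper names, not part of any Kaitai API. The `_m_` prefix on instance keys in `_debug` follows the generated code quoted later in this thread.

```python
# Stand-in for kaitaistruct.KaitaiStruct so the sketch runs on its own (assumption).
class KaitaiStruct:
    pass


class SectionHeader(KaitaiStruct):  # imitation of a generated class
    SEQ_FIELDS = ["name_offset"]

    def __init__(self):
        self.name_offset = 1
        self._debug = {"name_offset": {"start": 0, "end": 4}}


class Elf(KaitaiStruct):  # imitation of a generated class
    SEQ_FIELDS = ["magic"]

    def __init__(self):
        self._root = self  # back-reference; must not be descended into
        self.magic = b"\x7fELF"
        # Pretend the instance was already evaluated once.
        self._m_section_headers = [SectionHeader()]
        self._debug = {
            "magic": {"start": 0, "end": 4},
            "_m_section_headers": {"start": 64, "end": 128},
        }

    @property
    def section_headers(self):
        return self._m_section_headers


def is_struct_like(value):
    """True for a struct, or a list containing at least one struct."""
    if isinstance(value, KaitaiStruct):
        return True
    return isinstance(value, list) and any(isinstance(v, KaitaiStruct) for v in value)


def child_names(obj):
    """Merge names from SEQ_FIELDS, _debug keys, and public attributes."""
    names = list(getattr(obj, "SEQ_FIELDS", []))
    for key in getattr(obj, "_debug", {}):
        # Instance keys appear in _debug as '_m_<name>'.
        name = key[3:] if key.startswith("_m_") else key
        if name not in names:
            names.append(name)
    for name, value in vars(obj).items():
        # Skipping underscore names avoids loops through _root / _parent.
        if name.startswith("_") or name in names:
            continue
        if is_struct_like(value):
            names.append(name)
    return names


def walk(obj, depth=0, out=None):
    """Return an indented list of field names, descending into structs and lists."""
    out = [] if out is None else out
    for name in child_names(obj):
        value = getattr(obj, name)
        out.append("  " * depth + name)
        for child in value if isinstance(value, list) else [value]:
            if isinstance(child, KaitaiStruct):
                walk(child, depth + 1, out)
    return out


tree = walk(Elf())
```

Skipping underscore-prefixed attributes is a deliberate choice in this sketch: names such as `_root` and `_parent` point back up the tree and would otherwise cause infinite recursion.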

GreyCat commented 5 years ago

@lwerdna As KOLANICH mentioned, program_headers and section_headers are instances, so they have no particular order to follow and won't appear in SEQ_FIELDS. You can still get them using the regular reflection mechanisms of your language (e.g. dir() in Python).

_debug should still contain all relevant info on the key of _m_section_headers, as per this compilation result:

        @property
        def section_headers(self):
            if hasattr(self, '_m_section_headers'):
                return self._m_section_headers if hasattr(self, '_m_section_headers') else None

            _pos = self._io.pos()
            self._io.seek(self.section_header_offset)
            self._debug['_m_section_headers']['start'] = self._io.pos()
            if self._is_le:
                self._raw__m_section_headers = [None] * (self.qty_section_header)
                self._m_section_headers = [None] * (self.qty_section_header)
                for i in range(self.qty_section_header):
                    if not 'arr' in self._debug['_m_section_headers']:
                        self._debug['_m_section_headers']['arr'] = []
                    self._debug['_m_section_headers']['arr'].append({'start': self._io.pos()})
                    self._raw__m_section_headers[i] = self._io.read_bytes(self.section_header_entry_size)
                    io = KaitaiStream(BytesIO(self._raw__m_section_headers[i]))
                    _t__m_section_headers = self._root.EndianElf.SectionHeader(io, self, self._root, self._is_le)
                    _t__m_section_headers._read()
                    self._m_section_headers[i] = _t__m_section_headers
                    self._debug['_m_section_headers']['arr'][i]['end'] = self._io.pos()

            else:
                self._raw__m_section_headers = [None] * (self.qty_section_header)
                self._m_section_headers = [None] * (self.qty_section_header)
                for i in range(self.qty_section_header):
                    if not 'arr' in self._debug['_m_section_headers']:
                        self._debug['_m_section_headers']['arr'] = []
                    self._debug['_m_section_headers']['arr'].append({'start': self._io.pos()})
                    self._raw__m_section_headers[i] = self._io.read_bytes(self.section_header_entry_size)
                    io = KaitaiStream(BytesIO(self._raw__m_section_headers[i]))
                    _t__m_section_headers = self._root.EndianElf.SectionHeader(io, self, self._root, self._is_le)
                    _t__m_section_headers._read()
                    self._m_section_headers[i] = _t__m_section_headers
                    self._debug['_m_section_headers']['arr'][i]['end'] = self._io.pos()

            self._debug['_m_section_headers']['end'] = self._io.pos()
            self._io.seek(_pos)
            return self._m_section_headers if hasattr(self, '_m_section_headers') else None

If that doesn't work, please take a look at the generated code and tell me how to fix that — I can fix ksc to generate better _debug information.

@KOLANICH I'm not sure about closing per-runtime trackers, but maybe it is the way to go. In any case, transferring issues on GitHub has become so much easier that it's not hard to move them.

lwerdna commented 5 years ago

@GreyCat The problem is that 'section_headers' doesn't appear in .SEQ_FIELDS or ._debug, and neither does '_m_section_headers' until AFTER I first access it (via something like ElfObj.header.section_headers), triggering the @property decorated method.

Maybe a .INSTANCE_FIELDS member for instances (like .SEQ_FIELDS for sequence fields) would be convenient.
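The lazy behaviour described in this comment can be reproduced with a small stand-in class. This is only a sketch: `Header` here is hand-written to imitate the shape of the generated property quoted earlier, and the use of `collections.defaultdict(dict)` for `_debug` is an assumption about how the generated code initialises it.

```python
import collections


class Header:
    """Hand-written imitation of a generated class with one lazy instance."""

    def __init__(self, offsets):
        self._offsets = offsets
        # Assumption: _debug starts empty and entries are created on first write.
        self._debug = collections.defaultdict(dict)

    @property
    def section_headers(self):
        # Cached after the first access, as in the generated code.
        if hasattr(self, "_m_section_headers"):
            return self._m_section_headers
        self._debug["_m_section_headers"]["start"] = self._offsets[0]
        self._m_section_headers = list(self._offsets)
        self._debug["_m_section_headers"]["end"] = self._offsets[-1]
        return self._m_section_headers


header = Header([64, 128, 192])

# A membership test does not create the key on a defaultdict.
before = "_m_section_headers" in header._debug

_ = header.section_headers  # first access triggers the "parse"

after = "_m_section_headers" in header._debug
```

In this sketch `before` is false and `after` is true, which matches the observation that a visualizer walking only `_debug` keys never sees an instance it has not already touched.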

GreyCat commented 5 years ago

This all seems to be heavily related to a previous effort to add something like __repr__ to KS Python runtime: https://github.com/kaitai-io/kaitai_struct_python_runtime/pull/10

I can add generation of INSTANCE_FIELDS or something similar, but then again, I'm not sure it adds anything, as you can already get that by asking for dir(obj) and filtering it afterwards.
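One way to do the dir() filtering mentioned above without triggering any parsing is to inspect the class rather than the object, and keep the names bound to `property` objects. The sketch below is an illustration under assumptions: `Header` is a hand-written stand-in, `instance_names` is a hypothetical helper, and it relies on instances being compiled to `@property` methods as in the generated code quoted earlier.

```python
class Header:
    """Hand-written stand-in for a generated class with two instances."""

    SEQ_FIELDS = ["section_header_offset"]

    def __init__(self):
        self.section_header_offset = 64
        self.evaluated = []  # records which instances were actually computed

    @property
    def section_headers(self):
        self.evaluated.append("section_headers")
        return ["sh0", "sh1"]

    @property
    def program_headers(self):
        self.evaluated.append("program_headers")
        return ["ph0"]


def instance_names(obj):
    """List public property names defined on the object's class.

    Looking the name up on the class returns the property object itself,
    so the getter is never called and nothing is parsed.
    """
    cls = type(obj)
    return [
        name
        for name in dir(cls)
        if not name.startswith("_") and isinstance(getattr(cls, name), property)
    ]


header = Header()
names = instance_names(header)
```

Calling `getattr(header, name)` for each returned name would then evaluate the instance on demand, so a viewer can choose between listing instances cheaply and expanding them only when the user opens a node.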