samunders-core / le_disasm

libopcodes-based (AT&T syntax) linear executable (MZ/LE/LX DOS EXEs) disassembler modified from http://swars.vexillium.org/files/swdisasm-1.0.tar.bz2
GNU General Public License v3.0
19 stars 5 forks source link

Klei1984/add support for 16bit le segments #11

Closed klei1984 closed 5 years ago

samunders-core commented 5 years ago

Given the amount of ternary operators w.r.t bitness I'd prefer having Type::CODE changed to Type::CODE32 and added Type::CODE16. WDYT?

klei1984 commented 5 years ago

Its a good idea, I will rework the change set and will send a new PR.

klei1984 commented 5 years ago

I tried to give some thought to this. The class hierarchy looks something like this: analyzer > image (sections/object list) > regions (list) > region > disasm > insn print > lx, image, analyzer > all the same again in a different way

1) Image segments define the default bitness to be 16 or 32 bits. 2) Regions is just a list so as long as segment boundaries are not allowed to be merged by the class, bitness information is redundant; but this is not the case. Regions allow crossing image segment boundaries, thus bitness is useful to avoid incorrect merging of a 16 bit region with a 32 bit one. 3) Region instances are all UNKNOWN type in the beginning so type CODE16 or CODE32 cannot be applied at that time as we do not known whether region is DATA or CODE??. On the other hand bitness information is crucial for the Disasm class to be available. 4) Disasm needs to know the machine type which is derived from bitness information which is inherited from the Region or maybe from the ImageObject (segment) I do not remember. In this sense Disasm does not need to know the bitness information, but it must know the machine type which needs to be configured via API layer of Disasm class. 5) Insn presents crucial data for the Analyzer and Printer classes. As there is no back reference from Insn object to its parent Region, the Insn objects need to be self-contained thus require bitness information or as an alternative machine type inherited from Disasm class. With Insn data analysis the Region type could be set to CODE16 or CODE32 finally but this is too late, bitness info cannot be avoided. 6) Printer functions receive the analyzer and image objects with the regions and label types. But at the end Printer again uses Disasm and Insn objects to perform further analysis and post processing so the entire chain of data hierarchy needs to be made visible for the Printer again.

The tenary operators can be get rid of in many cases by introducing new getters or setters. But in general CODE16 and CODE32 will not make bitness information disappear. In the Printer functions it would decrease the amount of bitness checks though for sure.

The analyzer's and printer's main data sources are the Insn objects. So if we could make Insn objects aware of their parent region or image object (segment) then we do not need to query the bitness information from outside via APIs as this would be part of the Insn object - a single source of self contained information block. Regions are spit and merged all the time and I am not so sure that Insn objects' life cycle would not overlap with a Region split or merge action, but I could analyze this and if regions are read-only while Insn objects live, we can set the Insn object's parent to be its containing region and most of the external checks the analyzer and the printer performs now could be moved into Insn and expose getter/setter APIs that hide the related complexity from the upper layer classes. We could also create a child parent relationship between ImageObjects and the Regions derived from them. This way the complexity to look up code in Image based on Region could be moved into the Region objects themselves.

Of course all these Parent <> Child relations would basically break up the flat class layer structure as Insn would need to include the Class declaration of either Region or Image. Hopefully this is acceptable.

I will try to move in the above described direction with the code refactoring to reduce bitness clutter.