pfalcon / ScratchABit

Easily retargetable and hackable interactive disassembler with IDAPython-compatible plugin API
GNU General Public License v3.0

More functions to define functions, labels and thumb code #32

Closed thesourcerer8 closed 6 years ago

thesourcerer8 commented 6 years ago

Yes, I was surprised how easy it was to get it running. I think your base architecture for ScratchABit is extremely promising!

I don't mind at all. I am not very experienced in Python yet, so I am happy to get my code improved, to learn some Python that way.

Please rewrite my submissions and improve them any way you like. It's hard to learn all the different programming styles of lots of open-source projects and write code in an optimal way for all of those projects.

pfalcon commented 6 years ago

@thesourcerer8: OK, debugging through this. My aim is to get a savable and disasm-writable result after running your init.py from http://www2.futureware.at/~philipp/ssd/disasm.html . The one I have is:

$ md5sum init.py 
46fef7e7d46d8ce3c2f295611f3a6033  init.py

Unfortunately, it seems to contain errors which affect disassembly. For example:

SetRegEx(0x0000B2A6,"T",0,2)
SetRegEx(0x0000ED96,"T",0,2)

These statements try to mark those addresses as containing ARM-mode instructions. That doesn't make sense, as an ARM instruction should start at an address divisible by 4. You can also see that the addresses +2 are marked as Thumb:

SetRegEx(0x0000ed98,"T",1,2)
SetRegEx(0x0000b2a8,"T",1,2)

Can you please look into these issues? For now, I'm commenting them out.
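A minimal Python sketch of the alignment check described above, for scanning a generated script before loading it. The regular expression and the helper name `find_misaligned_arm_marks` are assumptions made for illustration; they are not part of ScratchABit or IDAPython, and the pattern only assumes the `SetRegEx(addr,"T",flag,2)` layout seen in the lines quoted here.

```python
import re

# Assumed layout of the calls in the generated script, e.g. SetRegEx(0x0000B2A6,"T",0,2)
SETREG_RE = re.compile(
    r'SetRegEx\(\s*(0x[0-9A-Fa-f]+)\s*,\s*"T"\s*,\s*([01])\s*,\s*\d+\s*\)'
)

def find_misaligned_arm_marks(lines):
    """Return addresses marked as ARM (T=0) that are not divisible by 4."""
    bad = []
    for line in lines:
        m = SETREG_RE.search(line)
        if m is None:
            continue  # not a SetRegEx "T" call
        addr = int(m.group(1), 16)
        t_flag = int(m.group(2))
        if t_flag == 0 and addr % 4 != 0:
            bad.append(addr)
    return bad

sample = [
    'SetRegEx(0x0000B2A6,"T",0,2)',
    'SetRegEx(0x0000ED96,"T",0,2)',
    'SetRegEx(0x0000ed98,"T",1,2)',
    'SetRegEx(0x0000b2a8,"T",1,2)',
]
print([hex(a) for a in find_misaligned_arm_marks(sample)])  # ['0xb2a6', '0xed96']
```

Run over the whole file (for example `find_misaligned_arm_marks(open("init.py"))`), this would list every ARM mark that cannot be a valid ARM instruction start.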

pfalcon commented 6 years ago

More offending lines:

SetRegEx(0x0000BD44,"T",0,2)
SetRegEx(0x0000C5F2,"T",0,2)

Some lines are repeated more than once. Generally, there's a bunch of duplicate lines.
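To get a quick picture of how many lines are duplicated, something like the following sketch could be used. The helper name `duplicate_lines` is hypothetical, and treating lines as duplicates after stripping surrounding whitespace is an assumption.

```python
from collections import Counter

def duplicate_lines(lines):
    """Map each non-empty line that occurs more than once to its count."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return {text: n for text, n in counts.items() if n > 1}

sample = [
    'SetRegEx(0x0000BD44,"T",0,2)',
    'SetRegEx(0x0000C5F2,"T",0,2)',
    'SetRegEx(0x0000BD44,"T",0,2)',
]
print(duplicate_lines(sample))  # {'SetRegEx(0x0000BD44,"T",0,2)': 2}
```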

pfalcon commented 6 years ago

Well, there are a gazillion of these. I went for a manual search-and-comment sweep; I'm now at line 1676 of 9415 and already have a 10K diff. I guess you'll need to look at this; patching on my side doesn't make sense.

pfalcon commented 6 years ago

I also don't know how old the comments in that file are, or whether you figured this out afterwards, but:

+0000007c     dsb  ; Call Hypervisor Service if status-MI (Minus or negative result)SVCMI 0xf07ff5
+00000084     isb  ; SVCVS 0xf07ff5 (if overflow)

As can be seen, those instructions are just data/instruction barriers. I'm not sure whether you originally used a disassembler that doesn't understand the ARMv7 ISA.

pfalcon commented 6 years ago

+000001e0     msr      cpsr_c, r0  ; manually Switching from Supervisor Mode to IRQ mode
+000001e4     ldr      sp, [pc, #0x328]  ; This seems a bit like testing the functionality of the MPU?

Well, this initializes the SP of IRQ mode. Anyway, this is the wrong place for these remarks; I'm actually just verifying that MakeComm() worked as it should ;-).

pfalcon commented 6 years ago

OK, I pushed a cleaned-up version of this patch as https://github.com/pfalcon/ScratchABit/commit/d7373d75809f289854be5ea25cf9214202506d8b . At least nothing there should be incorrect. More operations may be needed; complete testing would require resolving the Thumb-as-ARM mismarking described above.

Thanks for the patch!

thesourcerer8 commented 6 years ago

Thanks for integrating it!

thesourcerer8 commented 6 years ago

Thanks, that was a bug in my generator, I assumed that BL jump to 0x....6 would also get the LSB set, but it turned out that jumping to 0x....6 is sufficient to switch to Thumb mode. Yes, I was using OpenOCD's disassembler, which didn't know about several instructions. Yes, I later learned that this initializes the SP of IRQ mode, but I hadn't updated the comments. Thanks, I have improved the comments now.

pfalcon commented 6 years ago

Thanks, that was a bug in my generator, I assumed that BL jump to 0x....6 would also get the LSB set, but it turned out that jumping to 0x....6 is sufficient to switch to Thumb mode.

Yeah, B and BL instructions (which are always direct-address) can't switch instruction mode. If they are executed in ARM mode, they jump to ARM code, and if executed in Thumb mode, they jump to Thumb code. They don't even encode the LSB in their bits; it's superfluous.

Instruction mode can be switched (or not) with BX, which exists only in a register-address form (and uses the register's LSB to decide whether to switch).

Finally, there's also the BLX instruction, which exists in two variants: direct-addressing and register-addressing. The direct version unconditionally switches instruction mode to the opposite: if called from ARM, it goes into Thumb, and vice versa. Likewise, it simply doesn't store the LSB in the instruction. The register-addressing version uses the LSB of a register, and may either switch mode or not.
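The rules above can be summed up in a small Python sketch. It is a simplified model for illustration only: the function name `mode_after_branch` and the instruction labels `"BLX_imm"` / `"BLX_reg"` are made up here, and other ways of changing mode (such as loading into PC) are deliberately ignored.

```python
def mode_after_branch(insn, current_mode, reg_value=None):
    """Return the instruction mode ('ARM' or 'Thumb') after a branch.

    insn is one of 'B', 'BL', 'BX', 'BLX_imm', 'BLX_reg' (hypothetical labels).
    reg_value is the target register contents for the register forms.
    """
    if insn in ("B", "BL"):
        # Direct branches never change the instruction mode.
        return current_mode
    if insn == "BLX_imm":
        # The direct BLX form always switches to the opposite mode.
        return "Thumb" if current_mode == "ARM" else "ARM"
    if insn in ("BX", "BLX_reg"):
        if reg_value is None:
            raise ValueError("register forms need the register value")
        # The LSB of the register selects the mode: 1 = Thumb, 0 = ARM.
        return "Thumb" if reg_value & 1 else "ARM"
    raise ValueError("unknown instruction: %r" % (insn,))

print(mode_after_branch("BL", "ARM"))                # ARM
print(mode_after_branch("BLX_imm", "ARM"))           # Thumb
print(mode_after_branch("BX", "ARM", 0x0000B2A7))    # Thumb
print(mode_after_branch("BLX_reg", "Thumb", 0x1000)) # ARM
```

For a script generator, this means a BL target inherits the mode of the caller, so the mode mark for the target should follow the calling code rather than the target address.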

pfalcon commented 6 years ago

Thanks, that was a bug in my generator

I downloaded the new version (size 1244310), but

SetRegEx(0x0000BD44,"T",0,2)

which was reported above is still there. Fixing it, there's another, etc.

pfalcon commented 6 years ago

So, what I'm doing about this is filtering out all SetRegEx(*,"T",0,2) calls from your file. ARM mode is the default, so setting it explicitly is optional even where it's correct, and the "false positives" get filtered out along the way:

grep -v ',"T",0,2)' EXT0CB6Q_dec_P21_frmw_init.py >EXT0CB6Q_dec_P21_frmw_init.py.1

With that, writing the disassembly doesn't crash. There's still an issue that engine.analyze() isn't called to process the pending data which the script queued; I'm working on the best way to resolve that.
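For anyone without grep at hand, a rough Python equivalent of the filter above might look like this. The helper name `drop_arm_mode_marks` is hypothetical; like the grep command, it matches on the literal substring `,"T",0,2)` and so assumes the calls are written without extra spaces.

```python
ARM_MARK = ',"T",0,2)'  # same literal substring the grep command looks for

def drop_arm_mode_marks(lines):
    """Keep every line except those that set T=0 (ARM mode) via SetRegEx."""
    return [line for line in lines if ARM_MARK not in line]

sample = [
    'SetRegEx(0x0000BD44,"T",0,2)',
    'SetRegEx(0x0000ed98,"T",1,2)',
    'MakeComm(0x000001e0, "init IRQ mode SP")',
]
for line in drop_arm_mode_marks(sample):
    print(line)
```

The Thumb marks and all other calls pass through unchanged; only the explicit ARM-mode marks are dropped.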

thesourcerer8 commented 6 years ago

Thanks for the explanation. I am not sure whether I have fully understood it and implemented it correctly now. Please try again and let me know.

thesourcerer8 commented 6 years ago

Can you provide a link to the documentation that states that it switches instruction mode to the opposite? I tried to find that in the ARM documentation, but I can't.

pfalcon commented 6 years ago

Yeah, it's confusing. I had to dig it up when I implemented the combined ARM/Thumb plugin, had forgotten it by now, and had to refresh my memory when writing a reply to you.

You'd be looking for the ARM document "ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition", ARM DDI 0406C.c (ID051414). This one may be behind a registration wall. Any older doc would work too.

p.348

A8.8.25 BL, BLX (immediate)

Branch with Link calls a subroutine at a PC-relative address. Branch with Link and Exchange Instruction Sets (immediate) calls a subroutine at a PC-relative address, and changes instruction set from ARM to Thumb, or from Thumb to ARM.

And formal semantics below there.