pfalcon / ScratchABit

Easily retargetable and hackable interactive disassembler with IDAPython-compatible plugin API
GNU General Public License v3.0
393 stars 47 forks source link

Handling invalid (cross-)references #20

Closed neuschaefer closed 7 years ago

neuschaefer commented 7 years ago

Hi, I'm writing a CPU plugin which calls ua_add_dref(0, xxx, dr_O) based on a value that's loaded into a register. Sometimes these values don't point into a valid address range, leading to a an error when I try to save the project:

2017-05-26 21:45:56,666 Exception processing user command
Traceback (most recent call last):
  File "/.../ScratchABit.py", line 123, in handle_input
    return super().handle_input(key)
  File "/.../ScratchABit/picotui/basewidget.py", line 69, in handle_input
    res = self.handle_key(inp)
  File "/.../ScratchABit/picotui/editor.py", line 208, in handle_key
    return self.handle_edit_key(key)
  File "/.../ScratchABit.py", line 543, in handle_edit_key
    saveload.save_state(project_dir)
  File "/.../ScratchABit/scratchabit/saveload.py", line 32, in save_state
    engine.ADDRESS_SPACE.save_addr_props(project_dir + "/project.aprops")
  File "/.../ScratchABit/scratchabit/engine.py", line 680, in save_addr_props
    fl = self.get_flags(addr)
  File "/.../ScratchABit/scratchabit/engine.py", line 247, in get_flags
    raise InvalidAddrException(addr)
scratchabit.engine.InvalidAddrException: (57259, '0xdfab')

Should I fix this in my plugin or should SAB handle this case and avoid adding the reference? What does IDA do?

[ An alternative or complement to my approach would be to create a data item with the loaded value, and let the engine add the reference automatically. AFAICS this feateure hasn't been implemented in SAB so far, though. ]

pfalcon commented 7 years ago

There's not enough context, but I'd say logging a warning to the log and skipping adding an xref is definitely better than throwing exception on save.

[ An alternative or complement to my approach would be to create a data item with the loaded value, and let the engine add the reference automatically. AFAICS this feateure hasn't been implemented in SAB so far, though. ]

Again, not sure I understand the suggestion exactly. For "offset" type xref, data size is not known, so you can only heuristicalize something, and it's a question whether it will be helpful. SAB already creates a one-byte area to host a label if the label doesn't belong to any other area. Beauty of the result vary, see e.g. beginning of https://github.com/pfalcon/xtensa-subjects/blob/master/2.0.0-p20160809/out.lst .

While the crashing issue definitely needs to be fixed, a practically useful solution is to define any useful memory areas in your .def file. See e.g. https://github.com/pfalcon/xtensa-subjects/blob/master/2.0.0-p20160809/esp8266-sdk-2.0.0-p20160809.def#L6

neuschaefer commented 7 years ago

To clarify, I have ARM code like this:

08001000    ldr     a0, [pc, #8]
...
08001010    dd      0x08001234

I now do the following:

08001000    ldr     r0, [pc, #8] ; =0x08001234

What I was asking about is adding an offset reference from the instruction at 0x08001000 to whatever may be at 0x08001234¹. This could help the user find his way back from 0x08001234 to 0x08001000. In this (made up) example, It works, because 0x08001234 is probably in the same segment, but if the code would load some other constant, which is not in valid memory, I would create an invalid reference.


¹) Ideally I would replace the whole memory reference [pc, #8] with =0x08001234 and let the user press o as necessary, but I'm wrapping an external disassembler library with this plugin, so it's more work than printing the operand string unmodified, and I haven't yet figured out how operands work in idapython.

pfalcon commented 7 years ago

What I was asking about is adding an offset reference from the instruction at 0x08001000 to whatever may be at 0x08001234¹

You can't do that, as you (a CPU plugin writer) can't know whether 0x08001234 is a literal number or an address. That's one of the basic problems of program analysis: telling numbers from addresses, and the problem is undecidable. So, don't try to decide on it, you will fail, and will fail in users' face. Let them (users) handle that.

Ideally I would replace the whole memory reference [pc, #8] with =0x08001234

Xtensa CPU plugin https://github.com/pfalcon/ida-xtensa2 does that and more, I suggest studying it as a reference. (As a disclaimer, I have no idea whether it still works with IDA, actually, likely no.)

neuschaefer commented 7 years ago

Ok, thanks for your advice.

pfalcon commented 7 years ago

@neuschaefer : FYI, I made few commits, 0547a07e053e87ffb304ccbe37860b887c6fd838 and 235a26403dfeb87fcb6ec30f3c4f9936fecfb3ed. These were made to account for jumps (not cross-references) to "unknown" addresses. Unlike immediate data which can be either an address or numeric value, jump address is, well, an address.

But funny thing is that while such an issue (jump to address outside of defined address space) may happen, I never faced it so far. And those patches are actually applied to mask out issues with a CPU plugin, https://github.com/pfalcon/ScratchABit/issues/23. I.e. the plugin incorrectly decodes instructions, produces bogus jump addresses, then ScratchABit crashes on them.

With "fail fast" approach (https://en.wikipedia.org/wiki/Fail-fast), crashing on wrong stuff is actually good thing - this way, one can't miss a fact that something is wrong, and is motivated to investigate and fix it. I applied fixes above only because of the idea that jump addresses are addresses, because you raised this issue, and to let people try arm-thumb plugin without crashing.

Regarding xrefs, the previous idea still holds: unless a plugin is 100% sure that some value is address, it should not add xref for it.