joxeankoret / diaphora

Diaphora, the most advanced Free and Open Source program diffing tool.
http://diaphora.re
GNU Affero General Public License v3.0
3.58k stars, 371 forks

Diaphora doesn't port structs and struct offsets in the assembly instructions #222

Open Revester opened 3 years ago

Revester commented 3 years ago

I made a simple solution to test: [screenshot]

I open it in the first IDA instance with debugging symbols loaded: [screenshot]

As you can see there's _varEC.Number defined in the highlighted instruction. And the respective HelloWorldPrinter struct is defined in the Structures window.

Now I run Diaphora with these settings: [screenshot]

Now I open the same executable in a second IDA instance without debugging symbols and run Diaphora with these settings (note that I diff it against the Diaphora SQLite database exported from the executable with debugging symbols loaded, shown in the previous screenshots): [screenshot]

I then import all functions: [screenshot]

IDA specifically asks if I want to import all functions, comments, prototypes and definitions: [screenshot]

Does "definitions" here mean struct definitions, or something else?

When I open the main function in the second instance, where I just imported the information, the comment on the assembly instruction with Number is there ("; my number comment"): [screenshot]

The definition _varEC.Number, however, is no longer there, and the struct HelloWorldPrinter isn't defined in the Structures window (it is defined in the Structures window in the version with debugging symbols).

I use IDA 7.2 and Diaphora 1.2.4, from the release linked in the documentation for IDA 7.2. I also tried the latest release of Diaphora with IDA 7.5 and got exactly the same behavior.

Is this expected behavior? The site says that Diaphora can port struct definitions, and I'd expect it to also port the struct offsets applied in assembly instructions, just as it ports comments and function names. However, it ports neither the structures nor the offsets.

If it's not expected behavior, please help me and tell me how I should try to fix it. Something in the settings?

If it's expected behavior and Diaphora isn't supposed to port offsets inside instructions, maybe you guys know another tool which can port offsets inside instructions? It's really important to me because it's like 30% of the information of reversing for me.

joxeankoret commented 3 years ago

Hi! It is simply not implemented yet. I'm working on it and I will likely publish support for it in around 2 weeks, but I cannot promise anything. In case you want to know, the reason it isn't implemented yet is that I try to use the decompiler for most tasks instead of the disassembler, and it turns out that porting variables and types with the current decompiler APIs is not possible. Matching stack variables when the stack didn't change is easy, as in your example, but doing so when the stack changes causes lots of problems. Sure, I can easily finish implementing a simplistic version for the cases where the reserved stack size for local variables and the used stack offsets don't change, but it won't work in many other real-world cases, which are the ones I'm working on. What I will probably do is add support for the cases in which Diaphora can be 100% sure it's the same variable, renaming it and applying types (if any), and add support for other cases where heuristics can be applied in the future.
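For illustration only, the "easy" case described above (same reserved stack size, same offsets) can be sketched in plain Python. This is not Diaphora's code and it does not use the IDA API: `StackVar`, `Frame` and `port_stack_vars` are hypothetical names, and requiring equal variable sizes is an extra conservative assumption, so a struct-typed variable that shows up as several scalars in the target would not be matched.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StackVar:
    """Hypothetical description of one stack variable."""
    offset: int                      # offset inside the function's stack frame
    size: int                        # size in bytes
    name: str
    type_name: Optional[str] = None  # applied type, if any


@dataclass
class Frame:
    """Hypothetical description of one function's stack frame."""
    locals_size: int                 # reserved stack size for local variables
    variables: List[StackVar]


def port_stack_vars(src: Frame, dst: Frame) -> Dict[int, StackVar]:
    """Return {offset: source variable} for variables that are safe to port.

    Nothing is ported when the reserved stack size differs, because the
    offsets can no longer be assumed to refer to the same variables.
    """
    if src.locals_size != dst.locals_size:
        return {}

    src_by_offset = {var.offset: var for var in src.variables}
    ported: Dict[int, StackVar] = {}
    for var in dst.variables:
        candidate = src_by_offset.get(var.offset)
        # Conservative assumption: only match when offset and size agree.
        if candidate is not None and candidate.size == var.size:
            ported[var.offset] = candidate
    return ported
```

Returning an empty result when the frame size differs mirrors the "100% sure" requirement: any heuristic for changed stacks would have to be layered on top of this.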

As for other tools... the only other binary diffing public tool out there that works and is maintained is BinDiff, but IIRC, they do not import local variables either.

Revester commented 3 years ago

Hi, thanks, that's nice to hear.

Revester commented 2 years ago

Hello again, I see you haven't released the version that would port symbol information. I was wondering if there was any progress on this feature?

If not, I could try to implement it myself. I checked out the repo to see how much code it has: 76k lines with pygments, and still 17k lines without it. Please tell me which Diaphora modules I should read and understand in order to implement this feature, that is, exporting and importing all the information about a symbol: the manual name for a symbol (Alt+F1 shortcut), argument names, local variable names, comments, the type of the variable when set, and other properties that I don't remember at the moment.
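To make the request concrete, here is a rough sketch of the kind of per-symbol record I have in mind, stored in SQLite since Diaphora exports to an SQLite database. Everything here is an assumption for illustration: the `symbol_info` table, the `SymbolInfo` class and both functions are hypothetical and are not part of Diaphora's actual schema or code.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import List, Optional


@dataclass
class SymbolInfo:
    """Hypothetical record for one symbol inside a function."""
    func_name: str
    kind: str                        # "argument", "local" or "operand"
    location: int                    # stack offset, or instruction address for operands
    name: Optional[str] = None       # manual name, if one was set
    type_name: Optional[str] = None  # applied type, if any
    comment: Optional[str] = None


# Hypothetical table; not Diaphora's real schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS symbol_info (
    func_name TEXT NOT NULL,
    kind      TEXT NOT NULL,
    location  INTEGER NOT NULL,
    name      TEXT,
    type_name TEXT,
    comment   TEXT,
    PRIMARY KEY (func_name, kind, location)
)
"""


def export_symbols(conn: sqlite3.Connection, symbols: List[SymbolInfo]) -> None:
    """Write the records to the database, replacing existing rows."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO symbol_info VALUES (?, ?, ?, ?, ?, ?)",
        [astuple(sym) for sym in symbols],
    )
    conn.commit()


def import_symbols(conn: sqlite3.Connection, func_name: str) -> List[SymbolInfo]:
    """Read back all records stored for one function."""
    rows = conn.execute(
        "SELECT func_name, kind, location, name, type_name, comment "
        "FROM symbol_info WHERE func_name = ? ORDER BY kind, location",
        (func_name,),
    )
    return [SymbolInfo(*row) for row in rows]
```

The export side would be filled from the database with symbols, and the import side would apply each record to the matched function in the database without symbols; how to read and apply them through IDA is the part I would need pointers on.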

joxeankoret commented 11 months ago

This is going to be implemented in Diaphora 3.1 (to be released before the end of this year).