joxeankoret / diaphora

Diaphora, the most advanced Free and Open Source program diffing tool.
http://diaphora.re
GNU Affero General Public License v3.0

Diaphora 3.2.1 is much slower than diaphora 2.1.0 when exporting large binaries #305

Open ddf8196 opened 1 month ago

ddf8196 commented 1 month ago

I'm trying to export a large binary with about 180,000 functions using the latest Diaphora, and the export takes much longer than it did with Diaphora 2.1.0. The export also seems to slow down as more functions are exported, so I ran the following test:

  1. Set DIAPHORA_PROFILE=1 to enable profiling.

  2. Set the export range to 0x140001000-0x140800000 to export only the first 30,000 functions. This took 32 minutes.
    Log: 2-0x140001000-0x140800000.log

  3. Set the export range to 0x140800000-0x141000000 to export the next 25,000 or so functions. This took 39 minutes.
    Log: 3-0x140800000-0x141000000.log

  4. Set the export range to 0x140001000-0x141000000 to export the first 55,000 functions. This took over two hours, almost twice as slow as exporting the two parts separately.
    Log: 4-0x140001000-0x141000000.log

  5. For comparison, exporting the first 55,000 functions with Diaphora 2.1.0 took only 15 minutes, faster than any of the runs above.
    Log: 5-0x140001000-0x141000000-diaphora2.1.0.log

The above tests basically confirm that the export speed of Diaphora 3.2.1 degrades as the number of exported functions increases, and that it is significantly slower than Diaphora 2.1.0.
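To make the comparison concrete, the timings reported above can be turned into rough throughput figures. This is only a back-of-the-envelope sketch: the function counts are approximate, and the 55,000-function run with 3.2.1 took "over two hours", so 120 minutes is used as a lower bound on its duration (an assumption, which makes its computed rate an upper bound).

```python
# Rough throughput from the timings reported in this issue.
# (functions exported, minutes taken); all counts are approximate.
runs = {
    "3.2.1, first 30,000": (30_000, 32),
    "3.2.1, next 25,000": (25_000, 39),
    # "over two hours": 120 minutes is a lower bound, not a measured value.
    "3.2.1, first 55,000": (55_000, 120),
    "2.1.0, first 55,000": (55_000, 15),
}

# Functions exported per minute for each run.
rates = {name: funcs / minutes for name, (funcs, minutes) in runs.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:.0f} functions/minute")

# Total time when the same range is exported as two separate parts.
split_minutes = 32 + 39
print(f"Two separate exports: {split_minutes} minutes; one combined export: more than 120 minutes")
```

If export time grew linearly with the number of functions, the combined run would be expected to take roughly as long as the two partial runs together, which is not what was observed.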

ddf8196 commented 1 month ago

Looking at the profiler logs, the main difference between Diaphora 3.2.1 and Diaphora 2.1.0 is that sqlite3.Cursor.execute takes a lot of time in 3.2.1, which may be the cause of this performance issue. Judging by the cumtime column, the culprit seems to be the cur.execute call in the get_bb_id function: it takes far more time than in 2.1.0, and the time per call grows as the number of exported functions increases.

diaphora 3.2.1 (first 30,000 functions): [image]

diaphora 3.2.1 (25,000 functions): [image]

diaphora 3.2.1 (first 55,000 functions): [image]

diaphora 2.1.0 (first 55,000 functions): [image]
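For readers who want to reproduce this kind of analysis, here is a minimal sketch of profiling SQLite lookups with Python's standard cProfile and pstats modules and sorting by cumulative time. The table `t` and the helper `lookup_many` are hypothetical stand-ins, not Diaphora code, and how Diaphora itself writes its profile log when DIAPHORA_PROFILE=1 is set is not shown here.

```python
import cProfile
import io
import pstats
import sqlite3


def lookup_many(conn, n):
    """Hypothetical stand-in for a function that runs one query per item."""
    cur = conn.cursor()
    for i in range(n):
        cur.execute("SELECT id FROM t WHERE address = ?", (str(i),))
        cur.fetchone()


# Toy in-memory table; the schema is an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, address TEXT)")
conn.executemany("INSERT INTO t (address) VALUES (?)",
                 ((str(i),) for i in range(2000)))

# Profile only the lookups.
profiler = cProfile.Profile()
profiler.enable()
lookup_many(conn, 200)
profiler.disable()

# Sort by cumulative time (the "cumtime" column) and keep the top entries.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```

In the printed report, the call to sqlite3's `execute` method appears as its own row, so its cumtime can be compared across runs of different sizes.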

This function runs only a single SQL query, which reads from the basic_blocks table. [image]

Comparing the basic_blocks table in Diaphora 3.2.1 with the one in Diaphora 2.1.0, there is a new asm_type column, and the address column no longer has a unique constraint. Could this be the cause of the problem? If so, is there a way to optimize it? With such a long export time (more than 24 hours), the latest Diaphora is almost unusable for me.
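The hypothesis can be illustrated with a small SQLite experiment. In SQLite, a UNIQUE constraint creates an implicit index on the column, so dropping the constraint without adding an explicit index turns a lookup by that column into a full table scan. The sketch below is not Diaphora's real schema or query (those are only visible in the screenshots): the tables `bb_unique` and `bb_plain`, the index name, and the assumption that the lookup filters on `address` are all hypothetical.

```python
import sqlite3


def query_plan(conn, table):
    """Return SQLite's plan description for a lookup by address on `table`."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT id FROM {table} WHERE address = ?",
        ("0x1000",),
    ).fetchall()
    # The fourth column of each row holds the human-readable plan detail.
    return " ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")

# Hypothetical table with a unique constraint on address (implicit index).
conn.execute(
    "CREATE TABLE bb_unique (id INTEGER PRIMARY KEY, address TEXT UNIQUE)")
# Hypothetical table without the constraint and with an extra asm_type column.
conn.execute(
    "CREATE TABLE bb_plain (id INTEGER PRIMARY KEY, address TEXT, asm_type TEXT)")

plan_unique = query_plan(conn, "bb_unique")
plan_plain = query_plan(conn, "bb_plain")

# Adding an explicit, non-unique index restores an indexed lookup.
conn.execute("CREATE INDEX idx_bb_plain_address ON bb_plain (address)")
plan_indexed = query_plan(conn, "bb_plain")

print("with UNIQUE:    ", plan_unique)
print("without UNIQUE: ", plan_plain)
print("explicit index: ", plan_indexed)
```

If the real query behaves like this sketch, each lookup would scan a table that keeps growing during the export, which would match the observed increase in time per call. Whether this is the actual cause in Diaphora 3.2.1 is not confirmed here.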

joxeankoret commented 1 month ago

Uhm... that sounds weird. Let me take a look at it. If you can share your binaries it would be cool, but I guess I can work on this issue without them anyway. Thanks for letting me know!

ddf8196 commented 1 month ago

The binary is the Bedrock Dedicated Server for Windows and can be downloaded from this link: https://minecraft.azureedge.net/bin-win/bedrock-server-1.20.81.01.zip

By the way, I'm using IDA 8.3 with Python 3.11.6 on Windows.