joxeankoret / diaphora

Diaphora, the most advanced Free and Open Source program diffing tool.
http://diaphora.re
GNU Affero General Public License v3.0

Slow export on fairly large targets #270

turbocool3r closed this issue 3 months ago

turbocool3r commented 9 months ago

I'm seeing very long export times when diffing fairly large targets like JavaScriptCore and WebKit. The first half of the functions is exported in reasonable time, but the last functions take multiple hours.

This is what I have been seeing for about 8 hours since I got up; the only difference is that it now shows 16 seconds left instead of 17. I ran the Xcode profiler on IDA, and most of the time seems to be spent in PyObject_Str and some of its callees.

Screenshot 2023-09-27 at 17 46 44

It seems like almost no processing time is spent in IDA itself.

turbocool3r commented 9 months ago

I'm using IDA v8.3.230608 on macOS 13.5.1 (22G90) if that helps. Python version should be 3.10.4.

joxeankoret commented 9 months ago

There is little I can do here, as the IDAPython APIs cannot be called from multiple threads at once (so I'm forced to do the export sequentially) and, for example, the decompiler takes a huge amount of time with big databases. If you have the decompiler enabled, I suggest disabling it for huge targets and reading this entry in the wiki:

https://github.com/joxeankoret/diaphora/wiki/Diaphora-takes-too-long-exporting%21

I have often worked around such problems by creating a Diaphora Python script that filters the rows to be exported based on specific criteria. There is an example of such a script here:

https://github.com/joxeankoret/diaphora/blob/master/hooks/hooks_example1.py
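
For illustration, a minimal sketch of what such a filtering script might look like. The class layout, the `before_export_function` callback returning a boolean, and the `HOOKS` dictionary are modeled on the linked `hooks_example1.py` as best recalled and should be treated as assumptions; check that file for the actual interface. The filtering criterion (skipping functions with IDA auto-generated name prefixes) is only an example.

```python
# Hypothetical Diaphora hooks script: skip functions that are not of interest.
# The callback names and the HOOKS registration are assumptions modeled on
# hooks_example1.py; verify them against the linked file before relying on this.

# Example criterion: skip functions that still carry IDA auto-generated names.
SKIPPED_PREFIXES = ("sub_", "nullsub_", "j_")


class CFilterHooks:
    def __init__(self, diaphora_obj):
        # Diaphora is assumed to pass its main object to the hooks class.
        self.diaphora = diaphora_obj

    def before_export_function(self, ea, func_name):
        # Returning False is assumed to tell Diaphora to skip this function.
        return not func_name.startswith(SKIPPED_PREFIXES)

    def after_export_function(self, d):
        # Return the exported properties unchanged.
        return d


# Assumed registration convention for Diaphora hook scripts.
HOOKS = {"DiaphoraHooks": CFilterHooks}
```

With a filter like this, fewer rows reach the database, which shortens the export at the cost of not being able to match the skipped functions later.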

turbocool3r commented 9 months ago

Thanks, I appreciate the help. However, while IDAPython isn't fast, I think this looks like a performance issue. The first few thousand functions are exported quite fast, and later, processing the same number of functions becomes gradually slower. If IDAPython were simply slow, I would expect the processing time to grow linearly with the number of functions. Right now the last ~20 functions take more than 24 hours to export. Could there be places in the code where all the results are accumulated?

joxeankoret commented 9 months ago

Uhm... that sounds bad. Let me debug this because I suspect there is some bug in the export process.

So, thanks for reporting! Hopefully I will have a fix this week.

turbocool3r commented 9 months ago

Thanks for the quick response!

Lmk if I can assist somehow or provide the exact binaries I was testing on.

joxeankoret commented 9 months ago

I have a suspicion about what might be happening here, but it will take quite some time to double-check on my side (I have to run all the slow test cases, which is going to take at least a day). Could you please try the following patch on your side?

diff --git a/diaphora_ida.py b/diaphora_ida.py
index f7229e4..ae41012 100644
--- a/diaphora_ida.py
+++ b/diaphora_ida.py
@@ -1045,6 +1045,12 @@ class CIDABinDiff(diaphora.CBinDiff):
     self.project_script = None
     self.hooks = None

+  def clear_pseudo_fields(self):
+    self.pseudo = {}
+    self.pseudo_hash = {}
+    self.pseudo_comments = {}
+    self.microcode = {}
+
   def refresh(self):
     idaapi.request_refresh(0xFFFFFFFF)

@@ -1180,6 +1186,7 @@ class CIDABinDiff(diaphora.CBinDiff):

       self.microcode_ins_list = self.get_microcode_instructions()
       props = self.read_function(func)
+      self.clear_pseudo_fields()
       if props is False:
         continue

joxeankoret commented 9 months ago

I've committed it: https://github.com/joxeankoret/diaphora/commit/3d72b493e98db5222bb752da363b8268ea475781. With a bit of luck, it fixes your issue. If it does not, please update the issue and I will continue researching it.
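
The idea behind the patch, resetting per-function scratch dictionaries after every function so they cannot grow for the whole export, can be sketched in isolation. This is a toy stand-in, not Diaphora's code: the `Exporter` class and its `read_function` body are hypothetical, and only the `clear_pseudo_fields` call placement mirrors the diff.

```python
class Exporter:
    """Toy stand-in for an exporter that keeps per-function scratch data."""

    def __init__(self):
        self.pseudo = {}
        self.microcode = {}
        self.exported = 0

    def clear_pseudo_fields(self):
        # Drop per-function scratch data so it cannot pile up across functions.
        self.pseudo = {}
        self.microcode = {}

    def read_function(self, ea):
        # Hypothetical: pretend each function caches some decompiler output.
        self.pseudo[ea] = ["line"] * 10
        self.microcode[ea] = ["insn"] * 10
        return {"ea": ea, "lines": len(self.pseudo[ea])}

    def export(self, addresses):
        for ea in addresses:
            props = self.read_function(ea)
            # Same placement as in the patch: clear right after reading.
            self.clear_pseudo_fields()
            if props is False:
                continue
            self.exported += 1


exporter = Exporter()
exporter.export(range(1000))
print(exporter.exported, len(exporter.pseudo), len(exporter.microcode))
```

Without the `clear_pseudo_fields` call, both dictionaries would hold one entry per exported function by the end of the loop.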

turbocool3r commented 9 months ago

Sorry for the late response, I put too much trust in GitHub notifications. Testing it now. It'll take some time, but I'm taking a flight anyway, so in 3-4 hours I'll let you know whether it's OK.

turbocool3r commented 9 months ago

After some tests, it looks like SQLite is now the main thing slowing Diaphora down, but it's better. Thanks for the fix.

joxeankoret commented 9 months ago

You're welcome! I will leave this issue open for now while I investigate if there is anything else I can do to make it a bit faster.

turbocool3r commented 9 months ago

That would be really appreciated. I've continued watching and about 80% of CPU time is now spent in sqlite3VdbeExec. This seems to grow with the size of the database.
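
One common reason for per-statement SQLite cost growing with database size is a lookup on a column that has no index, which forces a full table scan on every query. Whether that is what happens here is not established in this thread; the sketch below only shows how to check a query with `EXPLAIN QUERY PLAN`. The table, column, and index names are hypothetical and not Diaphora's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical schema; not Diaphora's actual table layout.
cur.execute("CREATE TABLE basic_blocks (id INTEGER PRIMARY KEY, address TEXT)")
cur.executemany(
    "INSERT INTO basic_blocks (address) VALUES (?)",
    [(str(0x1000 + i),) for i in range(10000)],
)

query = "SELECT id FROM basic_blocks WHERE address = ?"


def query_plan(cursor, sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a readable detail string.
    rows = cursor.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_without_index = query_plan(cur, query, ("4096",))
cur.execute("CREATE INDEX idx_bb_address ON basic_blocks (address)")
plan_with_index = query_plan(cur, query, ("4096",))

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
conn.close()
```

A plan that starts with `SCAN` reads the whole table for each lookup, while `SEARCH ... USING ... INDEX` means the lookup cost stays roughly logarithmic as the table grows.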

joxeankoret commented 9 months ago

One question: is the disk you are exporting to slow? If you have an SSD you can try, could you run a quick test exporting a smaller binary to see if that's the problem in your case? SQLite3 should be one of the fastest parts here, unless I'm doing something wrong and/or the disk is slow.

turbocool3r commented 9 months ago

No, I'm doing it on an SSD. The first ~5000 functions were exported quite fast, the slowdown happens after 8000-10000 functions have been added. Will test with something small now.

turbocool3r commented 9 months ago

1931 functions took about a minute

turbocool3r commented 9 months ago

The thing is still running, but now it is mostly executing something in PyObject_Str instead of sqlite3.

Screenshot 2023-10-01 at 21 17 31

joxeankoret commented 9 months ago

Could you please try running the export after first setting the following environment variable:

$ export DIAPHORA_PROFILE=1

It will launch the export process using the Python cProfile profiler and will show something like this:

         191089957 function calls (190416373 primitive calls) in 211.036 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3360   23.202    0.007   23.202    0.007 {built-in method _ida_hexrays.gen_microcode}
 28801716   18.096    0.000   31.112    0.000 diaphora.py:1131(re_sub)
     3260   13.659    0.004   14.717    0.005 {built-in method _ida_hexrays.mba_t__print}
  1027352   13.305    0.000   43.964    0.000 diaphora.py:1172(get_cmp_asm)
 28801720   13.005    0.000   13.005    0.000 {method 'sub' of 're.Pattern' objects}
    99710   10.908    0.000   10.908    0.000 {built-in method _ida_hexrays.mblock_t__print}
     1813    7.821    0.004  204.426    0.113 diaphora_ida.py:2714(read_function)
     3260    6.969    0.002   29.360    0.009 diaphora_ida.py:2220(get_microcode_bblocks)
     3360    5.835    0.002   82.621    0.025 diaphora_ida.py:2274(get_microcode)
  1744946    5.041    0.000    7.902    0.000 diaphora_ida.py:2202(get_plain_microcode_line)
     1730    3.737    0.002   40.846    0.024 diaphora_ida.py:2671(extract_microcode)
  1933696    3.378    0.000    3.378    0.000 {method 'split' of 're.Pattern' objects}
   180931    3.146    0.000   13.276    0.000 diaphora_ida.py:2485(extract_function_constants)
  2529171    3.120    0.000    3.120    0.000 {built-in method _ida_xref.xrefblk_t_swiginit}
  4001767    2.529    0.000    2.529    0.000 {built-in method _ida_lines.tag_remove}
   957021    2.259    0.000    4.593    0.000 ida_gdl.py:782(__init__)
  1871796    2.185    0.000    8.267    0.000 idautils.py:98(DataRefsFrom)
  3644984    2.139    0.000    6.454    0.000 ida_idaapi.py:312(_bounded_getitem_iterator)
   180931    2.128    0.000    2.128    0.000 {built-in method _ida_lines.generate_disasm_line}
     1630    2.122    0.001    2.122    0.001 {built-in method _ida_hexrays.cfuncptr_t_get_pseudocode}
  3613220    2.074    0.000    4.131    0.000 ida_ua.py:120(__getitem__)
  3613220    2.058    0.000    2.058    0.000 {built-in method _ida_ua.operands_array___getitem__}
(...many more lines stripped...)

That output can be very useful for me to debug what's going on in your specific environment.

turbocool3r commented 9 months ago

It seems to print only after the export finishes, which never happens for me. Let me see if I can just modify the code to end profiling after about 8 hours.
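
A minimal sketch of one way to get intermediate profiler output from a long-running loop using only the standard `cProfile` and `pstats` modules. It is not how Diaphora's `DIAPHORA_PROFILE` mode is implemented; `export_one` and `profiled_export` are hypothetical names, and the workload is a placeholder.

```python
import cProfile
import io
import pstats


def export_one(item):
    # Hypothetical stand-in for exporting a single function.
    return sum(i * i for i in range(200 + item % 7))


def profiled_export(items, report_every=1000, top=10):
    """Run export_one over items and collect a stats report every report_every items."""
    reports = []
    profiler = cProfile.Profile()
    profiler.enable()
    for count, item in enumerate(items, start=1):
        export_one(item)
        if count % report_every == 0:
            # Pause profiling while formatting the cumulative statistics so far.
            profiler.disable()
            buf = io.StringIO()
            stats = pstats.Stats(profiler, stream=buf)
            stats.sort_stats("tottime").print_stats(top)
            reports.append(buf.getvalue())
            profiler.enable()
    profiler.disable()
    return reports


reports = profiled_export(range(3000), report_every=1000)
print(reports[-1])
```

Each report is cumulative, so comparing consecutive reports shows which functions are absorbing the extra time as the export progresses.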

turbocool3r commented 9 months ago

This is the output after about 9500 functions. I've now added printing every 10000 functions and will send that when it's ready.

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 12512769 2212.417    0.000 2212.417    0.000 {method 'execute' of 'sqlite3.Cursor' objects}
     9884  395.973    0.040  395.973    0.040 {built-in method _ida_kernwin.user_cancelled}
    19706  170.026    0.009  170.026    0.009 {built-in method _ida_hexrays.gen_microcode}
144720040   40.918    0.000   40.918    0.000 {method 'sub' of 're.Pattern' objects}
144720040   37.929    0.000   78.850    0.000 diaphora.py:1128(re_sub)
    19704   36.337    0.002   38.301    0.002 {built-in method _ida_hexrays.mba_t__print}
  5160832   30.654    0.000  106.938    0.000 diaphora.py:1169(get_cmp_asm)
     9883   29.764    0.003  667.000    0.067 diaphora_ida.py:2713(read_function)
   712774   26.913    0.000   26.913    0.000 {built-in method _ida_hexrays.mblock_t__print}
    19704   16.768    0.001   70.667    0.004 diaphora_ida.py:2219(get_microcode_bblocks)
     9852   13.812    0.001   13.812    0.001 {built-in method _ida_hexrays.cfuncptr_t_get_pseudocode}
    19706   13.478    0.001  316.696    0.016 diaphora_ida.py:2273(get_microcode)
  8539354   10.563    0.000   17.926    0.000 diaphora_ida.py:2201(get_plain_microcode_line)
  1149953    9.291    0.000   37.882    0.000 diaphora_ida.py:2484(extract_function_constants)
  9842550    9.191    0.000    9.191    0.000 {method 'split' of 're.Pattern' objects}
     9854    8.672    0.001   91.250    0.009 diaphora_ida.py:2670(extract_microcode)
 20420836    8.045    0.000    8.045    0.000 {built-in method _ida_lines.tag_remove}
  1149953    8.012    0.000    8.012    0.000 {built-in method _ida_lines.generate_disasm_line}
 12004139    7.255    0.000   22.351    0.000 idautils.py:98(DataRefsFrom)
   228052    7.165    0.000    7.165    0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
 22758702    6.547    0.000    6.547    0.000 {built-in method _ida_ua.operands_array___getitem__}
  5573601    6.397    0.000    6.397    0.000 {built-in method _ida_gdl.qflow_chart_t_calc_block_type}
 23260244    5.868    0.000   17.796    0.000 ida_idaapi.py:312(_bounded_getitem_iterator)
     9854    5.834    0.001   43.871    0.004 graph_hashes.py:100(calculate)
  5573601    5.798    0.000   13.422    0.000 ida_gdl.py:782(__init__)
 15072041    5.625    0.000    5.625    0.000 {built-in method _ida_xref.xrefblk_t_swiginit}
 22758702    4.821    0.000   11.368    0.000 ida_ua.py:120(__getitem__)
     9842    4.717    0.000   15.953    0.002 diaphora.py:762(save_microcode_instructions)
    19737    4.491    0.000    4.491    0.000 {built-in method _ida_gdl.new_qflow_chart_t}
  4792967    4.206    0.000    4.206    0.000 {method 'index' of 'list' objects}
 15072041    3.939    0.000   10.756    0.000 ida_xref.py:474(__init__)
     9854    3.873    0.000    3.873    0.000 {built-in method _ida_hexrays.decompile_func}
 55806730    3.836    0.000    3.836    0.000 {method 'strip' of 'str' objects}
 30260964    3.696    0.000    3.696    0.000 {method 'split' of 'str' objects}
     9854    3.649    0.000   11.080    0.001 diaphora.py:676(save_instructions_to_database)
 12004139    3.301    0.000    6.947    0.000 ida_xref.py:424(drefs_from)
 33631556    3.195    0.000    3.195    0.000 {method 'find' of 'str' objects}
  3695503    3.037    0.000    3.037    0.000 {built-in method _ida_ua.decode_insn}
 20420836    2.890    0.000   10.935    0.000 ida_lines.py:829(tag_remove)
  1395597    2.770    0.000    7.353    0.000 idc.py:1622(get_operand_value)
  4616840    2.768    0.000    2.958    0.000 diaphora.py:665(get_valid_prop)
     9854    2.760    0.000    3.072    0.000 diaphora_ida.py:2472(extract_function_callers)
  5573601    2.660    0.000   19.022    0.000 ida_gdl.py:850(_getitem)
  9842582    2.645    0.000    3.497    0.000 re.py:289(_compile)
  9842550    2.642    0.000   15.327    0.000 re.py:223(split)
  1149953    2.592    0.000    9.144    0.000 diaphora_ida.py:2609(get_decoded_instruction)
 66135397    2.515    0.000    2.515    0.000 {method 'append' of 'list' objects}
  2568542    2.416    0.000    3.284    0.000 diaphora_ida.py:3833(visit_expr)
  4125474    2.352    0.000    7.830    0.000 idautils.py:60(CodeRefsFrom)
  4531656    2.249    0.000   16.358    0.000 ida_gdl.py:807(succs)
     9883    2.239    0.000    5.575    0.001 diaphora_ida.py:2700(get_microcode_instructions)
    12887    2.216    0.000    2.216    0.000 {built-in method builtins.dir}
  6659360    2.035    0.000    2.035    0.000 {built-in method _ida_ua.insn_t___get_ops__}
    19706    1.970    0.000  109.809    0.006 diaphora.py:1138(get_cmp_asm_lines)
  4889445    1.910    0.000   21.596    0.000 ida_gdl.py:858(__getitem__)
3183064/681649    1.883    0.000    2.229    0.000 tarjan_sort.py:26(visit)
1691460/1377326    1.880    0.000    2.962    0.000 {built-in method builtins.sum}
  3797955    1.866    0.000   12.990    0.000 ida_gdl.py:798(preds)
 11592620    1.783    0.000    3.521    0.000 ida_xref.py:500(get_first_dref_from)
  3695503    1.740    0.000    1.740    0.000 {built-in method _ida_ua.insn_t_swiginit}
 11592620    1.738    0.000    1.738    0.000 {built-in method _ida_xref.get_first_dref_from}
 27234196    1.696    0.000    1.696    0.000 {built-in method builtins.isinstance}
  9842550    1.666    0.000    1.964    0.000 diaphora_ida.py:1027(_print)
  5573601    1.623    0.000    1.623    0.000 {built-in method _ida_gdl.qflow_chart_t___getitem__}
  6659360    1.583    0.000    3.617    0.000 ida_ua.py:816(__get_ops__)
  1149953    1.582    0.000    1.582    0.000 {built-in method _ida_ua.print_insn_mnem}
  1209077    1.521    0.000    1.521    0.000 encoder.py:204(iterencode)
22152812/22152744    1.466    0.000    2.853    0.000 {built-in method builtins.len}
     9854    1.458    0.000  192.281    0.020 diaphora_ida.py:2303(decompile_and_get)
  9199624    1.416    0.000    2.706    0.000 ida_bytes.py:2287(is_forced_operand)
     9854    1.414    0.000 2207.924    0.224 diaphora.py:717(insert_basic_blocks_to_database)
  5573601    1.317    0.000    2.940    0.000 ida_gdl.py:760(__getitem__)
  9199624    1.290    0.000    1.290    0.000 {built-in method _ida_bytes.is_forced_operand}
  2756010    1.289    0.000    2.943    0.000 idautils.py:179(Heads)
  4125474    1.251    0.000    2.941    0.000 ida_xref.py:404(fcrefs_from)
  5573601    1.227    0.000    7.624    0.000 ida_gdl.py:673(calc_block_type)
 15072041    1.192    0.000    1.192    0.000 {built-in method _ida_xref.new_xrefblk_t}
  4919007    1.191    0.000    3.051    0.000 ida_gdl.py:837(<lambda>)
 18677094    1.190    0.000    1.190    0.000 {method 'startswith' of 'str' objects}
  1209077    1.186    0.000    4.187    0.000 __init__.py:183(dumps)
  3695503    1.162    0.000    3.521    0.000 ida_ua.py:650(__init__)
  4919007    1.122    0.000    1.860    0.000 ida_gdl.py:748(size)
  1149953    1.112    0.000    1.112    0.000 {built-in method _ida_nalt.get_switch_info}
     9852    1.098    0.000    5.178    0.001 {built-in method _ida_hexrays.ctree_visitor_t_apply_to}
        1    1.071    1.071 3308.658 3308.658 diaphora_ida.py:1132(do_export)
     9852    1.009    0.000    1.009    0.000 factor.py:28(<listcomp>)
  2299906    0.984    0.000    0.984    0.000 {built-in method _ida_bytes.get_bytes}
  1149953    0.910    0.000    3.963    0.000 graph_hashes.py:95(is_call_insn)
    39445    0.890    0.000    0.890    0.000 {method 'sort' of 'list' objects}
    95909    0.883    0.000    0.883    0.000 {built-in method _ida_typeinf.idc_get_type}
   353550    0.871    0.000    0.871    0.000 {method 'sqrt' of 'decimal.Decimal' objects}
  3449859    0.862    0.000    0.862    0.000 {built-in method _ida_xref.get_first_fcref_from}
     9854    0.857    0.000    0.857    0.000 {built-in method _ida_typeinf.idc_guess_type}
  1149953    0.840    0.000    3.320    0.000 ida_nalt.py:3773(get_switch_info)
  1149953    0.808    0.000    3.291    0.000 diaphora_ida.py:223(diaphora_decode)
     9854    0.774    0.000    0.774    0.000 {built-in method _ida_idp.ph_get_instruc}
  1209077    0.766    0.000    2.441    0.000 encoder.py:182(encode)
     9854    0.766    0.000    3.400    0.000 diaphora_ida.py:2557(extract_function_pseudocode_features)
    29556    0.747    0.000    2.475    0.000 kfuzzy.py:104(_hash)
 17792577    0.742    0.000    0.742    0.000 {method 'isdigit' of 'str' objects}
     9854    0.742    0.000    0.742    0.000 idautils.py:560(<listcomp>)
  3695503    0.741    0.000    3.779    0.000 ida_ua.py:1927(decode_insn)
  4919007    0.738    0.000    0.738    0.000 {built-in method _ida_gdl.qflow_chart_t_size}
  1149953    0.730    0.000    9.752    0.000 idc.py:1486(generate_disasm_line)
   712774    0.710    0.000    0.710    0.000 {method 'splitlines' of 'str' objects}
  1149953    0.664    0.000   12.541    0.000 diaphora_ida.py:2659(extract_line_mnem_disasm)
  2697547    0.653    0.000    1.162    0.000 ida_gdl.py:722(succ)
  1445252    0.642    0.000    0.642    0.000 {built-in method _ida_hexrays.mba_t_get_mblock}
  3695503    0.619    0.000    0.619    0.000 {built-in method _ida_ua.new_insn_t}
  1149953    0.594    0.000    0.594    0.000 {built-in method _ida_idp.is_call_insn}
   629216    0.589    0.000    0.589    0.000 diaphora_ida.py:2419(constant_filter)
    19704    0.581    0.000    0.581    0.000 {built-in method _ida_hexrays.mba_t_build_graph}
  3449859    0.578    0.000    1.440    0.000 ida_xref.py:601(get_first_fcref_from)
   589325    0.577    0.000    0.796    0.000 diaphora_ida.py:3840(visit_insn)
  1199223    0.560    0.000    0.560    0.000 encoder.py:104(__init__)
  2299906    0.550    0.000    1.011    0.000 ida_ua.py:114(__len__)
  3157867    0.547    0.000    1.087    0.000 ida_hexrays.py:18631(_get_op)
  3157867    0.540    0.000    0.540    0.000 {built-in method _ida_hexrays.citem_t__get_op}
  2299906    0.535    0.000    0.535    0.000 {built-in method _ida_bytes.next_head}
1123774/561887    0.525    0.000    0.600    0.000 ida_ida.py:4257(__getattribute__)
  1149953    0.524    0.000    0.524    0.000 {built-in method _ida_nalt.switch_info_t_swiginit}
  1149953    0.523    0.000    3.879    0.000 diaphora_ida.py:2503(extract_function_switches)
   712774    0.519    0.000    0.519    0.000 {built-in method _ida_hexrays.qstring_printer_t_get_s}
  2191898    0.512    0.000    0.921    0.000 ida_gdl.py:731(pred)
  2697547    0.508    0.000    0.508    0.000 {built-in method _ida_gdl.qflow_chart_t_succ}
  2299906    0.484    0.000    1.469    0.000 ida_bytes.py:4312(get_bytes)
  1834109    0.483    0.000    0.863    0.000 ida_gdl.py:706(nsucc)
   228052    0.476    0.000 2203.664    0.010 diaphora.py:647(get_bb_id)
  2299906    0.475    0.000    0.475    0.000 {built-in method _ida_bytes.get_cmt}
   280816    0.475    0.000    0.475    0.000 {built-in method _ida_bytes.get_strlit_contents}
  2299906    0.469    0.000    0.944    0.000 ida_bytes.py:3780(get_cmt)
  2299906    0.461    0.000    0.461    0.000 {built-in method _ida_ua.operands_array___len__}
  2299906    0.455    0.000    0.682    0.000 ida_bytes.py:602(get_item_size)
     9854    0.445    0.000 2238.926    0.227 diaphora.py:1025(save_function)
   996876    0.418    0.000    0.418    0.000 {built-in method _ida_funcs.get_func}
  2191898    0.408    0.000    0.408    0.000 {built-in method _ida_gdl.qflow_chart_t_pred}
  1606057    0.389    0.000    0.692    0.000 ida_gdl.py:714(npred)
  1149953    0.388    0.000    1.059    0.000 ida_nalt.py:2421(__init__)
  1834109    0.380    0.000    0.380    0.000 {built-in method _ida_gdl.qflow_chart_t_nsucc}
  1445252    0.363    0.000    1.005    0.000 ida_hexrays.py:15716(get_mblock)
  2299906    0.361    0.000    0.896    0.000 ida_bytes.py:496(next_head)
     9854    0.356    0.000    0.467    0.000 diaphora_ida.py:2586(extract_function_assembly_features)
   712774    0.349    0.000    0.349    0.000 {built-in method _ida_hexrays.qstring_printer_t_swiginit}
     9852    0.319    0.000    1.328    0.000 factor.py:16(primesbelow)
     9854    0.311    0.000    1.741    0.000 diaphora_ida.py:2528(extract_function_mdindex)
  1606057    0.303    0.000    0.303    0.000 {built-in method _ida_gdl.qflow_chart_t_npred}
  1149953    0.299    0.000   10.051    0.000 idc.py:1514(GetDisasm)
   237907    0.297    0.000    0.477    0.000 diaphora.py:533(get_db)
  1327307    0.297    0.000    0.297    0.000 {method 'join' of 'str' objects}
   888979    0.290    0.000    0.290    0.000 {built-in method _ida_pro.strvec_t___getitem__}
    19707    0.266    0.000    0.266    0.000 {method 'readlines' of '_io._IOBase' objects}
  1149953    0.244    0.000    1.825    0.000 ida_ua.py:1851(print_insn_mnem)
  3886493    0.242    0.000    0.242    0.000 {method 'add' of 'set' objects}
   315569    0.239    0.000    1.036    0.000 diaphora_ida.py:2443(is_constant)
  2531140    0.239    0.000    0.239    0.000 {built-in method builtins.min}
  1149953    0.238    0.000    8.249    0.000 ida_lines.py:759(generate_disasm_line)
  1149953    0.234    0.000    1.347    0.000 ida_nalt.py:2589(get_switch_info)
    93260    0.232    0.000    0.232    0.000 {built-in method _ida_name.demangle_name}
   712774    0.231    0.000    0.686    0.000 ida_hexrays.py:6174(__init__)
  1149953    0.227    0.000    0.822    0.000 ida_idp.py:487(is_call_insn)
  2299906    0.227    0.000    0.227    0.000 {built-in method _ida_bytes.get_item_size}
   712774    0.221    0.000   27.134    0.000 ida_hexrays.py:13931(_print)
   854275    0.214    0.000    0.365    0.000 ida_ua.py:315(__get_addr__)
   935274    0.209    0.000    0.209    0.000 {built-in method _ida_pro.intvec_t___getitem__}
   827011    0.204    0.000    0.350    0.000 ida_ua.py:287(__get_value__)
   996876    0.200    0.000    0.619    0.000 ida_funcs.py:737(get_func)
     9854    0.199    0.000    0.912    0.000 tarjan_sort.py:75(robust_topological_sort)
   712774    0.197    0.000    0.197    0.000 {built-in method _ida_pro.intvec_t_size}
     9854    0.197    0.000    0.240    0.000 tarjan_sort.py:52(topological_sort)
   935274    0.195    0.000    0.404    0.000 ida_pro.py:836(__getitem__)
   712774    0.195    0.000    0.713    0.000 ida_hexrays.py:6183(get_s)
   712774    0.192    0.000    0.905    0.000 ida_hexrays.py:6189(<lambda>)
   888979    0.189    0.000    0.479    0.000 ida_pro.py:2019(__getitem__)
   713718    0.187    0.000    2.535    0.000 ida_gdl.py:855(<genexpr>)
   323988    0.173    0.000    0.284    0.000 diaphora_ida.py:2549(<genexpr>)
   323988    0.168    0.000    1.193    0.000 diaphora_ida.py:2553(<genexpr>)
   712774    0.151    0.000    0.349    0.000 ida_pro.py:651(size)
    29562    0.151    0.000    2.380    0.000 tarjan_sort.py:14(strongly_connected_components)
   854275    0.150    0.000    0.150    0.000 {built-in method _ida_ua.op_t___get_addr__}
  1149953    0.147    0.000    0.147    0.000 {built-in method _ida_nalt.new_switch_info_t}
   827011    0.147    0.000    0.147    0.000 {built-in method _ida_ua.op_t___get_value__}
   237907    0.141    0.000    0.714    0.000 diaphora.py:544(db_cursor)
     9854    0.130    0.000    1.889    0.000 diaphora_ida.py:2620(extract_function_topological_information)
   478407    0.127    0.000    0.211    0.000 ida_ua.py:273(__get_reg_phrase__)
   675615    0.126    0.000    0.250    0.000 ida_xref.py:609(get_next_fcref_from)
   675615    0.124    0.000    0.124    0.000 {built-in method _ida_xref.get_next_fcref_from}
   456104    0.109    0.000    0.205    0.000 ida_bytes.py:642(get_flags)
   228052    0.107    0.000    0.107    0.000 graph_hashes.py:85(get_edges_value)
   712774    0.106    0.000    0.106    0.000 {built-in method _ida_hexrays.new_qstring_printer_t}
   237907    0.097    0.000    0.097    0.000 {method 'cursor' of 'sqlite3.Connection' objects}
   456104    0.096    0.000    0.096    0.000 {built-in method _ida_bytes.get_flags}
   456104    0.094    0.000    0.094    0.000 idc.py:159(is_head)
     9854    0.091    0.000    0.091    0.000 diaphora.py:888(create_function_dictionary)
   237909    0.091    0.000    0.110    0.000 threading.py:1338(current_thread)
     9883    0.088    0.000    0.088    0.000 {built-in method _ida_funcs.get_func_name}
   478407    0.084    0.000    0.084    0.000 {built-in method _ida_ua.op_t___get_reg_phrase__}
594349/594329    0.083    0.000    0.083    0.000 {built-in method builtins.getattr}
   343420    0.075    0.000    0.320    0.000 idautils.py:38(CodeRefsTo)
     9854    0.074    0.000  193.232    0.020 diaphora_ida.py:2361(guess_type)
   237908    0.069    0.000    0.069    0.000 threading.py:1089(ident)
   228052    0.069    0.000    0.069    0.000 graph_hashes.py:73(get_node_value)
     9854    0.069    0.000    3.413    0.000 diaphora.py:1150(get_cmp_pseudo_lines)
   387337    0.067    0.000    0.124    0.000 ida_xref.py:512(get_next_dref_from)
        2    0.066    0.033    0.066    0.033 {method 'close' of 'sqlite3.Connection' objects}
     9852    0.065    0.000    2.611    0.000 kfuzzy.py:246(hash_bytes)
     9852    0.063    0.000    0.066    0.000 kfuzzy.py:218(mix_blocks)
     9854    0.061    0.000 2235.020    0.227 diaphora.py:1005(save_function_to_database)
   217519    0.059    0.000    1.612    0.000 kfuzzy.py:31(modsum)
   387337    0.057    0.000    0.057    0.000 {built-in method _ida_xref.get_next_dref_from}
        2    0.054    0.027    0.054    0.027 {method 'commit' of 'sqlite3.Connection' objects}
   171983    0.054    0.000    0.112    0.000 ida_xref.py:374(crefs_to)
     9852    0.053    0.000    1.448    0.000 diaphora_ida.py:3826(__init__)
   280816    0.053    0.000    0.528    0.000 ida_bytes.py:4338(get_strlit_contents)
    19708    0.052    0.000    0.052    0.000 {built-in method _hashlib.openssl_md5}
    95909    0.051    0.000    0.975    0.000 idc.py:4971(get_type)
    49270    0.049    0.000    0.178    0.000 diaphora_ida.py:2548(<genexpr>)
    19737    0.046    0.000    4.597    0.000 ida_gdl.py:821(__init__)
   171437    0.046    0.000    0.097    0.000 ida_xref.py:384(fcrefs_to)
    19708    0.045    0.000    0.045    0.000 {built-in method _ida_nalt.get_imagebase}
    95909    0.040    0.000    0.924    0.000 ida_typeinf.py:9983(idc_get_type)
     9883    0.039    0.000    0.039    0.000 diaphora_ida.py:1049(clear_pseudo_fields)
    19708    0.037    0.000    0.037    0.000 {method 'join' of 'bytes' objects}
     9854    0.035    0.000    0.035    0.000 {built-in method _ida_funcs.get_func_cmt}
    93260    0.034    0.000    0.267    0.000 ida_name.py:1141(demangle_name)
    29562    0.034    0.000    0.073    0.000 ida_gdl.py:854(__iter__)
    19737    0.033    0.000    0.033    0.000 {built-in method _ida_gdl.qflow_chart_t_swiginit}
     9852    0.033    0.000    0.033    0.000 {built-in method _ida_hexrays.new_ctree_visitor_t}
    19737    0.032    0.000    0.051    0.000 os.py:674(__getitem__)
    41224    0.032    0.000    0.046    0.000 idautils.py:199(Functions)
    19704    0.030    0.000   38.331    0.002 ida_hexrays.py:15680(_print)
     9883    0.029    0.000    0.245    0.000 diaphora_ida.py:2460(get_function_names)
    29591    0.029    0.000    0.106    0.000 idc.py:2864(get_func_attr)
    19737    0.027    0.000    4.551    0.000 ida_gdl.py:626(__init__)
     9854    0.026    0.000    0.057    0.000 idc.py:2208(get_segm_start)
   162129    0.026    0.000    0.026    0.000 {built-in method _ida_xref.get_next_cref_to}
    19706    0.026    0.000    0.026    0.000 {built-in method _ida_hexrays.new_vd_printer_t}
    19708    0.026    0.000    0.085    0.000 diaphora_ida.py:3091(get_base_address)
    19706    0.026    0.000  170.052    0.009 ida_hexrays.py:22282(gen_microcode)
    19706    0.025    0.000    0.048    0.000 ida_hexrays.py:16856(__init__)
    29591    0.025    0.000    0.025    0.000 {built-in method builtins.hasattr}
    29591    0.024    0.000    0.056    0.000 idc.py:89(_IDC_GetAttr)
    19706    0.023    0.000    0.023    0.000 {built-in method _ida_hexrays.new_mba_ranges_t}
    19706    0.023    0.000    0.085    0.000 diaphora_ida.py:1018(__init__)
     9854    0.022    0.000    3.921    0.000 ida_hexrays.py:22262(decompile_func)
   161583    0.022    0.000    0.022    0.000 {built-in method _ida_xref.get_next_fcref_to}
    19706    0.022    0.000    0.062    0.000 ida_hexrays.py:6084(__init__)
   162129    0.021    0.000    0.048    0.000 ida_xref.py:588(get_next_cref_to)
   161583    0.021    0.000    0.043    0.000 ida_xref.py:626(get_next_fcref_to)
     9854    0.021    0.000    0.021    0.000 {built-in method _ida_segment.getseg}
     9854    0.021    0.000    1.546    0.000 idautils.py:558(GetInstructionList)
    19706    0.020    0.000    0.062    0.000 ida_hexrays.py:14923(__init__)
     9852    0.020    0.000    0.067    0.000 ida_hexrays.py:17980(__init__)
   237909    0.020    0.000    0.020    0.000 {built-in method _thread.get_ident}
    19737    0.020    0.000    0.087    0.000 os.py:771(getenv)
     9854    0.019    0.000    0.082    0.000 idc.py:3025(get_func_cmt)
    19706    0.018    0.000    0.018    0.000 {built-in method _ida_hexrays.mba_ranges_t_swiginit}
    19708    0.017    0.000    0.017    0.000 {method 'hexdigest' of '_hashlib.HASH' objects}
    29556    0.017    0.000    0.029    0.000 base64.py:51(b64encode)
     9884    0.017    0.000  395.990    0.040 ida_kernwin.py:7289(user_cancelled)
   216672    0.017    0.000    0.017    0.000 {built-in method builtins.chr}
    19737    0.017    0.000    0.067    0.000 _collections_abc.py:761(get)
    19706    0.017    0.000    0.040    0.000 ida_hexrays.py:8712(__init__)
     9854    0.016    0.000    0.016    0.000 {built-in method _ida_hexrays.cfuncptr_t___deref__}
     9854    0.016    0.000    0.016    0.000 tarjan_sort.py:60(<listcomp>)
   100909    0.016    0.000    0.016    0.000 {method 'decode' of 'bytes' objects}
    19706    0.015    0.000    0.015    0.000 {built-in method _ida_hexrays.hexrays_failure_t_swiginit}
    19706    0.015    0.000    0.015    0.000 {built-in method _ida_hexrays.mlist_t_swiginit}
    19708    0.014    0.000    0.059    0.000 ida_nalt.py:3420(get_imagebase)
    19706    0.014    0.000    0.014    0.000 {built-in method _ida_hexrays.vd_printer_t_swiginit}
     9852    0.014    0.000    0.014    0.000 {built-in method _ida_pro.strvec_t_size}
     9852    0.013    0.000    0.013    0.000 {built-in method _ida_hexrays.ctree_visitor_t_swiginit}
   188529    0.013    0.000    0.013    0.000 {method 'pop' of 'list' objects}
    19704    0.013    0.000    0.594    0.000 ida_hexrays.py:15612(build_graph)
    29556    0.012    0.000    0.012    0.000 {built-in method binascii.b2a_base64}
   237907    0.012    0.000    0.012    0.000 {method 'close' of 'sqlite3.Cursor' objects}
    49293    0.012    0.000    0.012    0.000 {method 'encode' of 'str' objects}
     9854    0.012    0.000    0.877    0.000 idc.py:5007(guess_type)
    19853    0.011    0.000    0.041    0.000 idautils.py:81(DataRefsTo)
     9883    0.011    0.000    0.099    0.000 ida_funcs.py:1026(get_func_name)
     9854    0.011    0.000    0.011    0.000 {built-in method _ida_hexrays.init_hexrays_plugin}
     9854    0.011    0.000    0.011    0.000 {built-in method _ida_segment.is_spec_ea}
     9852    0.011    0.000    0.025    0.000 ida_pro.py:1881(size)
       24    0.011    0.000    0.011    0.000 {built-in method _ida_kernwin.replace_wait_box}
     9852    0.010    0.000    5.189    0.001 ida_hexrays.py:17993(apply_to)
     9854    0.010    0.000    0.784    0.000 ida_idp.py:5846(ph_get_instruc)
     9852    0.010    0.000    0.010    0.000 {built-in method _ida_hexrays.restore_user_cmts}
    19737    0.010    0.000    0.018    0.000 os.py:754(encode)
    41223    0.010    0.000    0.010    0.000 {built-in method _ida_funcs.get_next_func}
     9854    0.010    0.000    0.030    0.000 ida_segment.py:1015(getseg)
    19853    0.010    0.000    0.022    0.000 ida_xref.py:414(drefs_to)
     9854    0.009    0.000    0.020    0.000 ida_segment.py:678(is_spec_ea)
     9854    0.009    0.000    0.025    0.000 ida_hexrays.py:2045(__deref__)
     9854    0.009    0.000    0.020    0.000 ida_hexrays.py:4116(init_hexrays_plugin)
     9854    0.009    0.000    0.866    0.000 ida_typeinf.py:9975(idc_guess_type)
     9852    0.008    0.000   13.821    0.001 ida_hexrays.py:2273(get_pseudocode)
    19706    0.008    0.000    0.008    0.000 {built-in method _ida_hexrays.new_hexrays_failure_t}
        1    0.008    0.008    0.008    0.008 {built-in method _ida_kernwin.hide_wait_box}
    19706    0.008    0.000    0.008    0.000 {built-in method _ida_hexrays.new_mlist_t}
     9854    0.008    0.000    3.935    0.000 diaphora_ida.py:2197(do_decompile)
     9852    0.006    0.000    0.016    0.000 ida_hexrays.py:21738(restore_user_cmts)
     9854    0.006    0.000    0.006    0.000 {built-in method _ida_xref.get_first_cref_to}
     9865    0.006    0.000    0.047    0.000 {method 'extend' of 'list' objects}
     9854    0.006    0.000    3.927    0.000 ida_hexrays.py:26070(decompile)
     9854    0.005    0.000    0.041    0.000 ida_funcs.py:841(get_func_cmt)
    41223    0.005    0.000    0.015    0.000 ida_funcs.py:820(get_next_func)
    29556    0.005    0.000    0.005    0.000 {method 'strip' of 'bytes' objects}
     9854    0.005    0.000    0.005    0.000 {built-in method _ida_xref.get_first_dref_to}
     9854    0.004    0.000    0.011    0.000 ida_xref.py:575(get_first_cref_to)
     9854    0.004    0.000    0.004    0.000 {built-in method _ida_xref.get_first_fcref_to}
     9854    0.004    0.000    0.009    0.000 ida_xref.py:525(get_first_dref_to)
     9854    0.004    0.000    0.008    0.000 ida_xref.py:618(get_first_fcref_to)
        1    0.003    0.003 3309.006 3309.006 diaphora_ida.py:1227(export)
      278    0.002    0.000    0.002    0.000 {built-in method _ida_xref.calc_switch_cases}
     9854    0.002    0.000    0.002    0.000 {method 'insert' of 'list' objects}
     9854    0.002    0.000    0.002    0.000 {method 'keys' of 'dict' objects}
     9854    0.002    0.000    0.002    0.000 graph_hashes.py:70(__init__)
     9999    0.002    0.000    0.004    0.000 ida_xref.py:535(get_next_dref_to)
     9999    0.002    0.000    0.002    0.000 {built-in method _ida_xref.get_next_dref_to}
     7925    0.002    0.000    0.002    0.000 {built-in method _ida_pro.int64vec_t___getitem__}
     7925    0.002    0.000    0.004    0.000 ida_pro.py:1326(__getitem__)
     9854    0.001    0.000    0.001    0.000 {method 'remove' of 'list' objects}
     3004    0.001    0.000    0.001    0.000 {built-in method _ida_xref.casevec_t___getitem__}
        1    0.001    0.001    0.001    0.001 {built-in method posix.remove}
     3004    0.001    0.000    0.001    0.000 {built-in method _ida_pro.int64vec_t_size}
     9887    0.001    0.000    0.001    0.000 {method 'items' of 'dict' objects}
       34    0.001    0.000    0.001    0.000 sre_compile.py:276(_optimize_charset)
     3004    0.001    0.000    0.002    0.000 ida_pro.py:1141(size)
     3004    0.001    0.000    0.002    0.000 ida_xref.py:880(__getitem__)
       33    0.001    0.000    0.001    0.000 sre_parse.py:493(_parse)
    68/33    0.000    0.000    0.001    0.000 sre_compile.py:71(_compile)
        2    0.000    0.000    0.001    0.001 linecache.py:36(getlines)
        2    0.000    0.000    0.000    0.000 {built-in method io.open}
      555    0.000    0.000    0.000    0.000 sre_parse.py:233(__next)
      278    0.000    0.000    0.000    0.000 ida_nalt.py:2359(get_jtable_size)
    68/33    0.000    0.000    0.000    0.000 sre_parse.py:174(getwidth)
      278    0.000    0.000    0.000    0.000 ida_xref.py:687(size)
      278    0.000    0.000    0.003    0.000 ida_xref.py:114(calc_switch_cases)
      437    0.000    0.000    0.000    0.000 sre_parse.py:254(get)
      278    0.000    0.000    0.000    0.000 {built-in method _ida_xref.casevec_t_size}
      442    0.000    0.000    0.000    0.000 sre_parse.py:164(__getitem__)
       33    0.000    0.000    0.000    0.000 sre_compile.py:536(_compile_info)
       16    0.000    0.000    0.000    0.000 {built-in method _ida_kernwin.msg}
      278    0.000    0.000    0.000    0.000 {built-in method _ida_nalt.switch_info_t_get_jtable_size}
       33    0.000    0.000    0.004    0.000 sre_compile.py:759(compile)
       33    0.000    0.000    0.002    0.000 sre_parse.py:937(parse)
        3    0.000    0.000    0.000    0.000 {built-in method posix.stat}
        1    0.000    0.000    0.001    0.001 traceback.py:321(extract)
       33    0.000    0.000    0.001    0.000 sre_parse.py:435(_parse_sub)
      190    0.000    0.000    0.000    0.000 sre_parse.py:249(match)
       34    0.000    0.000    0.000    0.000 sre_compile.py:249(_compile_charset)
      197    0.000    0.000    0.000    0.000 sre_parse.py:172(append)
       29    0.000    0.000    0.000    0.000 diaphora_ida.py:145(debug_refresh)
       24    0.000    0.000    0.011    0.000 ida_kernwin.py:7799(replace_wait_box)
       33    0.000    0.000    0.002    0.000 sre_compile.py:598(_code)
       33    0.000    0.000    0.000    0.000 enum.py:977(__and__)
       33    0.000    0.000    0.000    0.000 sre_compile.py:461(_get_literal_prefix)
       33    0.000    0.000    0.000    0.000 sre_parse.py:224(__init__)
        3    0.000    0.000    0.000    0.000 {method 'execute' of 'sqlite3.Connection' objects}
      101    0.000    0.000    0.000    0.000 sre_parse.py:286(tell)
      158    0.000    0.000    0.000    0.000 {method 'find' of 'bytearray' objects}
       66    0.000    0.000    0.000    0.000 enum.py:358(__call__)
        1    0.000    0.000    0.000    0.000 {built-in method _ida_kernwin.show_wait_box}
        4    0.000    0.000    0.000    0.000 {built-in method time.asctime}
      138    0.000    0.000    0.000    0.000 sre_parse.py:160(__len__)
       96    0.000    0.000    0.000    0.000 {built-in method builtins.divmod}
       68    0.000    0.000    0.000    0.000 sre_parse.py:111(__init__)
       29    0.000    0.000    0.000    0.000 sre_compile.py:492(_get_charset_prefix)
        1    0.000    0.000    0.066    0.066 diaphora.py:553(db_close)
       66    0.000    0.000    0.000    0.000 enum.py:670(__new__)
        3    0.000    0.000    0.000    0.000 sre_compile.py:413(<listcomp>)
       62    0.000    0.000    0.000    0.000 sre_compile.py:453(_get_iscased)
       35    0.000    0.000    0.000    0.000 sre_compile.py:423(_simple)
       33    0.000    0.000    0.000    0.000 sre_parse.py:921(fix_flags)
       66    0.000    0.000    0.000    0.000 sre_compile.py:595(isstring)
       33    0.000    0.000    0.000    0.000 sre_parse.py:432(_uniq)
        1    0.000    0.000    0.001    0.001 traceback.py:468(__init__)
       33    0.000    0.000    0.000    0.000 {built-in method _sre.compile}
        8    0.000    0.000    0.000    0.000 {built-in method builtins.print}
      332    0.000    0.000    0.000    0.000 {built-in method builtins.ord}
       32    0.000    0.000    0.000    0.000 types.py:171(__get__)
       33    0.000    0.000    0.000    0.000 sre_parse.py:76(__init__)
      228    0.000    0.000    0.000    0.000 {built-in method _sre.unicode_iscased}
        6    0.000    0.000    0.000    0.000 sre_parse.py:355(_escape)
       66    0.000    0.000    0.000    0.000 sre_parse.py:81(groups)
       16    0.000    0.000    0.000    0.000 init.py:76(write)
       16    0.000    0.000    0.000    0.000 ida_kernwin.py:197(msg)
        1    0.000    0.000    0.000    0.000 tokenize.py:388(open)
        1    0.000    0.000    0.000    0.000 tokenize.py:295(detect_encoding)
       17    0.000    0.000    0.000    0.000 codecs.py:319(decode)
       32    0.000    0.000    0.004    0.000 re.py:250(compile)
        1    0.000    0.000    0.000    0.000 {method 'readline' of '_io.BufferedReader' objects}
       17    0.000    0.000    0.000    0.000 {built-in method _codecs.utf_8_decode}
        4    0.000    0.000    0.000    0.000 diaphora_ida.py:120(log)
       25    0.000    0.000    0.000    0.000 {built-in method time.monotonic}
       33    0.000    0.000    0.000    0.000 {built-in method fromkeys}
        1    0.000    0.000    0.000    0.000 traceback.py:388(format)
        1    0.000    0.000    0.001    0.001 linecache.py:91(updatecache)
        2    0.000    0.000    0.000    0.000 {method '__exit__' of '_io._IOBase' objects}
        1    0.000    0.000    0.001    0.001 traceback.py:87(print_exception)
        1    0.000    0.000    0.000    0.000 {built-in method _ida_funcs.get_fchunk}
        3    0.000    0.000    0.000    0.000 sre_compile.py:411(_mk_bitmap)
        1    0.000    0.000    0.008    0.008 ida_kernwin.py:956(hide_wait_box)
       35    0.000    0.000    0.000    0.000 sre_parse.py:168(__setitem__)
       29    0.000    0.000    0.000    0.000 {built-in method builtins.repr}
        3    0.000    0.000    0.000    0.000 sre_compile.py:416(_bytes_to_codes)
        4    0.000    0.000    0.000    0.000 sre_compile.py:432(_generate_overlap_table)
      123    0.000    0.000    0.000    0.000 {built-in method _sre.unicode_tolower}
        1    0.000    0.000    0.000    0.000 linecache.py:63(checkcache)
       32    0.000    0.000    0.000    0.000 enum.py:792(value)
        5    0.000    0.000    0.000    0.000 traceback.py:603(format)
       32    0.000    0.000    0.000    0.000 {built-in method builtins.any}
        4    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        1    0.000    0.000    0.000    0.000 codecs.py:309(__init__)
        8    0.000    0.000    0.001    0.000 traceback.py:285(line)
        2    0.000    0.000    0.000    0.000 {method 'match' of 're.Pattern' objects}
        1    0.000    0.000    0.000    0.000 {built-in method _ida_funcs.get_next_fchunk}
        1    0.000    0.000    0.001    0.001 traceback.py:161(print_exc)
        2    0.000    0.000    0.001    0.001 linecache.py:26(getline)
        1    0.000    0.000    0.000    0.000 tokenize.py:325(find_cookie)
        1    0.000    0.000    0.000    0.000 genericpath.py:16(exists)
        2    0.000    0.000    0.000    0.000 traceback.py:548(format_exception_only)
        1    0.000    0.000    0.000    0.000 {method 'close' of '_io.BufferedWriter' objects}
        3    0.000    0.000    0.000    0.000 {method 'tolist' of 'memoryview' objects}
        1    0.000    0.000    0.000    0.000 ida_kernwin.py:926(show_wait_box)
       12    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 traceback.py:145(_format_final_exc_line)
        2    0.000    0.000    0.000    0.000 linecache.py:158(lazycache)
        3    0.000    0.000    0.000    0.000 {method 'translate' of 'bytearray' objects}
        3    0.000    0.000    0.000    0.000 traceback.py:305(walk_tb)
        2    0.000    0.000    0.000    0.000 traceback.py:243(__init__)
        1    0.000    0.000    0.000    0.000 ida_funcs.py:1165(get_fchunk)
        3    0.000    0.000    0.000    0.000 {method 'cast' of 'memoryview' objects}
        2    0.000    0.000    0.000    0.000 {built-in method sys.exc_info}
       10    0.000    0.000    0.000    0.000 {method 'values' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 tokenize.py:319(read_or_stop)
        1    0.000    0.000    0.000    0.000 traceback.py:531(_load_lines)
        2    0.000    0.000    0.000    0.000 traceback.py:153(_some_str)
        1    0.000    0.000    0.000    0.000 {method 'startswith' of 'bytes' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.id}
        1    0.000    0.000    0.000    0.000 ida_funcs.py:1216(get_next_fchunk)
        1    0.000    0.000    0.000    0.000 diaphora.py:468(load_hooks)
        1    0.000    0.000    0.000    0.000 codecs.py:260(__init__)
        1    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'seek' of '_io.BufferedReader' objects}
        1    0.000    0.000    0.000    0.000 {method 'endswith' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
joxeankoret commented 9 months ago

It does indeed seem to be something with SQLite3:

 12512769 2212.417    0.000 2212.417    0.000 {method 'execute' of 'sqlite3.Cursor' objects}

This means that about 2,212 of the roughly 3,309 seconds of the export were spent inside SQLite cursor execute calls (some 12.5 million of them). I need to investigate this...

Thanks for the help!
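
If the cost of each query grows with the size of the table, one thing worth ruling out is a lookup on a column that has no index. The sketch below uses a hypothetical basic_blocks table (an assumption for illustration, not Diaphora's actual schema, which this thread does not show) and asks SQLite for its query plan before and after an index is created:

```python
import sqlite3

# Hypothetical table for illustration only; NOT Diaphora's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basic_blocks (id INTEGER PRIMARY KEY, address TEXT)")
conn.executemany(
    "INSERT INTO basic_blocks (address) VALUES (?)",
    ((str(0x1000 + i),) for i in range(10000)),
)


def query_plan(connection):
    """Return SQLite's plan text for a lookup by address."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM basic_blocks WHERE address = ?",
        ("4096",),
    ).fetchall()
    # The last column of each row is the human-readable plan step.
    return " / ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index on address yet
conn.execute("CREATE INDEX idx_bb_address ON basic_blocks (address)")
plan_after = query_plan(conn)   # same query, index now available

print("before:", plan_before)
print("after: ", plan_after)
```

A plan that starts with SCAN means every row is visited for each lookup, so the total time for N lookups grows roughly quadratically as the table fills up. A plan that starts with SEARCH and names an index means each lookup stays cheap. Whether this is what happens in the export here is something only the real schema and queries can confirm.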

joxeankoret commented 9 months ago

Hmm... I also suspect that what seems to be taking so long (the insertion of the last rows) is actually Diaphora searching for compilation units. I'm testing this and also considering adding an option to disable it...

This is an example of what I mean:

[image]

The last 68 rows aren't what takes so long; it's the code that searches for compilation units.

turbocool3r commented 9 months ago

Hmm, that might be it, though last time I killed the thing it was at ~36000 functions out of ~40000. It'll be easier to tell once I get the profiling data.
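
For anyone wanting to collect comparable data, a minimal sketch of producing a report sorted by internal time with the standard cProfile and pstats modules follows. The export_all function is a hypothetical stand-in for whatever is being profiled, not a Diaphora function:

```python
import cProfile
import io
import pstats


def export_all(n):
    """Hypothetical stand-in for the export loop being profiled."""
    return sum(len(str(i)) for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = export_all(50000)
profiler.disable()

# Collect the report in a string so it can be saved or pasted into an issue.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
# "tottime" sorts by time spent in the function itself, excluding callees.
stats.sort_stats("tottime").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by "cumulative" instead shows which high-level callers account for the time, which helps when a single low-level call such as a database execute dominates the list.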

turbocool3r commented 9 months ago

It's similar for ~18128 functions:

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 12631926 2250.658    0.000 2250.658    0.000 {method 'execute' of 'sqlite3.Cursor' objects}
    10000  394.660    0.039  394.660    0.039 {built-in method _ida_kernwin.user_cancelled}
    19938  157.615    0.008  157.615    0.008 {built-in method _ida_hexrays.gen_microcode}
146060012   41.160    0.000   41.160    0.000 {method 'sub' of 're.Pattern' objects}
146060012   37.974    0.000   79.134    0.000 diaphora.py:1128(re_sub)
    19936   36.873    0.002   38.815    0.002 {built-in method _ida_hexrays.mba_t__print}
  5208597   30.784    0.000  107.335    0.000 diaphora.py:1169(get_cmp_asm)
     9999   29.996    0.003  658.750    0.066 diaphora_ida.py:2716(read_function)
   720366   26.992    0.000   26.992    0.000 {built-in method _ida_hexrays.mblock_t__print}
    19969   17.809    0.001   17.809    0.001 {built-in method _ida_gdl.new_qflow_chart_t}
    19936   16.872    0.001   71.082    0.004 diaphora_ida.py:2222(get_microcode_bblocks)
    19938   13.542    0.001  305.436    0.015 diaphora_ida.py:2276(get_microcode)
  8619624   10.706    0.000   18.130    0.000 diaphora_ida.py:2204(get_plain_microcode_line)
  1160074    9.342    0.000   38.187    0.000 diaphora_ida.py:2487(extract_function_constants)
  9937042    9.237    0.000    9.237    0.000 {method 'split' of 're.Pattern' objects}
     9970    8.716    0.001   91.618    0.009 diaphora_ida.py:2673(extract_microcode)
 20615674    8.110    0.000    8.110    0.000 {built-in method _ida_lines.tag_remove}
  1160074    8.036    0.000    8.036    0.000 {built-in method _ida_lines.generate_disasm_line}
 12107226    7.253    0.000   22.486    0.000 idautils.py:98(DataRefsFrom)
   230695    7.158    0.000    7.158    0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
 22959803    6.569    0.000    6.569    0.000 {built-in method _ida_ua.operands_array___getitem__}
  5630667    6.412    0.000    6.412    0.000 {built-in method _ida_gdl.qflow_chart_t_calc_block_type}
 23470347    5.935    0.000   17.942    0.000 ida_idaapi.py:312(_bounded_getitem_iterator)
     9970    5.819    0.001   55.795    0.006 graph_hashes.py:100(calculate)
  5630667    5.801    0.000   13.454    0.000 ida_gdl.py:782(__init__)
 15203610    5.694    0.000    5.694    0.000 {built-in method _ida_xref.xrefblk_t_swiginit}
  1160074    5.289    0.000    5.289    0.000 {built-in method _ida_idp.is_call_insn}
     9958    4.888    0.000   16.225    0.002 diaphora.py:762(save_microcode_instructions)
 22959803    4.863    0.000   11.432    0.000 ida_ua.py:120(__getitem__)
  4837178    4.238    0.000    4.238    0.000 {method 'index' of 'list' objects}
 15203610    3.972    0.000   10.859    0.000 ida_xref.py:474(__init__)
 56330391    3.865    0.000    3.865    0.000 {method 'strip' of 'str' objects}
 30542332    3.723    0.000    3.723    0.000 {method 'split' of 'str' objects}
     9970    3.666    0.000   11.125    0.001 diaphora.py:676(save_instructions_to_database)
 12107226    3.330    0.000    7.009    0.000 ida_xref.py:424(drefs_from)
 33947625    3.199    0.000    3.199    0.000 {method 'find' of 'str' objects}
  3728204    3.055    0.000    3.055    0.000 {built-in method _ida_ua.decode_insn}
 20615674    2.900    0.000   11.011    0.000 ida_lines.py:829(tag_remove)
     9970    2.818    0.000    3.136    0.000 diaphora_ida.py:2475(extract_function_callers)
  1408056    2.790    0.000    7.416    0.000 idc.py:1622(get_operand_value)
  9937074    2.774    0.000    3.622    0.000 re.py:289(_compile)
  4661241    2.758    0.000    2.951    0.000 diaphora.py:665(get_valid_prop)
  5630667    2.726    0.000   19.154    0.000 ida_gdl.py:850(_getitem)
  9937042    2.642    0.000   15.501    0.000 re.py:223(split)
  1160074    2.621    0.000    9.242    0.000 diaphora_ida.py:2612(get_decoded_instruction)
 66755506    2.526    0.000    2.526    0.000 {method 'append' of 'list' objects}
  2593591    2.427    0.000    3.302    0.000 diaphora_ida.py:3836(visit_expr)
  4163967    2.363    0.000    7.890    0.000 idautils.py:60(CodeRefsFrom)
     9999    2.260    0.000    5.606    0.001 diaphora_ida.py:2703(get_microcode_instructions)
  4577097    2.249    0.000   16.473    0.000 ida_gdl.py:807(succs)
    13003    2.215    0.000    2.215    0.000 {built-in method builtins.dir}
  6718767    2.099    0.000    2.099    0.000 {built-in method _ida_ua.insn_t___get_ops__}
    19938    2.018    0.000  110.253    0.006 diaphora.py:1138(get_cmp_asm_lines)
  4938582    1.952    0.000   21.763    0.000 ida_gdl.py:858(__getitem__)
3217354/689549    1.923    0.000    2.290    0.000 tarjan_sort.py:26(visit)
1707826/1389877    1.884    0.000    2.974    0.000 {built-in method builtins.sum}
  3835108    1.872    0.000   13.073    0.000 ida_gdl.py:798(preds)
 11693478    1.795    0.000    3.556    0.000 ida_xref.py:500(get_first_dref_from)
 11693478    1.761    0.000    1.761    0.000 {built-in method _ida_xref.get_first_dref_from}
  3728204    1.756    0.000    1.756    0.000 {built-in method _ida_ua.insn_t_swiginit}
 27488370    1.702    0.000    1.702    0.000 {built-in method builtins.isinstance}
  9937042    1.651    0.000    1.942    0.000 diaphora_ida.py:1027(_print)
  5630667    1.642    0.000    1.642    0.000 {built-in method _ida_gdl.qflow_chart_t___getitem__}
  6718767    1.592    0.000    3.691    0.000 ida_ua.py:816(__get_ops__)
  1160074    1.589    0.000    1.589    0.000 {built-in method _ida_ua.print_insn_mnem}
  1219894    1.496    0.000    1.496    0.000 encoder.py:204(iterencode)
 22365236    1.464    0.000    2.847    0.000 {built-in method builtins.len}
     9970    1.457    0.000  158.191    0.016 diaphora_ida.py:2306(decompile_and_get)
  9280592    1.430    0.000    2.734    0.000 ida_bytes.py:2287(is_forced_operand)
     9970    1.391    0.000 2246.387    0.225 diaphora.py:717(insert_basic_blocks_to_database)
  5630667    1.333    0.000    2.975    0.000 ida_gdl.py:760(__getitem__)
  9280592    1.304    0.000    1.304    0.000 {built-in method _ida_bytes.is_forced_operand}
  2781538    1.290    0.000    2.938    0.000 idautils.py:179(Heads)
  4163967    1.260    0.000    2.966    0.000 ida_xref.py:404(fcrefs_from)
  5630667    1.240    0.000    7.652    0.000 ida_gdl.py:673(calc_block_type)
 18895546    1.203    0.000    1.203    0.000 {method 'startswith' of 'str' objects}
 15203610    1.193    0.000    1.193    0.000 {built-in method _ida_xref.new_xrefblk_t}
  4968492    1.190    0.000    3.066    0.000 ida_gdl.py:837(<lambda>)
  1219894    1.184    0.000    4.169    0.000 __init__.py:183(dumps)
  3728204    1.165    0.000    3.549    0.000 ida_ua.py:650(__init__)
     9968    1.135    0.000    5.236    0.001 {built-in method _ida_hexrays.ctree_visitor_t_apply_to}
  4968492    1.133    0.000    1.876    0.000 ida_gdl.py:748(size)
  1160074    1.120    0.000    1.120    0.000 {built-in method _ida_nalt.get_switch_info}
        1    1.048    1.048 3337.684 3337.684 diaphora_ida.py:1132(do_export)
     9968    1.021    0.000    1.021    0.000 factor.py:28(<listcomp>)
  2320148    0.991    0.000    0.991    0.000 {built-in method _ida_bytes.get_bytes}
  1160074    0.905    0.000    8.653    0.000 graph_hashes.py:95(is_call_insn)
    39909    0.897    0.000    0.897    0.000 {method 'sort' of 'list' objects}
   357829    0.871    0.000    0.871    0.000 {method 'sqrt' of 'decimal.Decimal' objects}
  3480222    0.868    0.000    0.868    0.000 {built-in method _ida_xref.get_first_fcref_from}
    96919    0.865    0.000    0.865    0.000 {built-in method _ida_typeinf.idc_get_type}
  1160074    0.848    0.000    3.347    0.000 ida_nalt.py:3773(get_switch_info)
  1160074    0.822    0.000    3.332    0.000 diaphora_ida.py:223(diaphora_decode)
  1219894    0.780    0.000    2.427    0.000 encoder.py:182(encode)
     9970    0.766    0.000    3.391    0.000 diaphora_ida.py:2560(extract_function_pseudocode_features)
     9970    0.756    0.000    0.756    0.000 {built-in method _ida_idp.ph_get_instruc}
     9970    0.749    0.000    0.749    0.000 idautils.py:560(<listcomp>)
  3728204    0.744    0.000    3.800    0.000 ida_ua.py:1927(decode_insn)
    29904    0.744    0.000    2.474    0.000 kfuzzy.py:104(_hash)
  4968492    0.743    0.000    0.743    0.000 {built-in method _ida_gdl.qflow_chart_t_size}
 17964664    0.741    0.000    0.741    0.000 {method 'isdigit' of 'str' objects}
  1160074    0.736    0.000    9.791    0.000 idc.py:1486(generate_disasm_line)
   720366    0.716    0.000    0.716    0.000 {method 'splitlines' of 'str' objects}
     9970    0.697    0.000    0.697    0.000 {built-in method _ida_typeinf.idc_guess_type}
  1460668    0.688    0.000    0.688    0.000 {built-in method _ida_hexrays.mba_t_get_mblock}
  1160074    0.670    0.000   12.597    0.000 diaphora_ida.py:2662(extract_line_mnem_disasm)
  2724938    0.660    0.000    1.172    0.000 ida_gdl.py:722(succ)
  3728204    0.628    0.000    0.628    0.000 {built-in method _ida_ua.new_insn_t}
   634675    0.593    0.000    0.593    0.000 diaphora_ida.py:2422(constant_filter)
  3480222    0.586    0.000    1.453    0.000 ida_xref.py:601(get_first_fcref_from)
   595799    0.581    0.000    0.800    0.000 diaphora_ida.py:3843(visit_insn)
     9968    0.573    0.000    0.573    0.000 {built-in method _ida_hexrays.restore_user_cmts}
    19936    0.561    0.000    0.561    0.000 {built-in method _ida_hexrays.mba_t_build_graph}
  1209924    0.557    0.000    0.557    0.000 encoder.py:104(__init__)
  2320148    0.556    0.000    1.021    0.000 ida_ua.py:114(__len__)
  3189390    0.553    0.000    1.093    0.000 ida_hexrays.py:18631(_get_op)
  3189390    0.541    0.000    0.541    0.000 {built-in method _ida_hexrays.citem_t__get_op}
  2320148    0.537    0.000    0.537    0.000 {built-in method _ida_bytes.next_head}
  1160074    0.529    0.000    0.529    0.000 {built-in method _ida_nalt.switch_info_t_swiginit}
  1160074    0.523    0.000    3.905    0.000 diaphora_ida.py:2506(extract_function_switches)
1136558/568279    0.520    0.000    0.589    0.000 ida_ida.py:4257(__getattribute__)
   720366    0.519    0.000    0.519    0.000 {built-in method _ida_hexrays.qstring_printer_t_get_s}
  2213644    0.515    0.000    0.925    0.000 ida_gdl.py:731(pred)
  2724938    0.513    0.000    0.513    0.000 {built-in method _ida_gdl.qflow_chart_t_succ}
  2320148    0.488    0.000    1.479    0.000 ida_bytes.py:4312(get_bytes)
  1852159    0.486    0.000    0.868    0.000 ida_gdl.py:706(nsucc)
  2320148    0.480    0.000    0.480    0.000 {built-in method _ida_bytes.get_cmt}
   230695    0.474    0.000 2242.169    0.010 diaphora.py:647(get_bb_id)
   282424    0.474    0.000    0.474    0.000 {built-in method _ida_bytes.get_strlit_contents}
  2320148    0.471    0.000    0.951    0.000 ida_bytes.py:3780(get_cmt)
  2320148    0.465    0.000    0.465    0.000 {built-in method _ida_ua.operands_array___len__}
  2320148    0.460    0.000    0.686    0.000 ida_bytes.py:602(get_item_size)
     9970    0.431    0.000    0.431    0.000 {built-in method _ida_hexrays.decompile_func}
     9970    0.429    0.000 2277.516    0.228 diaphora.py:1025(save_function)
  2213644    0.410    0.000    0.410    0.000 {built-in method _ida_gdl.qflow_chart_t_pred}
  1004824    0.409    0.000    0.409    0.000 {built-in method _ida_funcs.get_func}
  1160074    0.392    0.000    1.067    0.000 ida_nalt.py:2421(__init__)
  1621464    0.392    0.000    0.696    0.000 ida_gdl.py:714(npred)
  1852159    0.382    0.000    0.382    0.000 {built-in method _ida_gdl.qflow_chart_t_nsucc}
  1460668    0.365    0.000    1.053    0.000 ida_hexrays.py:15716(get_mblock)
  2320148    0.361    0.000    0.898    0.000 ida_bytes.py:496(next_head)
     9970    0.357    0.000    0.469    0.000 diaphora_ida.py:2589(extract_function_assembly_features)
   720366    0.345    0.000    0.345    0.000 {built-in method _ida_hexrays.qstring_printer_t_swiginit}
     9968    0.316    0.000    1.338    0.000 factor.py:16(primesbelow)
   240665    0.306    0.000    0.481    0.000 diaphora.py:533(get_db)
  1621464    0.305    0.000    0.305    0.000 {built-in method _ida_gdl.qflow_chart_t_npred}
  1160074    0.302    0.000   10.093    0.000 idc.py:1514(GetDisasm)
  1339514    0.297    0.000    0.297    0.000 {method 'join' of 'str' objects}
   898934    0.291    0.000    0.291    0.000 {built-in method _ida_pro.strvec_t___getitem__}
     9970    0.281    0.000    1.706    0.000 diaphora_ida.py:2531(extract_function_mdindex)
    19938    0.268    0.000    0.268    0.000 {method 'readlines' of '_io._IOBase' objects}
  2557709    0.257    0.000    0.257    0.000 {built-in method builtins.min}
  1160074    0.245    0.000    1.834    0.000 ida_ua.py:1851(print_insn_mnem)
   317927    0.244    0.000    1.117    0.000 diaphora_ida.py:2446(is_constant)
  3922471    0.241    0.000    0.241    0.000 {method 'add' of 'set' objects}
  1160074    0.239    0.000    8.275    0.000 ida_lines.py:759(generate_disasm_line)
     9970    0.237    0.000    0.980    0.000 tarjan_sort.py:75(robust_topological_sort)
  1160074    0.236    0.000    1.356    0.000 ida_nalt.py:2589(get_switch_info)
  1160074    0.233    0.000    5.522    0.000 ida_idp.py:487(is_call_insn)
   720366    0.232    0.000    0.680    0.000 ida_hexrays.py:6174(__init__)
  2320148    0.227    0.000    0.227    0.000 {built-in method _ida_bytes.get_item_size}
    94211    0.221    0.000    0.221    0.000 {built-in method _ida_name.demangle_name}
   720366    0.220    0.000   27.212    0.000 ida_hexrays.py:13931(_print)
   862825    0.214    0.000    0.367    0.000 ida_ua.py:315(__get_addr__)
   945536    0.210    0.000    0.210    0.000 {built-in method _ida_pro.intvec_t___getitem__}
   833138    0.203    0.000    0.352    0.000 ida_ua.py:287(__get_value__)
     9970    0.198    0.000    0.242    0.000 tarjan_sort.py:52(topological_sort)
  1004824    0.197    0.000    0.605    0.000 ida_funcs.py:737(get_func)
   720366    0.195    0.000    0.715    0.000 ida_hexrays.py:6183(get_s)
   945536    0.194    0.000    0.404    0.000 ida_pro.py:836(__getitem__)
   720366    0.191    0.000    0.906    0.000 ida_hexrays.py:6189(<lambda>)
   898934    0.190    0.000    0.481    0.000 ida_pro.py:2019(__getitem__)
   720366    0.190    0.000    0.190    0.000 {built-in method _ida_pro.intvec_t_size}
   721995    0.189    0.000    2.561    0.000 ida_gdl.py:855(<genexpr>)
   327919    0.175    0.000    0.286    0.000 diaphora_ida.py:2552(<genexpr>)
   327919    0.167    0.000    1.201    0.000 diaphora_ida.py:2556(<genexpr>)
     9968    0.158    0.000    0.158    0.000 {built-in method _ida_hexrays.cfuncptr_t_get_pseudocode}
   862825    0.153    0.000    0.153    0.000 {built-in method _ida_ua.op_t___get_addr__}
   720366    0.150    0.000    0.340    0.000 ida_pro.py:651(size)
   833138    0.148    0.000    0.148    0.000 {built-in method _ida_ua.op_t___get_value__}
    29910    0.147    0.000    2.437    0.000 tarjan_sort.py:14(strongly_connected_components)
  1160074    0.146    0.000    0.146    0.000 {built-in method _ida_nalt.new_switch_info_t}
   240665    0.143    0.000    0.721    0.000 diaphora.py:544(db_cursor)
     9970    0.133    0.000    1.993    0.000 diaphora_ida.py:2623(extract_function_topological_information)
   481979    0.129    0.000    0.213    0.000 ida_ua.py:273(__get_reg_phrase__)
   683745    0.127    0.000    0.252    0.000 ida_xref.py:609(get_next_fcref_from)
   683745    0.125    0.000    0.125    0.000 {built-in method _ida_xref.get_next_fcref_from}
   230695    0.109    0.000    0.109    0.000 graph_hashes.py:85(get_edges_value)
   461390    0.107    0.000    0.200    0.000 ida_bytes.py:642(get_flags)
   720366    0.103    0.000    0.103    0.000 {built-in method _ida_hexrays.new_qstring_printer_t}
   240665    0.097    0.000    0.097    0.000 {method 'cursor' of 'sqlite3.Connection' objects}
   461390    0.092    0.000    0.092    0.000 {built-in method _ida_bytes.get_flags}
     9970    0.091    0.000    0.091    0.000 diaphora.py:888(create_function_dictionary)
   461390    0.091    0.000    0.091    0.000 idc.py:159(is_head)
   240667    0.086    0.000    0.105    0.000 threading.py:1338(current_thread)
   481979    0.084    0.000    0.084    0.000 {built-in method _ida_ua.op_t___get_reg_phrase__}
   598218    0.076    0.000    0.076    0.000 {built-in method builtins.getattr}
     9999    0.075    0.000    0.075    0.000 {built-in method _ida_funcs.get_func_name}
   343688    0.073    0.000    0.324    0.000 idautils.py:38(CodeRefsTo)
   230695    0.069    0.000    0.069    0.000 graph_hashes.py:73(get_node_value)
   240666    0.069    0.000    0.069    0.000 threading.py:1089(ident)
   389453    0.067    0.000    0.124    0.000 ida_xref.py:512(get_next_dref_from)
     9970    0.066    0.000    3.432    0.000 diaphora.py:1150(get_cmp_pseudo_lines)
     9970    0.063    0.000 2273.802    0.228 diaphora.py:1005(save_function_to_database)
     9968    0.062    0.000    2.604    0.000 kfuzzy.py:246(hash_bytes)
     9968    0.061    0.000    0.063    0.000 kfuzzy.py:218(mix_blocks)
     9970    0.059    0.000  158.964    0.016 diaphora_ida.py:2364(guess_type)
   219833    0.058    0.000    1.618    0.000 kfuzzy.py:31(modsum)
   389453    0.057    0.000    0.057    0.000 {built-in method _ida_xref.get_next_dref_from}
   172122    0.054    0.000    0.112    0.000 ida_xref.py:374(crefs_to)
   282424    0.053    0.000    0.527    0.000 ida_bytes.py:4338(get_strlit_contents)
     9968    0.052    0.000    1.451    0.000 diaphora_ida.py:3829(__init__)
    19940    0.047    0.000    0.047    0.000 {built-in method _hashlib.openssl_md5}
   171566    0.046    0.000    0.103    0.000 ida_xref.py:384(fcrefs_to)
    96919    0.045    0.000    0.945    0.000 idc.py:4971(get_type)
    49850    0.042    0.000    0.166    0.000 diaphora_ida.py:2551(<genexpr>)
     9999    0.040    0.000    0.040    0.000 diaphora_ida.py:1049(clear_pseudo_fields)
    19969    0.038    0.000   17.898    0.001 ida_gdl.py:821(__init__)
    19940    0.037    0.000    0.037    0.000 {built-in method _ida_nalt.get_imagebase}
    19940    0.036    0.000    0.036    0.000 {method 'join' of 'bytes' objects}
    96919    0.035    0.000    0.900    0.000 ida_typeinf.py:9983(idc_get_type)
    94211    0.033    0.000    0.254    0.000 ida_name.py:1141(demangle_name)
    29910    0.032    0.000    0.068    0.000 ida_gdl.py:854(__iter__)
    41224    0.031    0.000    0.046    0.000 idautils.py:199(Functions)
    19969    0.031    0.000    0.048    0.000 os.py:674(__getitem__)
     9968    0.030    0.000    0.030    0.000 {built-in method _ida_hexrays.new_ctree_visitor_t}
    19969    0.028    0.000    0.028    0.000 {built-in method _ida_gdl.qflow_chart_t_swiginit}
   161596    0.028    0.000    0.028    0.000 {built-in method _ida_xref.get_next_fcref_to}
     9999    0.027    0.000    0.216    0.000 diaphora_ida.py:2463(get_function_names)
   162152    0.027    0.000    0.027    0.000 {built-in method _ida_xref.get_next_cref_to}
    29939    0.027    0.000    0.097    0.000 idc.py:2864(get_func_attr)
     9970    0.026    0.000    0.026    0.000 {built-in method _ida_funcs.get_func_cmt}
     9970    0.023    0.000    0.047    0.000 idc.py:2208(get_segm_start)
    19969    0.022    0.000   17.860    0.001 ida_gdl.py:626(__init__)
    29939    0.022    0.000    0.050    0.000 idc.py:89(_IDC_GetAttr)
   161596    0.021    0.000    0.049    0.000 ida_xref.py:626(get_next_fcref_to)
   162152    0.021    0.000    0.049    0.000 ida_xref.py:588(get_next_cref_to)
    29940    0.021    0.000    0.021    0.000 {built-in method builtins.hasattr}
    19938    0.020    0.000    0.040    0.000 ida_hexrays.py:16856(__init__)
    19938    0.020    0.000    0.020    0.000 {built-in method _ida_hexrays.new_mba_ranges_t}
    19940    0.020    0.000    0.069    0.000 diaphora_ida.py:3094(get_base_address)
    19936    0.020    0.000   38.835    0.002 ida_hexrays.py:15680(_print)
    19938    0.019    0.000    0.069    0.000 diaphora_ida.py:1018(__init__)
    19938    0.019    0.000    0.049    0.000 ida_hexrays.py:6084(__init__)
     9970    0.019    0.000    1.533    0.000 idautils.py:558(GetInstructionList)
     9968    0.019    0.000    0.061    0.000 ida_hexrays.py:17980(__init__)
   240667    0.019    0.000    0.019    0.000 {built-in method _thread.get_ident}
   218975    0.018    0.000    0.018    0.000 {built-in method builtins.chr}
    19969    0.017    0.000    0.082    0.000 os.py:771(getenv)
    19938    0.017    0.000  157.632    0.008 ida_hexrays.py:22282(gen_microcode)
     9970    0.017    0.000    0.468    0.000 ida_hexrays.py:22262(decompile_func)
     9970    0.017    0.000    0.017    0.000 tarjan_sort.py:60(<listcomp>)
    29904    0.016    0.000    0.028    0.000 base64.py:51(b64encode)
    19938    0.016    0.000    0.053    0.000 ida_hexrays.py:14923(__init__)
    19938    0.016    0.000    0.016    0.000 {built-in method _ida_hexrays.mba_ranges_t_swiginit}
    19969    0.016    0.000    0.064    0.000 _collections_abc.py:761(get)
    19938    0.016    0.000    0.016    0.000 {built-in method _ida_hexrays.new_vd_printer_t}
     9970    0.016    0.000    0.016    0.000 {built-in method _ida_segment.getseg}
   101392    0.015    0.000    0.015    0.000 {method 'decode' of 'bytes' objects}
    19940    0.015    0.000    0.015    0.000 {method 'hexdigest' of '_hashlib.HASH' objects}
    19938    0.014    0.000    0.034    0.000 ida_hexrays.py:8712(__init__)
    19938    0.014    0.000    0.014    0.000 {built-in method _ida_hexrays.vd_printer_t_swiginit}
    10000    0.014    0.000  394.674    0.039 ida_kernwin.py:7289(user_cancelled)
     9970    0.014    0.000    0.059    0.000 idc.py:3025(get_func_cmt)
    19938    0.013    0.000    0.013    0.000 {built-in method _ida_hexrays.hexrays_failure_t_swiginit}
    19938    0.013    0.000    0.013    0.000 {built-in method _ida_hexrays.mlist_t_swiginit}
   190481    0.013    0.000    0.013    0.000 {method 'pop' of 'list' objects}
     9970    0.013    0.000    0.013    0.000 {built-in method _ida_hexrays.cfuncptr_t___deref__}
    19940    0.012    0.000    0.049    0.000 ida_nalt.py:3420(get_imagebase)
     9968    0.012    0.000    0.012    0.000 {built-in method _ida_hexrays.ctree_visitor_t_swiginit}
     9968    0.012    0.000    0.012    0.000 {built-in method _ida_pro.strvec_t_size}
    19936    0.012    0.000    0.573    0.000 ida_hexrays.py:15612(build_graph)
    20016    0.011    0.000    0.041    0.000 idautils.py:81(DataRefsTo)
    29904    0.011    0.000    0.011    0.000 {built-in method binascii.b2a_base64}
   240665    0.011    0.000    0.011    0.000 {method 'close' of 'sqlite3.Cursor' objects}
     9970    0.011    0.000    0.715    0.000 idc.py:5007(guess_type)
     9970    0.010    0.000    0.010    0.000 {built-in method _ida_hexrays.init_hexrays_plugin}
    19969    0.010    0.000    0.018    0.000 os.py:754(encode)
    49873    0.010    0.000    0.010    0.000 {method 'encode' of 'str' objects}
     9968    0.010    0.000    5.246    0.001 ida_hexrays.py:17993(apply_to)
     9970    0.010    0.000    0.010    0.000 {built-in method _ida_segment.is_spec_ea}
    41223    0.010    0.000    0.010    0.000 {built-in method _ida_funcs.get_next_func}
     9999    0.009    0.000    0.084    0.000 ida_funcs.py:1026(get_func_name)
    20016    0.009    0.000    0.021    0.000 ida_xref.py:414(drefs_to)
     9968    0.008    0.000    0.020    0.000 ida_pro.py:1881(size)
     9970    0.008    0.000    0.019    0.000 ida_hexrays.py:4116(init_hexrays_plugin)
     9970    0.008    0.000    0.765    0.000 ida_idp.py:5846(ph_get_instruc)
     9970    0.008    0.000    0.018    0.000 ida_segment.py:678(is_spec_ea)
     9970    0.008    0.000    0.023    0.000 ida_segment.py:1015(getseg)
     9970    0.007    0.000    0.704    0.000 ida_typeinf.py:9975(idc_guess_type)
     9970    0.007    0.000    0.020    0.000 ida_hexrays.py:2045(__deref__)
     9970    0.007    0.000    0.481    0.000 diaphora_ida.py:2200(do_decompile)
    19938    0.007    0.000    0.007    0.000 {built-in method _ida_hexrays.new_mlist_t}
     9968    0.007    0.000    0.165    0.000 ida_hexrays.py:2273(get_pseudocode)
    19938    0.007    0.000    0.007    0.000 {built-in method _ida_hexrays.new_hexrays_failure_t}
     9970    0.006    0.000    0.047    0.000 {method 'extend' of 'list' objects}
     9970    0.006    0.000    0.474    0.000 ida_hexrays.py:26070(decompile)
     9968    0.006    0.000    0.579    0.000 ida_hexrays.py:21738(restore_user_cmts)
     9970    0.006    0.000    0.006    0.000 {built-in method _ida_xref.get_first_cref_to}
     9970    0.006    0.000    0.032    0.000 ida_funcs.py:841(get_func_cmt)
       25    0.005    0.000    0.005    0.000 {built-in method _ida_kernwin.replace_wait_box}
    41223    0.005    0.000    0.014    0.000 ida_funcs.py:820(get_next_func)
    29904    0.004    0.000    0.004    0.000 {method 'strip' of 'bytes' objects}
     9970    0.004    0.000    0.004    0.000 {built-in method _ida_xref.get_first_dref_to}
     9970    0.004    0.000    0.010    0.000 ida_xref.py:575(get_first_cref_to)
     9970    0.004    0.000    0.008    0.000 ida_xref.py:618(get_first_fcref_to)
     9970    0.004    0.000    0.008    0.000 ida_xref.py:525(get_first_dref_to)
     9970    0.004    0.000    0.004    0.000 {built-in method _ida_xref.get_first_fcref_to}
      278    0.002    0.000    0.002    0.000 {built-in method _ida_xref.calc_switch_cases}
    10046    0.002    0.000    0.004    0.000 ida_xref.py:535(get_next_dref_to)
    10046    0.002    0.000    0.002    0.000 {built-in method _ida_xref.get_next_dref_to}
     9970    0.002    0.000    0.002    0.000 {method 'insert' of 'list' objects}
     9970    0.002    0.000    0.002    0.000 graph_hashes.py:70(__init__)
     7925    0.002    0.000    0.002    0.000 {built-in method _ida_pro.int64vec_t___getitem__}
     7925    0.002    0.000    0.003    0.000 ida_pro.py:1326(__getitem__)
     9970    0.001    0.000    0.001    0.000 {method 'keys' of 'dict' objects}
     9970    0.001    0.000    0.001    0.000 {method 'remove' of 'list' objects}
     3004    0.001    0.000    0.001    0.000 {built-in method _ida_xref.casevec_t___getitem__}
     3004    0.001    0.000    0.001    0.000 {built-in method _ida_pro.int64vec_t_size}
     9970    0.001    0.000    0.001    0.000 {method 'items' of 'dict' objects}
     3004    0.001    0.000    0.002    0.000 ida_pro.py:1141(size)
     3004    0.001    0.000    0.002    0.000 ida_xref.py:880(__getitem__)
      278    0.000    0.000    0.000    0.000 ida_nalt.py:2359(get_jtable_size)
      278    0.000    0.000    0.000    0.000 ida_xref.py:687(size)
      278    0.000    0.000    0.003    0.000 ida_xref.py:114(calc_switch_cases)
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
      278    0.000    0.000    0.000    0.000 {built-in method _ida_xref.casevec_t_size}
      278    0.000    0.000    0.000    0.000 {built-in method _ida_nalt.switch_info_t_get_jtable_size}
       29    0.000    0.000    0.000    0.000 diaphora_ida.py:145(debug_refresh)
        1    0.000    0.000    0.000    0.000 {built-in method _ida_kernwin.show_wait_box}
        1    0.000    0.000    0.000    0.000 {built-in method posix.stat}
        3    0.000    0.000    0.000    0.000 {method 'execute' of 'sqlite3.Connection' objects}
       25    0.000    0.000    0.005    0.000 ida_kernwin.py:7799(replace_wait_box)
      100    0.000    0.000    0.000    0.000 {built-in method builtins.divmod}
        1    0.000    0.000 3337.684 3337.684 diaphora_ida.py:1230(export)
       32    0.000    0.000    0.000    0.000 types.py:171(__get__)
        4    0.000    0.000    0.000    0.000 {built-in method _ida_kernwin.msg}
        1    0.000    0.000    0.000    0.000 diaphora.py:449(__del__)
        1    0.000    0.000    0.000    0.000 cProfile.py:40(print_stats)
       26    0.000    0.000    0.000    0.000 {built-in method time.monotonic}
        1    0.000    0.000    0.000    0.000 diaphora.py:553(db_close)
       29    0.000    0.000    0.000    0.000 {built-in method builtins.repr}
        1    0.000    0.000    0.000    0.000 {built-in method _ida_funcs.get_fchunk}
        1    0.000    0.000    0.000    0.000 {method 'close' of '_io.BufferedWriter' objects}
       32    0.000    0.000    0.000    0.000 re.py:250(compile)
        1    0.000    0.000    0.000    0.000 genericpath.py:16(exists)
        1    0.000    0.000    0.000    0.000 pstats.py:117(init)
        1    0.000    0.000    0.000    0.000 pstats.py:107(__init__)
       32    0.000    0.000    0.000    0.000 enum.py:792(value)
        1    0.000    0.000    0.000    0.000 {built-in method _ida_funcs.get_next_fchunk}
        2    0.000    0.000    0.000    0.000 {built-in method time.asctime}
        1    0.000    0.000    0.000    0.000 ida_kernwin.py:926(show_wait_box)
        1    0.000    0.000    0.000    0.000 cProfile.py:50(create_stats)
        2    0.000    0.000    0.000    0.000 diaphora_ida.py:120(log)
        1    0.000    0.000    0.000    0.000 {method 'close' of 'sqlite3.Connection' objects}
        1    0.000    0.000    0.000    0.000 pstats.py:136(load_stats)
        4    0.000    0.000    0.000    0.000 init.py:76(write)
        1    0.000    0.000    0.000    0.000 ida_funcs.py:1165(get_fchunk)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        4    0.000    0.000    0.000    0.000 ida_kernwin.py:197(msg)
        1    0.000    0.000    0.000    0.000 diaphora.py:468(load_hooks)
        1    0.000    0.000    0.000    0.000 ida_funcs.py:1216(get_next_fchunk)
        1    0.000    0.000    0.000    0.000 {method 'commit' of 'sqlite3.Connection' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method '__exit__' of '_io._IOBase' objects}
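
A note on reading this profile: `save_function_to_database` accounts for 2273.802 s of the 3337.684 s spent in `export`, roughly two thirds of the run. The next profile in this thread attributes almost the same amount to `{method 'execute' of 'sqlite3.Cursor' objects}` and to `get_bb_id` (about 10 ms per call over 230695 calls). One pattern that would produce this, and would also explain why later functions export more slowly than early ones, is a per-basic-block `SELECT` against a column with no index, so that each lookup scans a table that keeps growing. That is an assumption, not something the profile proves. The sketch below uses a hypothetical `basic_blocks` table and a hypothetical `idx_bb_address` index (not Diaphora's actual schema) to show how `EXPLAIN QUERY PLAN` reveals whether such a lookup is a full scan or an index search.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of every plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")

# Hypothetical schema for illustration only; not Diaphora's real schema.
conn.execute("CREATE TABLE basic_blocks (id INTEGER PRIMARY KEY, address TEXT)")
conn.executemany(
    "INSERT INTO basic_blocks (address) VALUES (?)",
    ((str(0x1000 + i * 4),) for i in range(10000)),
)

lookup = "SELECT id FROM basic_blocks WHERE address = ?"

# Without an index on `address`, every lookup walks the whole table,
# so the cost of each lookup grows with the number of rows inserted so far.
plan_before = query_plan(conn, lookup, ("4096",))

# With an index, the same lookup becomes a B-tree search.
conn.execute("CREATE INDEX idx_bb_address ON basic_blocks (address)")
plan_after = query_plan(conn, lookup, ("4096",))

print("before:", plan_before)
print("after: ", plan_after)
```

If the real lookup shows a scan in its plan, an index on the looked-up column (or an in-memory dictionary from address to row id kept during the export) would be the usual things to try; whether either applies here depends on the actual query in `get_bb_id`.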
turbocool3r commented 9 months ago

Here is another export run, which took ~17 hours. The cause seems to be the same:

Exception: Canceled.
[Diaphora: Wed Oct  4 13:48:41 2023] Removing crash file /Users/admin/Documents/Apple/CVE-2023-41993/com.apple.dyld.17.0/JavaScriptCore.sqlite-crash...
         1105239852 function calls (1101825819 primitive calls) in 3406.712 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 12631926 2273.867    0.000 2273.867    0.000 {method 'execute' of 'sqlite3.Cursor' objects}
    10000  401.235    0.040  401.235    0.040 {built-in method _ida_kernwin.user_cancelled}
    19938  169.013    0.008  169.013    0.008 {built-in method _ida_hexrays.gen_microcode}
146060012   41.382    0.000   41.382    0.000 {method 'sub' of 're.Pattern' objects}
    19936   38.423    0.002   40.394    0.002 {built-in method _ida_hexrays.mba_t__print}
146060012   38.149    0.000   79.531    0.000 diaphora.py:1128(re_sub)
    19969   31.094    0.002   31.094    0.002 {built-in method _ida_gdl.new_qflow_chart_t}
  5208597   30.906    0.000  107.834    0.000 diaphora.py:1169(get_cmp_asm)
     9999   30.154    0.003  697.154    0.070 diaphora_ida.py:2716(read_function)
   720366   27.336    0.000   27.336    0.000 {built-in method _ida_hexrays.mblock_t__print}
    19936   17.089    0.001   72.140    0.004 diaphora_ida.py:2222(get_microcode_bblocks)
    19938   13.673    0.001  319.956    0.016 diaphora_ida.py:2276(get_microcode)
  8619624   10.759    0.000   18.538    0.000 diaphora_ida.py:2204(get_plain_microcode_line)
  1160074    9.899    0.000    9.899    0.000 {built-in method _ida_idp.is_call_insn}
  1160074    9.337    0.000   38.310    0.000 diaphora_ida.py:2487(extract_function_constants)
  9937042    9.314    0.000    9.314    0.000 {method 'split' of 're.Pattern' objects}
     9970    8.786    0.001   92.077    0.009 diaphora_ida.py:2673(extract_microcode)
 20615674    8.676    0.000    8.676    0.000 {built-in method _ida_lines.tag_remove}
  1160074    8.100    0.000    8.100    0.000 {built-in method _ida_lines.generate_disasm_line}
 12107226    7.340    0.000   22.610    0.000 idautils.py:98(DataRefsFrom)
   230695    7.201    0.000    7.201    0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
 22959803    6.583    0.000    6.583    0.000 {built-in method _ida_ua.operands_array___getitem__}
  5630667    6.415    0.000    6.415    0.000 {built-in method _ida_gdl.qflow_chart_t_calc_block_type}
 23470347    5.945    0.000   17.985    0.000 ida_idaapi.py:312(_bounded_getitem_iterator)
  5630667    5.862    0.000   13.520    0.000 ida_gdl.py:782(__init__)
     9970    5.844    0.001   67.283    0.007 graph_hashes.py:100(calculate)
 15203610    5.709    0.000    5.709    0.000 {built-in method _ida_xref.xrefblk_t_swiginit}
     9958    5.055    0.001   16.478    0.002 diaphora.py:762(save_microcode_instructions)
 22959803    4.869    0.000   11.451    0.000 ida_ua.py:120(__getitem__)
     9970    4.685    0.000    4.685    0.000 {built-in method _ida_hexrays.decompile_func}
  4837178    4.248    0.000    4.248    0.000 {method 'index' of 'list' objects}
 15203610    3.984    0.000   10.891    0.000 ida_xref.py:474(__init__)
 56330391    3.889    0.000    3.889    0.000 {method 'strip' of 'str' objects}
 30542332    3.723    0.000    3.723    0.000 {method 'split' of 'str' objects}
     9970    3.676    0.000   11.185    0.001 diaphora.py:676(save_instructions_to_database)
 12107226    3.339    0.000    7.026    0.000 ida_xref.py:424(drefs_from)
 33947625    3.214    0.000    3.214    0.000 {method 'find' of 'str' objects}
  3728204    3.058    0.000    3.058    0.000 {built-in method _ida_ua.decode_insn}
 20615674    2.899    0.000   11.575    0.000 ida_lines.py:829(tag_remove)
  1408056    2.797    0.000    7.437    0.000 idc.py:1622(get_operand_value)
  9937074    2.788    0.000    3.638    0.000 re.py:289(_compile)
     9970    2.776    0.000    3.090    0.000 diaphora_ida.py:2475(extract_function_callers)
  4661241    2.760    0.000    2.952    0.000 diaphora.py:665(get_valid_prop)
  5630667    2.709    0.000   19.212    0.000 ida_gdl.py:850(_getitem)
  9937042    2.666    0.000   15.618    0.000 re.py:223(split)
  1160074    2.614    0.000    9.255    0.000 diaphora_ida.py:2612(get_decoded_instruction)
 66755506    2.541    0.000    2.541    0.000 {method 'append' of 'list' objects}
  2593591    2.439    0.000    3.315    0.000 diaphora_ida.py:3836(visit_expr)
  4163967    2.365    0.000    7.915    0.000 idautils.py:60(CodeRefsFrom)
  4577097    2.272    0.000   16.544    0.000 ida_gdl.py:807(succs)
     9999    2.265    0.000    5.630    0.001 diaphora_ida.py:2703(get_microcode_instructions)
    13003    2.229    0.000    2.229    0.000 {built-in method builtins.dir}
  6718767    2.103    0.000    2.103    0.000 {built-in method _ida_ua.insn_t___get_ops__}
    19938    2.024    0.000  110.774    0.006 diaphora.py:1138(get_cmp_asm_lines)
  4938582    1.960    0.000   21.818    0.000 ida_gdl.py:858(__getitem__)
3217354/689549    1.930    0.000    2.289    0.000 tarjan_sort.py:26(visit)
1707826/1389877    1.893    0.000    2.991    0.000 {built-in method builtins.sum}
  3835108    1.885    0.000   13.115    0.000 ida_gdl.py:798(preds)
 11693478    1.806    0.000    3.563    0.000 ida_xref.py:500(get_first_dref_from)
  3728204    1.763    0.000    1.763    0.000 {built-in method _ida_ua.insn_t_swiginit}
 11693478    1.757    0.000    1.757    0.000 {built-in method _ida_xref.get_first_dref_from}
 27488370    1.703    0.000    1.703    0.000 {built-in method builtins.isinstance}
  9937042    1.679    0.000    1.972    0.000 diaphora_ida.py:1027(_print)
  5630667    1.650    0.000    1.650    0.000 {built-in method _ida_gdl.qflow_chart_t___getitem__}
  6718767    1.604    0.000    3.707    0.000 ida_ua.py:816(__get_ops__)
  1160074    1.591    0.000    1.591    0.000 {built-in method _ida_ua.print_insn_mnem}
     9970    1.533    0.000 2269.750    0.228 diaphora.py:717(insert_basic_blocks_to_database)
     9970    1.531    0.000  168.605    0.017 diaphora_ida.py:2306(decompile_and_get)
  1219894    1.523    0.000    1.523    0.000 encoder.py:204(iterencode)
 22365236    1.469    0.000    2.858    0.000 {built-in method builtins.len}
  9280592    1.430    0.000    2.734    0.000 ida_bytes.py:2287(is_forced_operand)
  5630667    1.333    0.000    2.983    0.000 ida_gdl.py:760(__getitem__)
  9280592    1.304    0.000    1.304    0.000 {built-in method _ida_bytes.is_forced_operand}
  2781538    1.299    0.000    2.960    0.000 idautils.py:179(Heads)
  4163967    1.272    0.000    2.983    0.000 ida_xref.py:404(fcrefs_from)
  5630667    1.244    0.000    7.658    0.000 ida_gdl.py:673(calc_block_type)
        1    1.234    1.234 3406.712 3406.712 diaphora_ida.py:1132(do_export)
 18895546    1.205    0.000    1.205    0.000 {method 'startswith' of 'str' objects}
 15203610    1.198    0.000    1.198    0.000 {built-in method _ida_xref.new_xrefblk_t}
  1219894    1.195    0.000    4.198    0.000 __init__.py:183(dumps)
  4968492    1.189    0.000    3.075    0.000 ida_gdl.py:837(<lambda>)
  3728204    1.176    0.000    3.570    0.000 ida_ua.py:650(__init__)
     9968    1.166    0.000    5.285    0.001 {built-in method _ida_hexrays.ctree_visitor_t_apply_to}
  4968492    1.138    0.000    1.886    0.000 ida_gdl.py:748(size)
  1160074    1.130    0.000    1.130    0.000 {built-in method _ida_nalt.get_switch_info}
     9968    1.026    0.000    1.026    0.000 factor.py:28(<listcomp>)
  2320148    0.993    0.000    0.993    0.000 {built-in method _ida_bytes.get_bytes}
  1160074    0.912    0.000   13.294    0.000 graph_hashes.py:95(is_call_insn)
    39909    0.900    0.000    0.900    0.000 {method 'sort' of 'list' objects}
    96919    0.887    0.000    0.887    0.000 {built-in method _ida_typeinf.idc_get_type}
   357829    0.879    0.000    0.879    0.000 {method 'sqrt' of 'decimal.Decimal' objects}
  3480222    0.868    0.000    0.868    0.000 {built-in method _ida_xref.get_first_fcref_from}
  1160074    0.853    0.000    3.365    0.000 ida_nalt.py:3773(get_switch_info)
     9970    0.830    0.000    0.830    0.000 {built-in method _ida_idp.ph_get_instruc}
  1160074    0.828    0.000    3.344    0.000 diaphora_ida.py:223(diaphora_decode)
  1219894    0.773    0.000    2.446    0.000 encoder.py:182(encode)
     9970    0.768    0.000    3.410    0.000 diaphora_ida.py:2560(extract_function_pseudocode_features)
     9970    0.756    0.000    0.756    0.000 {built-in method _ida_typeinf.idc_guess_type}
  3728204    0.754    0.000    3.811    0.000 ida_ua.py:1927(decode_insn)
     9970    0.753    0.000    0.753    0.000 idautils.py:560(<listcomp>)
  4968492    0.748    0.000    0.748    0.000 {built-in method _ida_gdl.qflow_chart_t_size}
    29904    0.747    0.000    2.486    0.000 kfuzzy.py:104(_hash)
 17964664    0.745    0.000    0.745    0.000 {method 'isdigit' of 'str' objects}
  1160074    0.741    0.000    9.876    0.000 idc.py:1486(generate_disasm_line)
   720366    0.726    0.000    0.726    0.000 {method 'splitlines' of 'str' objects}
  1460668    0.711    0.000    0.711    0.000 {built-in method _ida_hexrays.mba_t_get_mblock}
  1160074    0.675    0.000   12.690    0.000 diaphora_ida.py:2662(extract_line_mnem_disasm)
  2724938    0.664    0.000    1.180    0.000 ida_gdl.py:722(succ)
     9968    0.642    0.000    0.642    0.000 {built-in method _ida_hexrays.restore_user_cmts}
  3728204    0.632    0.000    0.632    0.000 {built-in method _ida_ua.new_insn_t}
    19936    0.594    0.000    0.594    0.000 {built-in method _ida_hexrays.mba_t_build_graph}
   634675    0.593    0.000    0.593    0.000 diaphora_ida.py:2422(constant_filter)
  3480222    0.589    0.000    1.457    0.000 ida_xref.py:601(get_first_fcref_from)
   595799    0.583    0.000    0.803    0.000 diaphora_ida.py:3843(visit_insn)
   230695    0.569    0.000 2265.285    0.010 diaphora.py:647(get_bb_id)
  1209924    0.557    0.000    0.557    0.000 encoder.py:104(__init__)
  2320148    0.554    0.000    1.020    0.000 ida_ua.py:114(__len__)
  3189390    0.553    0.000    1.096    0.000 ida_hexrays.py:18631(_get_op)
  3189390    0.544    0.000    0.544    0.000 {built-in method _ida_hexrays.citem_t__get_op}
  2320148    0.538    0.000    0.538    0.000 {built-in method _ida_bytes.next_head}
  1160074    0.531    0.000    3.932    0.000 diaphora_ida.py:2506(extract_function_switches)
  1160074    0.530    0.000    0.530    0.000 {built-in method _ida_nalt.switch_info_t_swiginit}
1136558/568279    0.526    0.000    0.596    0.000 ida_ida.py:4257(__getattribute__)
  2213644    0.517    0.000    0.932    0.000 ida_gdl.py:731(pred)
  2724938    0.517    0.000    0.517    0.000 {built-in method _ida_gdl.qflow_chart_t_succ}
   720366    0.506    0.000    0.506    0.000 {built-in method _ida_hexrays.qstring_printer_t_get_s}
  1852159    0.489    0.000    0.872    0.000 ida_gdl.py:706(nsucc)
  2320148    0.487    0.000    1.479    0.000 ida_bytes.py:4312(get_bytes)
  2320148    0.479    0.000    0.479    0.000 {built-in method _ida_bytes.get_cmt}
   282424    0.476    0.000    0.476    0.000 {built-in method _ida_bytes.get_strlit_contents}
  2320148    0.469    0.000    0.948    0.000 ida_bytes.py:3780(get_cmt)
  2320148    0.465    0.000    0.465    0.000 {built-in method _ida_ua.operands_array___len__}
  2320148    0.463    0.000    0.692    0.000 ida_bytes.py:602(get_item_size)
     9970    0.447    0.000 2301.350    0.231 diaphora.py:1025(save_function)
  1004824    0.419    0.000    0.419    0.000 {built-in method _ida_funcs.get_func}
  2213644    0.414    0.000    0.414    0.000 {built-in method _ida_gdl.qflow_chart_t_pred}
  1621464    0.394    0.000    0.701    0.000 ida_gdl.py:714(npred)
  1160074    0.393    0.000    1.071    0.000 ida_nalt.py:2421(__init__)
  1852159    0.383    0.000    0.383    0.000 {built-in method _ida_gdl.qflow_chart_t_nsucc}
  1460668    0.366    0.000    1.077    0.000 ida_hexrays.py:15716(get_mblock)
  2320148    0.363    0.000    0.901    0.000 ida_bytes.py:496(next_head)
     9970    0.361    0.000    0.474    0.000 diaphora_ida.py:2589(extract_function_assembly_features)
   240665    0.356    0.000    0.564    0.000 diaphora.py:533(get_db)
   720366    0.350    0.000    0.350    0.000 {built-in method _ida_hexrays.qstring_printer_t_swiginit}
     9968    0.324    0.000    1.350    0.000 factor.py:16(primesbelow)
     9970    0.311    0.000    1.756    0.000 diaphora_ida.py:2531(extract_function_mdindex)
  1621464    0.307    0.000    0.307    0.000 {built-in method _ida_gdl.qflow_chart_t_npred}
  1160074    0.305    0.000   10.180    0.000 idc.py:1514(GetDisasm)
  1339514    0.296    0.000    0.296    0.000 {method 'join' of 'str' objects}
   898934    0.293    0.000    0.293    0.000 {built-in method _ida_pro.strvec_t___getitem__}
    19938    0.275    0.000    0.275    0.000 {method 'readlines' of '_io._IOBase' objects}
  2557709    0.250    0.000    0.250    0.000 {built-in method builtins.min}
  3922471    0.244    0.000    0.244    0.000 {method 'add' of 'set' objects}
  1160074    0.244    0.000    1.834    0.000 ida_ua.py:1851(print_insn_mnem)
   317927    0.242    0.000    1.134    0.000 diaphora_ida.py:2446(is_constant)
  1160074    0.241    0.000    8.341    0.000 ida_lines.py:759(generate_disasm_line)
  1160074    0.236    0.000    1.366    0.000 ida_nalt.py:2589(get_switch_info)
  1160074    0.236    0.000   10.134    0.000 ida_idp.py:487(is_call_insn)
     9970    0.233    0.000    0.978    0.000 tarjan_sort.py:75(robust_topological_sort)
   720366    0.231    0.000    0.687    0.000 ida_hexrays.py:6174(__init__)
    94211    0.231    0.000    0.231    0.000 {built-in method _ida_name.demangle_name}
  2320148    0.229    0.000    0.229    0.000 {built-in method _ida_bytes.get_item_size}
   720366    0.221    0.000   27.557    0.000 ida_hexrays.py:13931(_print)
   862825    0.217    0.000    0.368    0.000 ida_ua.py:315(__get_addr__)
   945536    0.213    0.000    0.213    0.000 {built-in method _ida_pro.intvec_t___getitem__}
   833138    0.205    0.000    0.353    0.000 ida_ua.py:287(__get_value__)
  1004824    0.201    0.000    0.620    0.000 ida_funcs.py:737(get_func)
     9970    0.199    0.000    0.243    0.000 tarjan_sort.py:52(topological_sort)
   720366    0.196    0.000    0.702    0.000 ida_hexrays.py:6183(get_s)
   945536    0.195    0.000    0.408    0.000 ida_pro.py:836(__getitem__)
   720366    0.195    0.000    0.195    0.000 {built-in method _ida_pro.intvec_t_size}
   720366    0.193    0.000    0.895    0.000 ida_hexrays.py:6189(<lambda>)
   898934    0.191    0.000    0.484    0.000 ida_pro.py:2019(__getitem__)
   721995    0.191    0.000    2.582    0.000 ida_gdl.py:855(<genexpr>)
   327919    0.177    0.000    0.288    0.000 diaphora_ida.py:2552(<genexpr>)
   240665    0.174    0.000    0.841    0.000 diaphora.py:544(db_cursor)
   327919    0.170    0.000    1.210    0.000 diaphora_ida.py:2556(<genexpr>)
   862825    0.152    0.000    0.152    0.000 {built-in method _ida_ua.op_t___get_addr__}
   720366    0.151    0.000    0.345    0.000 ida_pro.py:651(size)
    29910    0.150    0.000    2.439    0.000 tarjan_sort.py:14(strongly_connected_components)
  1160074    0.149    0.000    0.149    0.000 {built-in method _ida_nalt.new_switch_info_t}
   833138    0.148    0.000    0.148    0.000 {built-in method _ida_ua.op_t___get_value__}
     9970    0.134    0.000    1.977    0.000 diaphora_ida.py:2623(extract_function_topological_information)
   683745    0.128    0.000    0.254    0.000 ida_xref.py:609(get_next_fcref_from)
   481979    0.128    0.000    0.213    0.000 ida_ua.py:273(__get_reg_phrase__)
   683745    0.126    0.000    0.126    0.000 {built-in method _ida_xref.get_next_fcref_from}
   230695    0.110    0.000    0.110    0.000 graph_hashes.py:85(get_edges_value)
   461390    0.108    0.000    0.203    0.000 ida_bytes.py:642(get_flags)
   240667    0.106    0.000    0.127    0.000 threading.py:1338(current_thread)
   720366    0.105    0.000    0.105    0.000 {built-in method _ida_hexrays.new_qstring_printer_t}
   240665    0.103    0.000    0.103    0.000 {method 'cursor' of 'sqlite3.Connection' objects}
   461390    0.095    0.000    0.095    0.000 {built-in method _ida_bytes.get_flags}
     9970    0.094    0.000    0.094    0.000 diaphora.py:888(create_function_dictionary)
   461390    0.092    0.000    0.092    0.000 idc.py:159(is_head)
     9999    0.087    0.000    0.087    0.000 {built-in method _ida_funcs.get_func_name}
   481979    0.085    0.000    0.085    0.000 {built-in method _ida_ua.op_t___get_reg_phrase__}
   240666    0.080    0.000    0.080    0.000 threading.py:1089(ident)
   598218    0.078    0.000    0.078    0.000 {built-in method builtins.getattr}
   343688    0.076    0.000    0.323    0.000 idautils.py:38(CodeRefsTo)
   230695    0.070    0.000    0.070    0.000 graph_hashes.py:73(get_node_value)
     9970    0.068    0.000    3.445    0.000 diaphora.py:1150(get_cmp_pseudo_lines)
     9970    0.068    0.000 2297.483    0.230 diaphora.py:1005(save_function_to_database)
   389453    0.067    0.000    0.124    0.000 ida_xref.py:512(get_next_dref_from)
     9968    0.063    0.000    2.619    0.000 kfuzzy.py:246(hash_bytes)
     9968    0.062    0.000    0.064    0.000 kfuzzy.py:218(mix_blocks)
     9970    0.061    0.000  169.442    0.017 diaphora_ida.py:2364(guess_type)
   219833    0.060    0.000    1.625    0.000 kfuzzy.py:31(modsum)
   389453    0.057    0.000    0.057    0.000 {built-in method _ida_xref.get_next_dref_from}
     9968    0.054    0.000    1.475    0.000 diaphora_ida.py:3829(__init__)
   172122    0.054    0.000    0.112    0.000 ida_xref.py:374(crefs_to)
   282424    0.053    0.000    0.529    0.000 ida_bytes.py:4338(get_strlit_contents)
    19940    0.050    0.000    0.050    0.000 {built-in method _hashlib.openssl_md5}
    49850    0.049    0.000    0.177    0.000 diaphora_ida.py:2551(<genexpr>)
    96919    0.047    0.000    0.970    0.000 idc.py:4971(get_type)
   171566    0.046    0.000    0.097    0.000 ida_xref.py:384(fcrefs_to)
     9999    0.043    0.000    0.043    0.000 diaphora_ida.py:1049(clear_pseudo_fields)
    19969    0.042    0.000   31.195    0.002 ida_gdl.py:821(__init__)
    19940    0.042    0.000    0.042    0.000 {built-in method _ida_nalt.get_imagebase}
     9968    0.037    0.000    0.037    0.000 {built-in method _ida_hexrays.new_ctree_visitor_t}
    96919    0.037    0.000    0.924    0.000 ida_typeinf.py:9983(idc_get_type)
    19940    0.037    0.000    0.037    0.000 {method 'join' of 'bytes' objects}
    29910    0.034    0.000    0.073    0.000 ida_gdl.py:854(__iter__)
    94211    0.034    0.000    0.265    0.000 ida_name.py:1141(demangle_name)
    19969    0.033    0.000    0.033    0.000 {built-in method _ida_gdl.qflow_chart_t_swiginit}
    19969    0.032    0.000    0.051    0.000 os.py:674(__getitem__)
    41224    0.031    0.000    0.044    0.000 idautils.py:199(Functions)
     9999    0.031    0.000    0.243    0.000 diaphora_ida.py:2463(get_function_names)
     9970    0.028    0.000    0.028    0.000 {built-in method _ida_funcs.get_func_cmt}
    29939    0.028    0.000    0.102    0.000 idc.py:2864(get_func_attr)
   162152    0.027    0.000    0.027    0.000 {built-in method _ida_xref.get_next_cref_to}
    19969    0.025    0.000   31.152    0.002 ida_gdl.py:626(__init__)
     9970    0.025    0.000    0.052    0.000 idc.py:2208(get_segm_start)
    29939    0.023    0.000    0.053    0.000 idc.py:89(_IDC_GetAttr)
    29940    0.023    0.000    0.023    0.000 {built-in method builtins.hasattr}
   161596    0.022    0.000    0.022    0.000 {built-in method _ida_xref.get_next_fcref_to}
    19938    0.022    0.000    0.043    0.000 ida_hexrays.py:16856(__init__)
    19940    0.022    0.000    0.078    0.000 diaphora_ida.py:3094(get_base_address)
    19936    0.022    0.000   40.416    0.002 ida_hexrays.py:15680(_print)
    19938    0.022    0.000    0.077    0.000 diaphora_ida.py:1018(__init__)
    19938    0.022    0.000    0.022    0.000 {built-in method _ida_hexrays.new_mba_ranges_t}
   240667    0.021    0.000    0.021    0.000 {built-in method _thread.get_ident}
   161596    0.021    0.000    0.044    0.000 ida_xref.py:626(get_next_fcref_to)
   162152    0.021    0.000    0.048    0.000 ida_xref.py:588(get_next_cref_to)
    19938    0.021    0.000    0.055    0.000 ida_hexrays.py:6084(__init__)
     9968    0.020    0.000    0.070    0.000 ida_hexrays.py:17980(__init__)
    19938    0.020    0.000    0.020    0.000 {built-in method _ida_hexrays.new_vd_printer_t}
     9970    0.020    0.000    1.612    0.000 idautils.py:558(GetInstructionList)
    19969    0.019    0.000    0.087    0.000 os.py:771(getenv)
    19938    0.018    0.000  169.032    0.008 ida_hexrays.py:22282(gen_microcode)
     9970    0.018    0.000    0.018    0.000 {built-in method _ida_segment.getseg}
     9970    0.018    0.000    4.724    0.000 ida_hexrays.py:22262(decompile_func)
    19938    0.018    0.000    0.056    0.000 ida_hexrays.py:14923(__init__)
   218975    0.018    0.000    0.018    0.000 {built-in method builtins.chr}
    19938    0.017    0.000    0.017    0.000 {built-in method _ida_hexrays.mba_ranges_t_swiginit}
     9970    0.017    0.000    0.017    0.000 tarjan_sort.py:60(<listcomp>)
    19969    0.017    0.000    0.067    0.000 _collections_abc.py:761(get)
    10000    0.017    0.000  401.251    0.040 ida_kernwin.py:7289(user_cancelled)
    29904    0.017    0.000    0.029    0.000 base64.py:51(b64encode)
    19940    0.016    0.000    0.016    0.000 {method 'hexdigest' of '_hashlib.HASH' objects}
   101392    0.015    0.000    0.015    0.000 {method 'decode' of 'bytes' objects}
     9968    0.015    0.000    0.015    0.000 {built-in method _ida_hexrays.cfuncptr_t_get_pseudocode}
    19938    0.015    0.000    0.036    0.000 ida_hexrays.py:8712(__init__)
     9970    0.015    0.000    0.064    0.000 idc.py:3025(get_func_cmt)
    19938    0.014    0.000    0.014    0.000 {built-in method _ida_hexrays.vd_printer_t_swiginit}
    19940    0.014    0.000    0.056    0.000 ida_nalt.py:3420(get_imagebase)
     9970    0.014    0.000    0.014    0.000 {built-in method _ida_hexrays.cfuncptr_t___deref__}
    19938    0.014    0.000    0.014    0.000 {built-in method _ida_hexrays.hexrays_failure_t_swiginit}
    19938    0.014    0.000    0.014    0.000 {built-in method _ida_hexrays.mlist_t_swiginit}
   190481    0.013    0.000    0.013    0.000 {method 'pop' of 'list' objects}
     9968    0.013    0.000    0.013    0.000 {built-in method _ida_hexrays.ctree_visitor_t_swiginit}
     9968    0.013    0.000    0.013    0.000 {built-in method _ida_pro.strvec_t_size}
    19936    0.012    0.000    0.607    0.000 ida_hexrays.py:15612(build_graph)
    29904    0.012    0.000    0.012    0.000 {built-in method binascii.b2a_base64}
     9970    0.012    0.000    0.012    0.000 {built-in method _ida_segment.is_spec_ea}
     9970    0.012    0.000    0.776    0.000 idc.py:5007(guess_type)
    20016    0.012    0.000    0.042    0.000 idautils.py:81(DataRefsTo)
     9970    0.011    0.000    0.011    0.000 {built-in method _ida_hexrays.init_hexrays_plugin}
   240665    0.011    0.000    0.011    0.000 {method 'close' of 'sqlite3.Cursor' objects}
     9999    0.011    0.000    0.098    0.000 ida_funcs.py:1026(get_func_name)
     9968    0.010    0.000    5.295    0.001 ida_hexrays.py:17993(apply_to)
    49873    0.010    0.000    0.010    0.000 {method 'encode' of 'str' objects}
    19969    0.010    0.000    0.019    0.000 os.py:754(encode)
     9970    0.010    0.000    0.840    0.000 ida_idp.py:5846(ph_get_instruc)
    20016    0.010    0.000    0.023    0.000 ida_xref.py:414(drefs_to)
     9970    0.009    0.000    0.021    0.000 ida_segment.py:678(is_spec_ea)
     9968    0.009    0.000    0.022    0.000 ida_pro.py:1881(size)
     9970    0.009    0.000    0.020    0.000 ida_hexrays.py:4116(init_hexrays_plugin)
    41223    0.008    0.000    0.008    0.000 {built-in method _ida_funcs.get_next_func}
     9970    0.008    0.000    0.764    0.000 ida_typeinf.py:9975(idc_guess_type)
     9970    0.008    0.000    4.738    0.000 diaphora_ida.py:2200(do_decompile)
     9970    0.008    0.000    0.026    0.000 ida_segment.py:1015(getseg)
    19938    0.008    0.000    0.008    0.000 {built-in method _ida_hexrays.new_mlist_t}
     9970    0.007    0.000    0.021    0.000 ida_hexrays.py:2045(__deref__)
     9968    0.007    0.000    0.023    0.000 ida_hexrays.py:2273(get_pseudocode)
    19938    0.007    0.000    0.007    0.000 {built-in method _ida_hexrays.new_hexrays_failure_t}
     9970    0.006    0.000    0.049    0.000 {method 'extend' of 'list' objects}
     9968    0.006    0.000    0.648    0.000 ida_hexrays.py:21738(restore_user_cmts)
     9970    0.006    0.000    0.034    0.000 ida_funcs.py:841(get_func_cmt)
     9970    0.006    0.000    0.006    0.000 {built-in method _ida_xref.get_first_cref_to}
     9970    0.006    0.000    4.730    0.000 ida_hexrays.py:26070(decompile)
       25    0.006    0.000    0.006    0.000 {built-in method _ida_kernwin.replace_wait_box}
    41223    0.005    0.000    0.013    0.000 ida_funcs.py:820(get_next_func)
     9970    0.005    0.000    0.010    0.000 ida_xref.py:575(get_first_cref_to)
    29904    0.004    0.000    0.004    0.000 {method 'strip' of 'bytes' objects}
     9970    0.004    0.000    0.004    0.000 {built-in method _ida_xref.get_first_dref_to}
     9970    0.004    0.000    0.009    0.000 ida_xref.py:525(get_first_dref_to)
     9970    0.004    0.000    0.004    0.000 {built-in method _ida_xref.get_first_fcref_to}
     9970    0.003    0.000    0.007    0.000 ida_xref.py:618(get_first_fcref_to)
      278    0.003    0.000    0.003    0.000 {built-in method _ida_xref.calc_switch_cases}
     9970    0.002    0.000    0.002    0.000 {method 'insert' of 'list' objects}
    10046    0.002    0.000    0.004    0.000 ida_xref.py:535(get_next_dref_to)
    10046    0.002    0.000    0.002    0.000 {built-in method _ida_xref.get_next_dref_to}
     9970    0.002    0.000    0.002    0.000 {method 'keys' of 'dict' objects}
     9970    0.002    0.000    0.002    0.000 graph_hashes.py:70(__init__)
     7925    0.002    0.000    0.002    0.000 {built-in method _ida_pro.int64vec_t___getitem__}
     7925    0.002    0.000    0.004    0.000 ida_pro.py:1326(__getitem__)
     9970    0.001    0.000    0.001    0.000 {method 'remove' of 'list' objects}
     3004    0.001    0.000    0.001    0.000 {built-in method _ida_xref.casevec_t___getitem__}
     9970    0.001    0.000    0.001    0.000 {method 'items' of 'dict' objects}
     3004    0.001    0.000    0.001    0.000 {built-in method _ida_pro.int64vec_t_size}
     3004    0.001    0.000    0.002    0.000 ida_pro.py:1141(size)
     3004    0.001    0.000    0.002    0.000 ida_xref.py:880(__getitem__)
      278    0.000    0.000    0.000    0.000 ida_xref.py:687(size)
      278    0.000    0.000    0.000    0.000 ida_nalt.py:2359(get_jtable_size)
      278    0.000    0.000    0.003    0.000 ida_xref.py:114(calc_switch_cases)
      278    0.000    0.000    0.000    0.000 {built-in method _ida_xref.casevec_t_size}
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
      278    0.000    0.000    0.000    0.000 {built-in method _ida_nalt.switch_info_t_get_jtable_size}
       29    0.000    0.000    0.000    0.000 diaphora_ida.py:145(debug_refresh)
       25    0.000    0.000    0.006    0.000 ida_kernwin.py:7799(replace_wait_box)
        1    0.000    0.000    0.000    0.000 {built-in method posix.stat}
        3    0.000    0.000    0.000    0.000 {method 'execute' of 'sqlite3.Connection' objects}
        1    0.000    0.000    0.000    0.000 {built-in method _ida_kernwin.show_wait_box}
      100    0.000    0.000    0.000    0.000 {built-in method builtins.divmod}
        1    0.000    0.000 3406.712 3406.712 diaphora_ida.py:1230(export)
       32    0.000    0.000    0.000    0.000 types.py:171(__get__)
        4    0.000    0.000    0.000    0.000 {built-in method _ida_kernwin.msg}
        1    0.000    0.000    0.000    0.000 diaphora.py:553(db_close)
        1    0.000    0.000    0.000    0.000 pstats.py:107(__init__)
        1    0.000    0.000    0.000    0.000 {method 'close' of '_io.BufferedWriter' objects}
       26    0.000    0.000    0.000    0.000 {built-in method time.monotonic}
        1    0.000    0.000    0.000    0.000 {built-in method _ida_funcs.get_fchunk}
       32    0.000    0.000    0.000    0.000 re.py:250(compile)
       29    0.000    0.000    0.000    0.000 {built-in method builtins.repr}
        1    0.000    0.000    0.000    0.000 cProfile.py:40(print_stats)
       32    0.000    0.000    0.000    0.000 enum.py:792(value)
        1    0.000    0.000    0.000    0.000 genericpath.py:16(exists)
        1    0.000    0.000    0.000    0.000 pstats.py:117(init)
        1    0.000    0.000    0.000    0.000 cProfile.py:50(create_stats)
        1    0.000    0.000    0.000    0.000 {built-in method _ida_funcs.get_next_fchunk}
        2    0.000    0.000    0.000    0.000 diaphora_ida.py:120(log)
        1    0.000    0.000    0.000    0.000 {method 'close' of 'sqlite3.Connection' objects}
        2    0.000    0.000    0.000    0.000 {built-in method time.asctime}
        1    0.000    0.000    0.000    0.000 diaphora.py:449(__del__)
        4    0.000    0.000    0.000    0.000 ida_kernwin.py:197(msg)
        1    0.000    0.000    0.000    0.000 pstats.py:136(load_stats)
        1    0.000    0.000    0.000    0.000 ida_kernwin.py:926(show_wait_box)
        4    0.000    0.000    0.000    0.000 init.py:76(write)
        1    0.000    0.000    0.000    0.000 ida_funcs.py:1165(get_fchunk)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'commit' of 'sqlite3.Connection' objects}
        1    0.000    0.000    0.000    0.000 ida_funcs.py:1216(get_next_fchunk)
        1    0.000    0.000    0.000    0.000 diaphora.py:468(load_hooks)
        1    0.000    0.000    0.000    0.000 {method '__exit__' of '_io._IOBase' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

[Diaphora: Wed Oct  4 13:48:51 2023] Database exported, time taken: 17:28:48.203620.

lasizoillo commented 9 months ago

TL;DR

Looking at the data...

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 12631926 2273.867    0.000 2273.867    0.000 {method 'execute' of 'sqlite3.Cursor' objects}

There are a lot of calls (>12M), yet together they consume only about 2,274 seconds, which is under 0.2 milliseconds per call. Even so, there are some steps to try to improve it.

Going by cumtime, I'd bet on the following function as the place to test improvements:

     9970    1.533    0.000 2269.750    0.228 diaphora.py:717(insert_basic_blocks_to_database)
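
For anyone who wants to reproduce this kind of reading of the data, here is a minimal sketch of collecting a profile with the standard cProfile and pstats modules and sorting it by cumtime. The two workload functions are hypothetical stand-ins, not Diaphora code.

```python
import cProfile
import io
import pstats


def slow_inner(n):
    # Hypothetical stand-in for an expensive leaf call.
    return sum(i * i for i in range(n))


def export_like(n_funcs):
    # Hypothetical stand-in for a per-function export loop.
    return [slow_inner(200) for _ in range(n_funcs)]


profiler = cProfile.Profile()
profiler.enable()
export_like(500)
profiler.disable()

buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
# "cumulative" sorts by cumtime (time including callees);
# "tottime" would sort by time spent inside each function itself.
stats.sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

Sorting by cumtime surfaces the high-level function whose callees dominate the run, while sorting by tottime surfaces the leaf calls themselves.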

joxeankoret commented 9 months ago

I have made some updates to Diaphora using some of the tips given by @lasizoillo. I did not apply all of them because some don't make sense here, some aren't possible (executemany() cannot be used easily if one needs the row id of the inserted row), and some are already implemented (like the usage of pragmas, although they aren't yet configurable).
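
To illustrate the row id constraint, here is a minimal sketch using Python's sqlite3 module. The table and column names are hypothetical and are not Diaphora's actual schema. The parent row has to be inserted with execute() so that cursor.lastrowid is available, while the child rows that only need that id can still be batched with executemany() inside a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical schema, for illustration only.
cur.execute("CREATE TABLE basic_blocks (id INTEGER PRIMARY KEY, address INTEGER)")
cur.execute(
    "CREATE TABLE instructions (id INTEGER PRIMARY KEY, block_id INTEGER, disasm TEXT)"
)

blocks = {0x1000: ["push rbp", "mov rbp, rsp"], 0x1010: ["ret"]}

# One transaction for the whole batch: committed on success, rolled back on error.
with conn:
    for address, lines in blocks.items():
        # execute() is needed here because the new row id is read right after.
        cur.execute("INSERT INTO basic_blocks (address) VALUES (?)", (address,))
        block_id = cur.lastrowid
        # Child rows do not need their own ids, so they can be batched.
        cur.executemany(
            "INSERT INTO instructions (block_id, disasm) VALUES (?, ?)",
            [(block_id, line) for line in lines],
        )

linked = cur.execute(
    "SELECT b.address, COUNT(i.id) FROM basic_blocks b "
    "JOIN instructions i ON i.block_id = b.id "
    "GROUP BY b.address ORDER BY b.address"
).fetchall()
print(linked)
```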

I have also found a sample that reproduces the behaviour @turbocool3r is seeing: it can be because of microcode extraction, or because of the LFA algorithm for extracting compilation units, or because of both. Hopefully, now that I can fully reproduce this behaviour, I can have a fix/optimization/workaround/whatever by this week.

joxeankoret commented 9 months ago

And, yeah, it's the code for finding compilation units using LFA (Local Function Affinity). Sometimes, it takes a huge amount of time:

         1020283488 function calls (1016171922 primitive calls) in 8029.762 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1 4653.752 4653.752 4704.008 4704.008 diaphora_ida.py:3112(get_modules_using_lfa)
  5500736 1986.791    0.000 1986.791    0.000 {method 'execute' of 'sqlite3.Cursor' objects}
    27840  107.123    0.004  107.123    0.004 {built-in method _ida_hexrays.gen_microcode}
132769500   96.176    0.000  170.429    0.000 diaphora.py:1145(re_sub)
    27467   83.959    0.003   89.819    0.003 {built-in method _ida_hexrays.mba_t__print}
132769504   74.246    0.000   74.246    0.000 {method 'sub' of 're.Pattern' objects}
  4731013   72.451    0.000  238.874    0.000 diaphora.py:1186(get_cmp_asm)
   512177   69.481    0.000   69.481    0.000 {built-in method _ida_hexrays.mblock_t__print}
    14510   49.103    0.003 1231.924    0.085 diaphora_ida.py:2728(read_function)
    13688   43.014    0.003   43.014    0.003 {built-in method _ida_hexrays.cfuncptr_t_get_pseudocode}
    27467   38.141    0.001  170.044    0.006 diaphora_ida.py:2234(get_microcode_bblocks)
    27840   31.210    0.001  449.062    0.016 diaphora_ida.py:2288(get_microcode)
   157537   30.672    0.000   30.672    0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
  7617553   26.690    0.000   41.403    0.000 diaphora_ida.py:2216(get_plain_microcode_line)
  1009542   22.245    0.000   22.245    0.000 {built-in method _ida_lines.generate_disasm_line}
  1009542   22.130    0.000   93.140    0.000 diaphora_ida.py:2499(extract_function_constants)
 14091007   19.989    0.000   19.989    0.000 {built-in method _ida_xref.xrefblk_t_swiginit}
  8854491   17.381    0.000   17.381    0.000 {method 'split' of 're.Pattern' objects}
    14152   16.518    0.001  209.258    0.015 diaphora_ida.py:2685(extract_microcode)

joxeankoret commented 9 months ago

Just for the record (this is mostly so that I can search for it in the future if I need it), I will explain the problem in detail here. In most cases IDA doesn't have line information, but when it does, it might have it for every single assembly line in an IDA database. Diaphora was calling the Hex-Rays API ida_lines.get_sourcefile() for every single assembly line in the database in the hope that at some point it would hit an address with line information. However, with binaries full of line information, this was causing a huge slowdown. Now it makes that call only once per basic block.
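
The difference in call counts can be sketched outside IDA as follows. Everything here is hypothetical: get_source_info() is a stand-in for an expensive per-address lookup, and querying only the first address of each block is an assumption about what "per basic block" means.

```python
def get_source_info(ea, calls):
    # Hypothetical stand-in for an expensive per-address lookup.
    # Every invocation is recorded so the two strategies can be compared.
    calls.append(ea)
    return "example.c" if ea >= 0x2000 else None


# Basic block start address -> addresses of the instructions in that block.
basic_blocks = {
    0x1000: [0x1000, 0x1004, 0x1008],
    0x2000: [0x2000, 0x2003],
}


def find_source_per_line(blocks, calls):
    # One lookup per instruction.
    found = set()
    for addresses in blocks.values():
        for ea in addresses:
            name = get_source_info(ea, calls)
            if name is not None:
                found.add(name)
    return found


def find_source_per_block(blocks, calls):
    # One lookup per basic block, using the block's start address only.
    found = set()
    for start_ea in blocks:
        name = get_source_info(start_ea, calls)
        if name is not None:
            found.add(name)
    return found


per_line_calls, per_block_calls = [], []
per_line = find_source_per_line(basic_blocks, per_line_calls)
per_block = find_source_per_block(basic_blocks, per_block_calls)
print(len(per_line_calls), len(per_block_calls))  # prints: 5 2
```

In this toy data both strategies find the same source file, but the per-block version makes one lookup per block instead of one per instruction.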

After optimizing that part, I noticed that it was still terribly slow and that the worst part of this performance problem was here: https://github.com/joxeankoret/diaphora/blob/master/diaphora_ida.py#L3158

This is an example partial output shown by a modified version of Diaphora:

[Diaphora: Sat Oct 21 01:35:35 2023] Finding compilation units...
[Diaphora: Sat Oct 21 01:35:35 2023] Running lfa.analyze
[Diaphora: Sat Oct 21 01:35:44 2023] Done
[Diaphora: Sat Oct 21 01:35:44 2023] Finding source files using strings...
Programming languages found:

  C/C++      92.71630727321542%

[Diaphora: Sat Oct 21 01:36:04 2023] Done
[Diaphora: Sat Oct 21 01:36:04 2023] Running the nested fors to assign CU names...
[Diaphora: Sat Oct 21 01:36:07 2023] Done
[Diaphora: Sat Oct 21 01:36:07 2023] Coalescing modules #1
[Diaphora: Sat Oct 21 01:36:07 2023] Coalescing modules #2
[Diaphora: Sat Oct 21 01:36:07 2023] Coalescing modules #3
[Diaphora: Sat Oct 21 01:36:07 2023] Coalescing modules #4
[Diaphora: Sat Oct 21 02:52:35 2023] Done
[Diaphora: Sat Oct 21 02:52:35 2023] Done finding compilation units, saving them...

The "Coalescing modules #4" step was taking about 1 hour and 16 minutes (01:36:07 to 02:52:35)! Looking at the code, I honestly could not understand why, because I was basically doing this:

for module in modules:
  for func in Functions(module["start"], module["end"]):
    do_stuff()

My suspicion was that calling the Hex-Rays API Functions() was too heavy, even though I was limiting the address range it should check, and it turns out I was right. Using an internal functions cache I already had for another purpose and changing the nested loops to this form:

for func in self._funcs_cache:
  for module in modules:
    if module["start"] <= func <= module["end"]:
      do_stuff()

...seems to fix this huge performance issue. Check this example output log after the change for the same binary and options:

(...)
[Diaphora: Sat Oct 21 12:05:10 2023] Coalescing modules #4
[Diaphora: Sat Oct 21 12:05:11 2023] Done
[Diaphora: Sat Oct 21 12:05:11 2023] Done finding compilation units, saving them...
(...)
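
The inverted loop described above can be sketched as a standalone example. The cache contents, module ranges, and the assign_modules() helper are hypothetical and only mirror the shape of the loops shown earlier. The early break assumes module ranges do not overlap, which the original loops do not require.

```python
# Hypothetical cached function start addresses (collected once, up front).
funcs_cache = [0x1000, 0x1040, 0x2000, 0x2080, 0x3000]

# Hypothetical module address ranges, with inclusive bounds.
modules = [
    {"name": "a.c", "start": 0x1000, "end": 0x1FFF},
    {"name": "b.c", "start": 0x2000, "end": 0x2FFF},
]


def assign_modules(funcs, mods):
    # Outer loop over the cached functions, inner loop over the modules:
    # only integer comparisons happen here, with no API call per module.
    assigned = {}
    for func in funcs:
        for module in mods:
            if module["start"] <= func <= module["end"]:
                assigned[func] = module["name"]
                break  # assumes non-overlapping module ranges
    return assigned


assigned = assign_modules(funcs_cache, modules)
print(assigned)
```

If the number of modules ever becomes large, keeping the ranges sorted and locating the candidate module with the standard bisect module would avoid the inner linear scan. That is a possible refinement, not something the fix above relies on.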

I will run all the tests, isolate this specific patch, and issue a new minor Diaphora version. If anyone wants the full version currently in development, tell me.

joxeankoret commented 8 months ago

Finally fixed with release 3.1.1.

turbocool3r commented 8 months ago

It's indeed much faster now. The only thing is that I am seeing errors like [Diaphora: Sun Oct 29 21:24:36 2023] Warning: Mnemonic '' not found in the list of microcode instructions! which seemingly weren't there before.

joxeankoret commented 8 months ago

Could you please share the exact binary causing this problem?

joxeankoret commented 3 months ago

I'm closing this for now as it has been significantly improved in the latest releases. Please feel free to reopen it if you think it's required.