unicorn-engine / unicorn

Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, PowerPC, RiscV, S390x, TriCore, X86)
http://www.unicorn-engine.org
GNU General Public License v2.0
7.47k stars 1.33k forks source link

Multiple issues related to unhooking and rehooking #1640

Closed rhelmot closed 2 years ago

rhelmot commented 2 years ago
  1. https://github.com/unicorn-engine/unicorn/commit/87a391d549a339b5d8109c10c6b16bd95130a923 introduced a bug where, since callbacks and user_data values for HOOK_BLOCK and HOOK_CODE are inlined into generated code, if this hook is removed (in my case it is later replaced with the same callback and a different user_data), the stale generated code will not be flushed and will cause segfaults if e.g. user_data was freed. I have resolved this by adding a tb_flush call to uc_del_inline_hook, but this seems incredibly invasive:
commit a46b4dd1cf71b7f9db49d70a332161a5b37c6702
Author: Audrey Dutcher <audrey@rhelmot.io>
Date:   Thu Jul 7 17:45:22 2022 -0700

    Flush TB cache on hook removal

diff --git a/qemu/tcg/tcg.c b/qemu/tcg/tcg.c
index 4910c67c..5c4bfffd 100644
--- a/qemu/tcg/tcg.c
+++ b/qemu/tcg/tcg.c
@@ -708,6 +708,13 @@ static void uc_free_inline_hook_info(void *p)
 void uc_del_inline_hook(uc_engine *uc, struct hook *hk)
 {
     g_hash_table_remove(uc->tcg_ctx->custom_helper_infos, hk->callback);
+
+    switch (hk->type) {
+        case UC_HOOK_BLOCK:
+        case UC_HOOK_CODE:
+            tb_flush(uc->cpu);
+            break;
+    }
 }

 void tcg_context_init(TCGContext *s)
diff --git a/uc.c b/uc.c
index 6bff4e77..4eed6666 100644
--- a/uc.c
+++ b/uc.c
@@ -753,6 +753,9 @@ uc_err uc_emu_start(uc_engine *uc, uint64_t begin, uint64_t until,
     }
     uc->nested_level++;

+    // remove hooks that were deleted while unicorn was not running
+    clear_deleted_hooks(uc);
+
     switch (uc->arch) {
     default:
         break;

(also included there is a second diff which calls cleared_deleted_hooks at the start of uc_emu_start in addition to at the end. this is necessary to trigger cleanup of hooks that were unhooked not during execution.)

  1. There is a second bug which appeared after I pulled master today which seems to be triggered both with and without the above patch. I have not been able to fully triage it yet, but here is the backtrace for it:
#0  0x00007ffff390c143 in g_hash_table_lookup_node (hash_table=0x21, key=0x7fffffffa860) at /home/audrey/proj/angr/unicorn/glib_compat/glib_compat.c:895
#1  0x00007ffff390c25d in g_hash_table_lookup (hash_table=0x21, key=0x7fffffffa860) at /home/audrey/proj/angr/unicorn/glib_compat/glib_compat.c:951
#2  0x00007ffff39ead54 in hooked_regions_add (h=0x1e37f60, start=134513472, length=2) at /home/audrey/proj/angr/unicorn/include/uc_priv.h:447
#3  0x00007ffff39eae36 in hooked_regions_check_single (cur=0x1e37fa0, start=134513472, length=2) at /home/audrey/proj/angr/unicorn/include/uc_priv.h:460
#4  0x00007ffff39eaea0 in hooked_regions_check (uc=0x1e2c160, start=134513472, length=2) at /home/audrey/proj/angr/unicorn/include/uc_priv.h:471
#5  0x00007ffff39eba56 in translator_loop_x86_64 (ops=0x7ffff4b49fc0 <i386_tr_ops>, db=0x7fffffffa980, cpu=0x1ee2a40, tb=0x7fffaca2f040, max_insns=1) at /home/audrey/proj/angr/unicorn/qemu/accel/tcg/translator.c:150
#6  0x00007ffff39d6c5a in gen_intermediate_code_x86_64 (cpu=0x1ee2a40, tb=0x7fffaca2f040, max_insns=1) at /home/audrey/proj/angr/unicorn/qemu/target/i386/translate.c:9396
#7  0x00007ffff396e93d in tb_gen_code_x86_64 (cpu=0x1ee2a40, pc=134513472, cs_base=0, flags=4194480, cflags=-16711679) at /home/audrey/proj/angr/unicorn/qemu/accel/tcg/translate-all.c:1632
#8  0x00007ffff39579d2 in tb_find (cpu=0x1ee2a40, last_tb=0x0, tb_exit=0, cf_mask=0) at /home/audrey/proj/angr/unicorn/qemu/accel/tcg/cpu-exec.c:256
#9  0x00007ffff39582bf in cpu_exec_x86_64 (uc=0x1e2c160, cpu=0x1ee2a40) at /home/audrey/proj/angr/unicorn/qemu/accel/tcg/cpu-exec.c:597
#10 0x00007ffff391a88f in tcg_cpu_exec (uc=0x1e2c160) at /home/audrey/proj/angr/unicorn/qemu/softmmu/cpus.c:96
#11 0x00007ffff391ab48 in resume_all_vcpus_x86_64 (uc=0x1e2c160) at /home/audrey/proj/angr/unicorn/qemu/softmmu/cpus.c:215
#12 0x00007ffff391abe0 in vm_start_x86_64 (uc=0x1e2c160) at /home/audrey/proj/angr/unicorn/qemu/softmmu/cpus.c:234
#13 0x00007fffedfcc010 in uc_emu_start (uc=0x1e2c160, begin=134513472, until=0, timeout=0, count=0) at /home/audrey/proj/angr/unicorn/uc.c:861
#14 0x00007fffedf72972 in State::start (this=0x1e353f0, pc=134513472, step=100) at sim_unicorn.cpp:122
#15 0x00007fffedf7ddf3 in simunicorn_start (state=0x1e353f0, pc=134513472, step=100) at sim_unicorn.cpp:2734
#16 0x00007ffff6c15ff5 in ?? () from /lib/x86_64-linux-gnu/libffi.so.7
#17 0x00007ffff6c1540a in ?? () from /lib/x86_64-linux-gnu/libffi.so.7
#18 0x00007ffff6c51306 in _ctypes_callproc () from /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so
#19 0x00007ffff6c51ae7 in ?? () from /usr/lib/python3.8/lib-dynload/_ctypes.cpython-38-x86_64-linux-gnu.so

This implies that one of the struct list_item cur iterations in hooked_regions_check_single contains a freed reference to a struct hook.

wtdcode commented 2 years ago

Weird. I recently added a fix to clear outdated hooks' cache. Will have a look asap.

wtdcode commented 2 years ago

btw, could you provide a small PoC?

rhelmot commented 2 years ago

My apologies, it seems there was some issue on my end with respect to build caching. I can no longer reproduce the issue.