Closed: egberts closed this issue 2 years ago
yes, please put the testcases under tests/. if your code is in C, the preferred place is under tests/unit.
thanks.
Unable to produce a working fix for the QEMU engine, but I have a clearly documented unit-test C file for troubleshooting this "IMUL eax, mem, Ib" case, with a pass/fail scenario.
$ unzip test_i386_imul_eax_r_ib.c.zip
$ gcc test_i386_imul_eax_r_ib.c -lunicorn
$ ./a.out
(Just started refitting my C code for tests/unit style)
I'm staring at the tcg_gen_muls2_i32 function in GDB. Perhaps this is @farmdve's cup of tea: does this signed IMUL operation get mangled before it stores the corrupted value into the EAX register?
case MO_32:
tcg_gen_trunc_tl_i32(tcg_ctx, cpu_tmp2_i32, *cpu_T[0]);
tcg_gen_trunc_tl_i32(tcg_ctx, cpu_tmp3_i32, *cpu_T[1]);
tcg_gen_muls2_i32(tcg_ctx, cpu_tmp2_i32, cpu_tmp3_i32,
cpu_tmp2_i32, cpu_tmp3_i32);
tcg_gen_extu_i32_tl(tcg_ctx, *cpu_regs[reg], cpu_tmp2_i32);
tcg_gen_sari_i32(tcg_ctx, cpu_tmp2_i32, cpu_tmp2_i32, 31);
tcg_gen_mov_tl(tcg_ctx, cpu_cc_dst, *cpu_regs[reg]);
tcg_gen_sub_i32(tcg_ctx, cpu_tmp2_i32, cpu_tmp2_i32, cpu_tmp3_i32);
tcg_gen_extu_i32_tl(tcg_ctx, cpu_cc_src, cpu_tmp2_i32);
https://github.com/unicorn-engine/unicorn/blob/master/qemu/target-i386/translate.c#L5473
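To sanity-check what the MO_32 path above computes, here is a self-contained C sketch (my own names, not QEMU's) of the same dataflow: muls2 yields the low/high halves of the signed 64-bit product, the destination register gets the low half, and CC_SRC ends up as (low >> 31) - high, which is nonzero exactly when the signed product overflowed 32 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the MO_32 IMUL dataflow from the TCG snippet above.
 * imul32_sketch is an illustrative name, not a QEMU function. */
static void imul32_sketch(int32_t a, int32_t b,
                          uint32_t *dst, uint32_t *cc_src)
{
    int64_t prod = (int64_t)a * (int64_t)b;   /* tcg_gen_muls2_i32 */
    int32_t lo = (int32_t)prod;               /* -> cpu_regs[reg] (EAX) */
    int32_t hi = (int32_t)(prod >> 32);
    *dst = (uint32_t)lo;
    /* sari(lo, 31) - hi: zero iff hi is just the sign-extension of lo */
    *cc_src = (uint32_t)((lo >> 31) - hi);
}
```

Running it with the thread's operands, 0x5151494a * 0x10 gives EAX = 0x151494a0 with a nonzero CC_SRC (the signed product overflowed 32 bits), so the translation itself looks sound.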
Hard to compare against the original QEMU...
The problem is not with the instruction decoder. Your code is self-modifying: it changes the IMUL multiplier from 0x51 to 0x10. QEMU usually handles self-modifying code, so this should work, but for some reason QEMU keeps using the old value 0x51 instead of 0x10 (despite what uc_mem_read returns), meaning the translation cache is not flushed.
0x5151494a * 0x51 = 0xBAB8306A
0x5151494a * 0x10 = 0x151494A0
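Both results above are the low 32 bits of the 64-bit product; a quick standalone check (helper name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the product, matching the two results above. */
static uint32_t imul32_trunc(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}
```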
There have been other instances of Unicorn failing with self-modifying code, so thanks for finding this.
@aquynh Somewhere we need to flush the translation block.
@egberts, so can you put your testcase under tests/ directory?
Yes.
@aquynh , done.
My first attempt was incorrectly added under tests/regress, and that 1st pull request has been closed; the 2nd attempt was recoded to fit the tests/unit style, with the 2nd pull request opened at #380.
I traced the issue to the self-modified code check in translate-all.c. Even though the code is modified, the check
if (!(tb_end <= start || tb_start >= end)) in tb_invalidate_phys_page_range fails, and the page isn't invalidated.
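For reference, that condition is a plain half-open interval-overlap test; a small standalone sketch of it (function name mine):

```c
#include <assert.h>
#include <stdint.h>

/* A TB covering [tb_start, tb_end) should be invalidated when it
 * overlaps the written range [start, end). This mirrors the
 * !(tb_end <= start || tb_start >= end) condition quoted above. */
static int tb_overlaps(uint64_t tb_start, uint64_t tb_end,
                       uint64_t start, uint64_t end)
{
    return !(tb_end <= start || tb_start >= end);
}
```

In our case the write is the single byte [0x28, 0x29), so a TB whose recorded tb_end stops at or before 0x28 would escape invalidation even though that byte belongs to its last instruction.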
Setting a breakpoint at tb_invalidate_phys_page_range() isn't working for me, as that function isn't mapped yet when GDB stops at main()... Do we have an awesome .gdbinit, or defines to pass to make.sh (or something), to help us get to this threaded point faster?
I've had luck just letting a program run once in gdb before trying to add the library breakpoint.
@egberts
The functions have a suffix which depends on the arch. tb_invalidate_phys_page_range_x86_64 for X86 arch, and tb_invalidate_phys_page_range_arm for ARM.
This is the most bothersome bug. I have not looked at it in a while, but it's the first time that self-modifying code is not detected properly and thus TB page/region is not invalidated.
At a quick glance, this testcase seems a bit too complicated. Can someone make a minimal testcase, so it is shorter and simpler?
Ummm, possibly... Are you looking for a smaller x86 code footprint (size of buffer)? Or are you looking to keep the test code simpler?
Yes, please make the x86 code as small as possible, by just keeping relevant instructions. Thanks
Will do
OK. Got the RIP (PC) to start exactly at the self-modifying opcode (0x60000021) by taking a snapshot of the register set content and setting those register values in tests/unit/test_tb_x86.c.
test code:
0x021: 30 41 30 xor byte ptr [ecx + 0x30], al # modify immediate operand of imul opcode
0x024: 41 inc ecx
0x025: 6b 41 41 51 imul eax, dword ptr [ecx + 0x41], 0x51
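For what it's worth, the self-modification itself is trivial to replay in plain C. I'm assuming AL holds 0x41 at 0x021 here (it's the only value for which 0x51 ^ AL == 0x10; the real test derives it from the register snapshot):

```c
#include <assert.h>
#include <stdint.h>

/* What `xor byte ptr [ecx + 0x30], al` does to imul's imm8 byte,
 * assuming AL == 0x41 (illustrative value, from the register snapshot). */
static uint8_t smc_xor_imm(uint8_t imm, uint8_t al)
{
    return imm ^ al;
}
```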
Train your eye on the stack-region dump that tests/unit/test_tb_x86.c outputs after each instruction:
60000020: 50 30 41 30 41 6b 41 41 51 32 41 42 32 42 42 30
until the imul opcode's immediate byte operand gets modified at offset 0x60000028 from 0x51 to 0x10:
60000020: 50 30 41 30 41 6b 41 41 10 32 41 42 32 42 42 30
I wanted to keep the original code sequence for a later, fuller unit test of invalidating the translation cache... so a C preprocessor define, #define RIP_NEXT_TO_THE_SELFMODIFY_OPCODE, is inserted, set, and used. Will PR next.
Would it make sense to have a UC_MEM_WRITE hook routine perform a tb_flush()? If so, we would need an API for that, no?
tb_flush() in a UC_MEM_WRITE hook didn't work. (Yeah, I made myself a new temporary API.) At the time of executing the already self-modified code, cpu_ldub_code_x86_64()/glue() did not obtain the modified instructions at disas_insn:4794 as expected. Going back to the modifying instruction to see what it takes to 'push' the modified instructions down (perhaps looking for that cpu_stub_code_x86_64() equivalent, which doesn't exist).
Uses TCGv_i64 scratch memory:
0x15 - contains value 0x10 (AL register content)
0x16 - scratch XOR operation
0x17 - contains indexing result of [ecx + 0x30]
This snippet in translate.c:1519:
case OP_XORL:
    tcg_gen_xor_tl(tcg_ctx, *cpu_T[0], *cpu_T[0], *cpu_T[1]); /* constructed 0x10 */
    gen_op_st_rm_T0_A0(s, ot, d);                             /* put 0x10 in memory */
    gen_op_update1_cc(tcg_ctx);
    set_cc_op(s, CC_OP_LOGICB + ot);
    break;
Then gen_op_st_rm_T0_A0(DisasContext, 0, OR_TMP0) calls gen_op_st_v(s, idx=0, t0=0x15, a0=0x17), which calls tcg_gen_qemu_st_tl(s->uc, t0=0x15, addr=0x17, idx=0x2, memop=MO_8), which calls tcg_canonicalize_memop_x86_64(op=MO_8, is64=0x1, st=0x1).
Nothing is invalidating the TCG; it proceeds to run unhindered. I was expecting some kind of flush_icache_range() after passing the memory-region boundary test or something. (cpu_flush_icache_range() is being ignored while in TCG mode.)
I think you are looking at the wrong code, however tb_flush should in fact flush the entire cache and retranslate, so...not sure why it isn't.
I flipped the DEBUG_FLUSH define on and, within my hook_write(), performed the following:
static void
hook_mem32_write(uc_engine *uc, uc_mem_type type, uint64_t address,
                 int size, int64_t value, void *UNUSED(user_data))
{
    switch (type) {
    default:
        break;
    case UC_MEM_WRITE:
        if (0 != uc_mem_check(uc, address, size)) { // bool: True/False
            printf("SELF-MODIFYING CODE\n");
            uc_tb_flush(uc);
        }
        break;
    }
}
That hook calls uc_tb_flush(), which in turn does the following (in uc.c):
UNICORN_EXPORT
uc_err uc_tb_flush(uc_engine *uc)
{
    CPUState *cpu = uc->cpu;
    void *env = cpu->env_ptr;
    printf("FLUSHED!\n\n");
    tb_flush_x86_64(env);
    return 0;
}
and got this:
qemu: flush code_size=2240 nb_tbs=1 avg_tb_size=2240
A later uc_mem_read() retrieval of the modified instruction memory shows it to be correct; however, the actual TCG behavior still uses the old immediate operand of IMUL, which hasn't been updated by the 'self-modifying' xor instruction.
I turned on the following:
#define DEBUG_TB_INVALIDATE
#define DEBUG_TB_CHECK
Both reveal nothing.
A breakpoint on tb_invalidate_phys_page_fast_x86_64() is hit with the following backtrace at the point where the XOR instruction modifies the immediate operand of the IMUL instruction:
notdirty_mem_write()
memory_region_write_access_x86_64()
access_with_adjusted_size_x86_64()
memory_region_dispatch_write_x86_64()
io_mem_write_x86_64()
io_writeb_x86_64()
helper_ret_stb_mmu_x86_64()
static_code_gen_buffer()
?? ()
?? ()
?? ()
?? ()
tcg_out_opc()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Looks like I hit something esoteric, for it passes valgrind and generally runs fine. (I'm compiling with -g3 -ggdb.)
I see in qemu/translate-all.c:1414 that
tb_invalidate_phys_page_range(uc, 0x28, 0x28 + 1, 1);
is called. That's what we want. The count doesn't exceed SMC_BITMAP_USE_THRESHOLD (10), so there is no build_page_bitmap(p).
Going deeper, I see that qemu_get_ram_ptr(uc, 0x28) (of the stb_p(qemu_get_ram_ptr(uc, 0x28)) line) is returning the correct 0x10. Now at tlb_set_dirty(env, uc->current_cpu->mem_io_vaddr = 0x60000028), making a note of all dirty pages.
Physical memory has the correct self-modified imul operand of 0x10, as evidenced by the gdb command line:
(gdb) print qemu_get_ram_ptr_x86_64( uc, 0x28)
$99 = 0x51
# stepping over st_p_x86_64()
(gdb) print qemu_get_ram_ptr_x86_64(uc, 0x28)
$100 = 0x10
So, physical memory has the updated info. (not getting far, but it is a lesson for me).
For those who need a QEMU refresher, Fabrice put out this awesome design architecture overview of QEMU.
http://archives.cse.iitd.ernet.in/~sbansal/csl862-virt/2010/readings/bellard.pdf
Excerpt from Fabrice's research paper.
3.6 Self-modifying code and translated code invalidation
On most CPUs, self-modifying code is easy to handle because a specific code cache invalidation instruction is executed to signal that code has been modified. It suffices to invalidate the corresponding translated code.
However on CPUs such as the x86, where no instruction cache invalidation is signaled by the application when code is modified, self-modifying code is a special challenge [3].
When translated code is generated for a TB, the corresponding host page is write protected if it is not already read-only. If a write access is made to the page, then QEMU invalidates all the translated code in it and reenables write accesses to it.
Correct translated code invalidation is done efficiently by maintaining a linked list of every translated block contained in a given page. Other linked lists are also maintained to undo direct block chaining.
When using a software MMU, the code invalidation is more efficient: if a given code page is invalidated too often because of write accesses, then a bitmap representing all the code inside the page is built. Every store into that page checks the bitmap to see if the code really needs to be invalidated. It avoids invalidating the code when only data is modified in the page
[3] For simplicity, QEMU will in fact implement this behavior of ignoring the code cache invalidation instructions for all supported CPUs.
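The per-page code bitmap the excerpt mentions is easy to picture; here is a minimal sketch (my own names, one bit per byte of a 4 KiB page) of the "does this store actually hit code?" test:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* One bit per page byte: set => translated code lives at that offset. */
static uint8_t code_bitmap[PAGE_SIZE / 8];

static void mark_code(unsigned off)
{
    code_bitmap[off / 8] |= (uint8_t)(1u << (off % 8));
}

static int store_hits_code(unsigned off)
{
    return (code_bitmap[off / 8] >> (off % 8)) & 1;
}
```

A store to a marked offset (like our 0x28) must trigger invalidation; a store to unmarked data bytes in the same page can be ignored.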
Before I go searching for that code fragment that performs the instruction cache invalidation of the Translation Cache, specifically with regard to a write-protected block, and loosening up the "is it a write-protected block" logic, any pointers?
You were close. I even mentioned a while back where the problem stemmed from: https://github.com/unicorn-engine/unicorn/issues/364#issuecomment-172326332
Your trace even pointed you in the right direction: notdirty_mem_write().
Can i say this on GitHub?
-->> #headdesk <<--
Recap: In stack-based (writable) memory, code modification occurs: an xor instruction modifies the imul immediate byte operand (two instructions down the code space). The fault is that QEMU does not properly regenerate a new translation block from the self-modified target code; it still recreates a new translation block, but from the old, untouched code space (meaning the TB is identical to what it was before).
Starting code sequence:
0x021: 30 41 30 xor byte ptr [ecx + 0x30], al # modify immediate operand of imul opcode
0x024: 41 inc ecx
0x025: 6b 41 41 51 imul eax, dword ptr [ecx + 0x41], 0x51
After PC advanced to 0x024 (and 0x028 got modified), the ideal new code sequence is:
0x021: 30 41 30 xor byte ptr [ecx + 0x30], al # modify immediate operand of imul opcode
0x024: 41 inc ecx
0x025: 6b 41 41 10 imul eax, dword ptr [ecx + 0x41], 0x10
Comparing the two code snippets above: the xor at 0x60000021 overwrites imul's 0x51 with the new value 0x10 at address 0x60000028.
A real Unicorn test_tb_x86.c test run looks like this, using uc_mem_read() after executing the 1st instruction:
0x021: 30 41 30 xor byte ptr [ecx + 0x30], al # modify immediate operand of imul opcode
0x024: 41 inc ecx
0x025: 6b 41 41 10 imul eax, dword ptr [ecx + 0x41], 0x10
But the actual TCG emulation performed was:
0x021: 30 41 30 xor byte ptr [ecx + 0x30], al # modify immediate operand of imul opcode
0x024: 41 inc ecx
0x025: 6b 41 41 51 imul eax, dword ptr [ecx + 0x41], 0x51
So, uc_mem_region() is using qemu_get_ram_ptr()/RAMBlock for all its data retrieval/storage... CORRECTLY. TCG, not so much.
We'll need to track the code change within the same memory block that caused the old codespace not to be updated. There are several places in which our self-modifying instruction sequence may be stored (all of which I scouted in advance, so I can direct your attention up front). Using "awatch", we can then monitor the progression of reads/writes of our 0x51 value(s)... but first, we need to determine the candidate addresses for debug watching.
For the uc_mem_write buffer, a breakpoint on uc_mem_write is used to get its _bytes buffer argument, with the 0x28 offset added:
b uc_mem_write
continue
print bytes[0x28] # confirm that the value is 0x51
print &bytes[0x28] # get address of 0x51 value
awatch -l 0x401a98
For the QEMU RAMBlock, whose location initially got written by uc_mem_write() as well, a breakpoint on address_space_rw_x86_64 was used to intercept it, with 0x28 added to its value:
b address_space_rw_x86_64
continue
print addr1
awatch -location *0x7fffdd000028
Running this from the beginning of the code, we have a set of data watchpoint hits:
0x401a98
0x7fffdd000028
all before uc_emu_start(). After uc_emu_start(), the initial creation of the bytecode sequence via the cpu_x86_exec/cpu_x86_gen_code/disas_insn loop begins:
helper_uc_tracecode HOOK_INSN (0x60000021)
helper_ret_ldub_mmu_x86_64
helper_ret_stb_mmu_x86_64 <----- iotlb Bug is here
helper_uc_tracecode HOOK_INSN (0x60000021) # 'xor' opcode
helper_ret_ldub_mmu_x86_64
helper_ret_stb_mmu_x86_64
helper_read_eflags
helper_uc_tracecode HOOK_INSN(0x60000024) # 'inc' opcode
helper_le_ldul_mmu_x86_64
helper_read_eflags
helper_uc_tracecode HOOK_INSN # 'imul' opcode (*ERROR DETECTED*)
Then emulation execution begins out of static_code_gen_buffer[]. At the moment of the xor self-modify step, a new TCG got regenerated, but the next awatch hit came too late: we had already begun regenerating the emulation of 'imul eax,[ecx+0x41],0x51'... I was expecting to see emulation steps writing the updated value 0x10 into the correct MemoryRegion location, for later pickup during TCG regeneration of the revised imul ...,...,0x10 instruction.
Things that invalidate pages (aka tb_invalidate_phys_page_range) due to 'xor' writing 0x10 in place of 0x51 in physical RAM:
static_code_gen_buffer
helper_ret_stb_mmu_x86_64 # 'xor'
io_writeb_x86_64
io_mem_write_x86_64
memory_region_dispatch_write_x86_64
access_with_adjusted_size_x86_64 (wrote 0x10 to 0x7fffddabb658) # wrong location
memory_region_write_accessor_x86_64
notdirty_mem_write(uc, opaque=0x0, ram_addr=0x28, val=0x10, size=0x1)
tb_invalidate_phys_page_fast_x86_64
tb_invalidate_phys_page_range_x86_64(uc, start=0x28, end=0x29, is_cpu_write_access=0x1)
p = (PageDesc *) 0x7fffd80175f0 = {first_tb = 0x7fffddabd010, code_write_count = 0x0, code_bitmap = 0x0}
cpu_restore_state_from_tb
tcg_func_start
tcg_pool_reset
memclr(s->free_temps)
tcg_ctx->gen_opc_ptr = in s->gen_opc_buf
tcg_ctx->be = TCGBackendData[640]
gen_intermediate_code_pc_x86_64
x86_env_get_cpu
gen_intermediate_code_internal(gen_opc_cc_op=0x7ffff7e69b7c "", cpu=0x640180, tb=0x7fffddabd010, search_pc=0x1)
gen_tb_start
disas_insn(env, tcg_ctx, pc_start=0x60000021) # `xor`
disas_insn(env, tcg_ctx, pc_start=0x60000024) # `inc`
disas_insn(env, tcg_ctx, pc_start=0x60000025) # `imul`
b = 0x6b = cpu_ldub_code(env, s->pc=0x60000026)
modrm = 0x41 = cpu_ldub_code(env, s->pc=0x60000027)
insn_get(env, s, ot=MO_8)
cpu_ldub_code_x86_64(env, addr=ptr=0x60000028)
page_index = 0
mmu_idx = 0x2
&env->tlb_table[2][0] = (CPUTLBEntry *) 0x64d1f0 = {addr_read = 0x60000000, addr_write = 0x60000010, addr_code = 0x60000000, addend = 0x7fff7d000000, dummy = 0x64d210 '\377' <repeats 200 times>...}
hostaddr = 0x7fffdd000028
*hostaddr = 0x51
ldub_p_x86_64(ptr=0x7fffdd000028)
res = 0x51 # WRONG!
and 0x51 did not get changed into the expected 0x10 immediate byte operand value for the imul opcode.
While still in static_code_gen_buffer[] emulation mode, my focus has shifted to the first disas_insn() iteration step (the xor, while that first 'xor' instruction is being emulated), and it has been evolving through:
- regenerating imul ...,...,0x51 (during cpu_restore_state_from_tb(), see the calltrace above)
- picking up the 0x10 value from codespace memory to build the new imul ...,...,0x10
- directing the 0x10 write toward the correct MemoryRegion <--- currently here
I'm seeing two separate MemoryRegion *mr being used here: one by iotlb (io_mem) and the other by pc.ram. The missing 0x10 write went to the OTHER MemoryRegion block.
it seems you are getting very far. looking forward to the fix for this issue :-)
We have established with previous calltrace that tb_invalidate_phys_page_fast-triggered regeneration of host codes did not pick up the changed operand within its own memory page.
I've narrowed the first faulty code down to exactly one line, helper_ret_stb_mmu_x86_64:722:
#0 helper_ret_stb_mmu_x86_64(env=0x14f7e070, addr=0x60000028, val=0x10, mmu_idx=0x2, retaddr=0xf5bd08)
at /tmp/unicorn-master/qemu/softmmu_template.h:722
722 if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
The local variables are:
(gdb) info local
index = 0x0
tlb_addr = 0x60000010
haddr = 0x60000000
hook = 0x14f67240
handled = 0x51
uc = 0x14f5c130
mr = 0x14f6ad20
cur = 0x0
It was a mistake to treat the stack-based "pc.ram" memory as IO memory. Once I forged tlb_addr to 0x60000000 (replacing 0x60000010, thereby removing the TLB_NOTDIRTY bit), it worked perfectly, placing 0x10 over 0x51 at line 787 (also of the same routine).
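The forgery amounts to clearing bit 4 (TLB_NOTDIRTY) from the TLB entry's addr_write field:

```c
#include <assert.h>
#include <stdint.h>

#define TLB_NOTDIRTY 0x10  /* bit 4, the value seen in our tlb_addr */

/* What my manual forging of tlb_addr did: drop TLB_NOTDIRTY so the
 * store takes the fast RAM path instead of the notdirty I/O path. */
static uint64_t clear_notdirty(uint64_t tlb_addr)
{
    return tlb_addr & ~(uint64_t)TLB_NOTDIRTY;
}
```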
Now I need to step back some more instructions to find the right placement; the failure is probably in getting the correct MemoryRegion pointer (we get the io-mem-type memory that uses notdirty_mem_ops(), instead of pc.mem, which -> probably <- provides the desired unassigned_mem_ops_x86_64()).
The forging of tlb_addr resulted in proper placement, punching the new 0x10 operand into the "pc.mem" memory region. Lines 787 and 789 are the only places in the whole QEMU code that back-update pc.ram with self-modifying code. But it didn't fix the whole problem. Stay tuned.
How about that... env->tlb_table[][].addr_write is never marked 'dirty' by the removal of its TLB_NOTDIRTY flag.
UPDATED: Found it, through a cryptically named and coded function: tlb_set_dirty1().
@aquynh, I have to ask this question (not seeking a change here, though)... Unicorn is making a RedHat-labeled "qemu:memory-region" MemoryRegion structure in qemu/memory.c:953. Yet it treats it as an io.mem, as denoted in memory_region_init_io:1200.
Too bad, we can't label memory precisely /rant.
Question for a QEMU whiz: what is the proper logic to determine an io-mem vs. qemu:memory-region? Is it a fall-through?
if (unlikely(tlb_addr & ~TARGET_PAGE_MASK))
The above logic certainly isn't working for a user-supplied uc_mem_write ("pc.ram") memory region. But it is also what is used in tlb_vaddr_to_host().
Perhaps it was MARKED as "io.mem" by accident.
Yep, according to include/exec/cpu-defs.h:
typedef struct CPUTLBEntry {
/* bit TARGET_LONG_BITS to TARGET_PAGE_BITS : virtual address
bit TARGET_PAGE_BITS-1..4 : Nonzero for accesses that should not go directly to ram.
User-supplied "pc.mem" via uc_mem_write was MARKED as TLB_NOTDIRTY (bit 4), perhaps by accident? As in tlb_addr = 0x60000010, where the 0x10 part represents TLB_NOTDIRTY.
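The check that routes stores down the slow path can be sketched standalone. Assuming 4 KiB target pages (TARGET_PAGE_MASK == ~0xFFF), any flag bit left in the page-offset part of addr_write, such as TLB_NOTDIRTY (0x10), makes the store take the I/O path:

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_PAGE_MASK (~(uint64_t)0xFFF)  /* assuming 4 KiB pages */

/* Mirrors `if (unlikely(tlb_addr & ~TARGET_PAGE_MASK))` from
 * helper_ret_stb_mmu: nonzero low bits => slow (notdirty/IO) path. */
static int takes_slow_path(uint64_t tlb_addr)
{
    return (tlb_addr & ~TARGET_PAGE_MASK) != 0;
}
```

So our 0x60000010 goes down the slow path, while a clean 0x60000000 would hit RAM directly.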
The real question is: are we supposed to support a virtual-address scheme with the user-supplied uc_mem_write() memory regions?
Using the GDB command awatch -l env->tlb_table[2][0].addr_write, I re-ran the whole thing to see which code segments set it up, particularly that TLB_NOTDIRTY bit (0x10). The read/write hits are:
R/W address callstack
write 0xffffffffffffffff uc_open()
write 0xffffffffffffffff uc_open()
write 0xffffffffffffffff uc_mem_map()
read 0xffffffffffffffff qemu_tcg_cpu_thread_fn_x86_64/get_page_addr_code_x86_64/tlb_set_page_x86_64
write 0x60000000 qemu_tcg_cpu_thread_fn_x86_64/get_page_addr_code_x86_64/cpu_ldub_code_x86_64/helper_ldb_cmmu_x86_64/helper_ret_ldb_cmmu_x86_64/tlb_fill_x86_64/x86_cpu_handle_mmu_fault/tlb_set_page_x86_64
# 1st-time code generation triggered-by-CPU-page-fault
read 0x60000000 qemu_tcg_cpu_thread_fn_x86_64/tb_find_slow_x86_64/tb_gen_code_x86_64/tb_link_page_x86_64/tb_alloc_page_x86_64/tlb_protect_code_x86_64/cpu_physical_memory_reset_dirty_x86_64/cpu_tlb_reset_dirty_all_x86_64/tlb_reset_dirty_range_x86_64/tlb_is_dirty_ram_x86_64/
# Marked as TLB_NOTDIRTY after uc_emu_start()-time via start_thread()
write 0x60000010 qemu_tcg_cpu_thread_fn_x86_64/tcg_exec_all_x86_64/tcg_cpu_exec_x86_64/cpu_x86_exec/tb_find_fast_x86_64/tb_find_slow_x86_64/tb_gen_code_x86_64/tb_link_page_x86_64/tb_alloc_page_x86_64/tlb_protect_code_x86_64/cpu_physical_memory_reset_dirty_x86_64/tlb_reset_dirty_range_all_x86_64/cpu_tlb_reset_dirty_all_x86_64/tlb_reset_dirty_range_x86_64
Maybe QEMU forgot to use (tlb_entry->addr_write & TARGET_PAGE_MASK), let's see what the next awatch hit shows:
read 0x60000010 helper_ret_stb_mmu_x86_64
Seeing that 0x10 in 0x60000010, this means TLB_NOTDIRTY is set and that "io.mem" MemoryRegion is to be used during its actual TB microcode emulation at the 'xor' instruction. I was half expecting "pc.ram" and not seeing that TLB_NOTDIRTY (0x10).
Too bad the helper_ret_stb_mmu cannot (yet) make that distinction of what kind of memory it should be. This means it has to be determined well before the new emulation is done.
Checkpoint, still at the right spot:
helper_ret_stb_mmu_x86_64 (env=0x14f7e070, addr=0x60000028, val=0x10, mmu_idx=0x2, retaddr=0xf5bd08)
at /tmp/unicorn-master/qemu/softmmu_template.h:703
Looks like there isn't much of a "write-through" mechanism for updating "io.mem/qemu:memory-region" as well as "pc.ram" MemoryRegion.
New code logic, it is going to be.
@aquynh, I have to ask this question (not seeking a change here, though)... Unicorn is making a RedHat-labeled "qemu:memory-region" MemoryRegion structure in qemu/memory.c:953. Yet it treats it as an io.mem, as denoted in memory_region_init_io:1200.
good question, but i am also confused here. the high level overview of Qemu internals is sorely lacking, and sometimes reading source code is not enough to get the whole picture.
you might want to post this question to the Qemu mailing list to get the quick answer.
I think the current memory stack design looks like this:
TARGET VIRTUAL MEMORY (0x60000028) "pc.ram"
TLB MEMORY (TCG-view) (0x028) "qemu:memory-region" / "io.mem" iotlb_to_region(ioaddr=0xffffffffa00001)
HOST VIRTUAL (REAL RAM) ADDRESS (0x7fffdd000028)
At the moment, it APPEARS to me that in the current design the "io.mem" is writeable but not back-flushable to the "pc.ram".
Until I understand the distinction between "pc.ram" and "io.mem" MemoryRegion concept with regard to Unicorn executive function, I am not going to be blazing forward with a fix here anytime soon.
@egberts
Are you sure force flushing the translation block does not fix the issue? It should re-translate the new code.
Oh, it does do a tb_invalidate_phys_page_fast() in notdirty_mem_write() at the time of emulating the xor self-modifying instruction; it's just that during regeneration of the TranslationBlock, it picked up the old 0x51 immediate operand byte value from "pc.mem", as evidenced by this sequence in qemu/exec.c:1378. As you can see from the sequence, the new 0x10 immediate operand byte value got written AFTER tb_invalidate_phys_page_fast(), hence the TCG's earlier retrieval of the old 0x51.
Thanks, @farmdve. I've since summarized it and posted on qemu-devel.
No, it's not a simple matter of switching the logic around in notdirty_mem_write(). There are multiple invalidation interactions in play there.
During TranslationBlock regeneration (cpu_restore_state_from_tb()), a 3rd breakpoint at disas_insn() (where imul needs to retrieve the immediate byte operand in question) shows that it lifted the 0x51 value directly from "pc.mem" via a host memory pointer (0x7fffdc00000028). After regeneration, it then stores the new 0x10 value into that same "pc.mem" location.
@farmdve, here's what Paolo Bonzini (of QEMU) said the following:
Newer versions (I think 2.5) have dispensed with code regeneration altogether.
OK... perhaps it's time to move forward to 2.5 (gonna spend a little time on getting redundancies into the current QEMU 2.2).
@aquynh, about that 2.5 branch...
Yeah, the 2.5 branch sounds like a better start, buuut it won't be easy at all.
Furthermore, if that does happen we need to think of a better way to separate Unicorn and QEMU, as to be able to port fixes more easily.
we need to think of a better way to separate Unicorn and QEMU, as to be able to port fixes more easily.
I think the only way to do this is to get the necessary hooks for Unicorn (or the likes) in the mainline Qemu code.
Methinks the bug still exists in QEMU 2.5 (exec.c is largely unchanged).
@egberts i am working on 2.5 sync, but that is slow. i believe you are much closer to the solution for this issue than waiting for my work to be done. i can imagine that we release v1.0 without this sync.
@farmdve @JonathonReinhart perhaps it is possible to work towards merging a part of Unicorn into Qemu, but that is also a lot of effort. many parts of Unicorn are not really in line with Qemu, so we must still maintain our fork.
The Intel IMUL opcode in question, described in several notations:
OPC_IMUL_GvEvIb
imul eax, [ecx+0x41], 0x10
6b414110 imul eax,DWORD PTR [ecx+0x41],0x10
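Breaking down that 4-byte encoding: 0x6b is IMUL Gv,Ev,Ib, the ModRM byte 0x41 is mod=01 (base + disp8), reg=000 (EAX), rm=001 (ECX), followed by disp8 0x41 and the imm8 multiplier. A quick check of the standard ModRM field split:

```c
#include <assert.h>
#include <stdint.h>

/* Field split of a ModRM byte (standard x86 encoding). */
static void decode_modrm(uint8_t modrm, int *mod, int *reg, int *rm)
{
    *mod = (modrm >> 6) & 3;  /* 01 -> [base + disp8] */
    *reg = (modrm >> 3) & 7;  /* 000 -> EAX */
    *rm  =  modrm       & 7;  /* 001 -> ECX */
}
```

The imm8 is the final byte of the instruction, which is exactly the byte the testcase's xor rewrites.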
When encountering an IMUL opcode on the i386 architecture, specifically one with an immediate 8-bit multiplier, the emulator engine does not properly multiply the number 0x5151494a by 0x10 to get the expected 0x151494a0 result.
The interesting thing is that a standalone IMUL instruction works, but this stack-based AlphaMixed code snippet (derived from Metasploit) fails.
Will submit working proof-of-failure code soon.