ptitSeb / box64

Box64 - Linux Userspace x86_64 Emulator with a twist, targeted at ARM64 Linux devices
https://box86.org
MIT License

Unexpected Dynarec-Interpreter Difference on `cmp` Instruction #1661

Closed Coekjan closed 2 months ago

Coekjan commented 2 months ago

I tried to run python3.12 with box64 on an rv64 platform (prior issue #1652) and set BOX64_DYNAREC_TEST=1 to check for differences between the dynarec and the interpreter. I saw 4 differences around cmp instructions:

Warning, difference between x64 Interpreter and Dynarec in 0x3f001ea3f3 (7e 17 83 fe 63 7f 12 89)
=======================================
DIFF: Dynarec |  Interpreter
----------------------
RIP: 0000003f001ea40c | 0000003f001ea3f5
Warning, difference between x64 Interpreter and Dynarec in 0x3f001ea36f (0f 8f bb 00 00 00 39 c3)
=======================================
DIFF: Dynarec |  Interpreter
----------------------
RIP: 0000003f001ea430 | 0000003f001ea375
Warning, difference between x64 Interpreter and Dynarec in 0x3f001ea3f3 (7e 17 83 fe 63 7f 12 89)
=======================================
DIFF: Dynarec |  Interpreter
----------------------
RIP: 0000003f001ea40c | 0000003f001ea3f5
Warning, difference between x64 Interpreter and Dynarec in 0x3f001ea36f (0f 8f bb 00 00 00 39 c3)
=======================================
DIFF: Dynarec |  Interpreter
----------------------
RIP: 0000003f001ea430 | 0000003f001ea375

The relevant RIPs, 0x3f001ea3f3 and 0x3f001ea36f, are actually in libpython3.12.so, and objdump shows:

  (( omitted ))
  1ea369:   8b 41 3c                mov    0x3c(%rcx),%eax
  1ea36c:   83 f8 63                cmp    $0x63,%eax
  1ea36f:   0f 8f bb 00 00 00       jg     1ea430 <PyDict_Clear@@Base+0x3d0>
  1ea375:   39 c3                   cmp    %eax,%ebx
  (( omitted ))
  1ea3ed:   44 0f 4c e0             cmovl  %eax,%r12d
  1ea3f1:   39 f0                   cmp    %esi,%eax
  1ea3f3:   7e 17                   jle    1ea40c <PyDict_Clear@@Base+0x3ac>
  1ea3f5:   83 fe 63                cmp    $0x63,%esi
  (( omitted ))

I guess this might be because cmp did not set the x64 FLAGS correctly, so the following jle and jg did not jump to the correct address. But why? I have investigated the code for about a day and still cannot find the reason.
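To make the dependency concrete, here is a minimal C sketch of how a 32-bit cmp derives ZF, SF and OF, and how jg and jle read them. It follows the architectural x86 definitions; the names cmp_flags, cmp32, cond_jg and cond_jle are made up for illustration and are not box64 code. In the AT&T syntax used by objdump above, `cmp %esi,%eax` computes eax - esi, so the call would be cmp32(eax, esi).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: only the flags that jg/jle look at. */
typedef struct {
    bool zf; /* result is zero */
    bool sf; /* sign bit of the result */
    bool of; /* signed overflow of the subtraction */
} cmp_flags;

/* cmp a, b computes a - b, keeps the flags and discards the result. */
static cmp_flags cmp32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    cmp_flags f;
    f.zf = (r == 0);
    f.sf = ((r >> 31) & 1) != 0;
    /* Overflow: operands have different signs and the result's sign
       differs from the sign of the first operand. */
    f.of = ((((a ^ b) & (a ^ r)) >> 31) & 1) != 0;
    return f;
}

/* jg  is taken when ZF == 0 and SF == OF (signed greater). */
static bool cond_jg(cmp_flags f)  { return !f.zf && f.sf == f.of; }

/* jle is taken when ZF == 1 or SF != OF (signed less or equal). */
static bool cond_jle(cmp_flags f) { return f.zf || f.sf != f.of; }
```

Both conditions read OF, so a wrong or misplaced OF bit is enough to send either jump the wrong way even when ZF and SF are correct.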

I would appreciate it if anyone could help to solve this problem or give some hints about this. Thanks in advance.

rajdakin commented 2 months ago

A good first step would be to dump the instructions the dynarec produces: BOX64_DYNAREC_DUMP=2 BOX64_TRACE=0-1 BOX64_TRACE_FILE=dump.txt <program>, then look at the problematic addresses (note that the trace build* is quite helpful in this case). Note that BOX64_DYNAREC_TEST adds quite a lot of clutter, so if you can't see the issue you can try again with that option. Also, a DUMP of 2 adds color (ANSI escape codes) to the output, which you can read using less -R. The escape codes make the file much less readable in a text editor; if you want to open it there, use a DUMP of 1 instead.

*The trace build requires libzydis version v3.2.1: https://github.com/zyantific/zydis/tree/v3.2.1 (newer versions change the API).

Coekjan commented 2 months ago

@rajdakin Thanks for your hints. I investigated the code further and found that these differences only occurred when I used BOX64_DYNAREC_TEST=1. I saw this code:

https://github.com/ptitSeb/box64/blob/62695ceed4982c3fdf379b65cc4481c3656cd6ac/src/dynarec/rv64/dynarec_rv64_private.h#L170-L178

When I removed FLAGS_ADJUST_TO11 and FLAGS_ADJUST_FROM11, python3.12 could run without any dynarec-interpreter differences, even with BOX64_DYNAREC_TEST=1 enabled. Honestly, I don't know why we need FLAGS_ADJUST_* here, but I suspect they do something unexpected to the FLAGS.

ptitSeb commented 2 months ago

A bit of context about what those ADJUST FLAG11 macros do: one of the x86 flags, OF, is on bit 11 of xFlags. But on RV64, most (all?) operations that take an immediate use a signed 12-bit immediate. That means that when trying to use an immediate to set this bit, the value gets sign-extended (because bit 11 is the sign bit of a 12-bit signed value). To avoid extra opcodes when setting this bit (which is set/unset often), it is moved to a different place inside the RV64 dynarec (one of the lower reserved bits, the F_OF2 macro). And it works fine (and quite fast). The only issue is when exchanging the flags register outside of the Dynarec: those ADJUST macros swap/unswap this bit in place.
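The explanation above can be illustrated with a small C sketch. It is not the box64 implementation: sext12, flags_to_internal and flags_from_internal are hypothetical names, and placing the internal copy of OF on bit 5 is an assumption made for the example (the comment only says it is one of the lower reserved bits). The sketch shows why an immediate with bit 11 set cannot be used directly, and what moving OF between the two positions looks like.

```c
#include <stdint.h>

#define X86_OF_BIT      11 /* architectural position of OF in x86 FLAGS */
#define INTERNAL_OF_BIT 5  /* ASSUMPTION: a reserved low bit, for illustration only */

/* Sign-extend a 12-bit immediate the way RV64 I-type instructions do. */
static int64_t sext12(uint32_t imm12) {
    imm12 &= 0xFFF;
    return (imm12 & 0x800) ? (int64_t)imm12 - 0x1000 : (int64_t)imm12;
}

/* Hypothetical helper: move OF from bit 11 to the internal low bit. */
static uint64_t flags_to_internal(uint64_t flags) {
    uint64_t of = (flags >> X86_OF_BIT) & 1;
    flags &= ~((1ull << X86_OF_BIT) | (1ull << INTERNAL_OF_BIT));
    return flags | (of << INTERNAL_OF_BIT);
}

/* Hypothetical helper: move OF from the internal low bit back to bit 11. */
static uint64_t flags_from_internal(uint64_t flags) {
    uint64_t of = (flags >> INTERNAL_OF_BIT) & 1;
    flags &= ~((1ull << X86_OF_BIT) | (1ull << INTERNAL_OF_BIT));
    return flags | (of << X86_OF_BIT);
}
```

Here sext12(0x800) evaluates to -2048, so an OR with that immediate would set every bit from 11 upward instead of only bit 11, while 1 << 5 fits in a positive 12-bit immediate. Any code that reads the flags outside the dynarec expects OF on bit 11, which is why a conversion is needed at that boundary.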

Coekjan commented 2 months ago

Thanks for your attention to this issue. I now fully understand why we need these macros.