Coekjan closed this issue 2 months ago
A good first step would be to dump the instructions the dynarec produces with BOX64_DYNAREC_DUMP=2 BOX64_TRACE=0-1 BOX64_TRACE_FILE=dump.txt <program>, then look at the problematic addresses (note that the trace build* is quite helpful in this case). Note that BOX64_DYNAREC_TEST adds quite a lot of clutter, so if you can't see the issue, you can try again without that option.
Also, a DUMP of 2 adds color (ANSI escape codes) to the output (you can read it using less -R). If you want to open it in a text editor, the escape codes make the file much less readable; in that case, use a DUMP of 1 instead.
*The trace build requires libzydis version v3.2.1: https://github.com/zyantific/zydis/tree/v3.2.1 (newer versions change the API).
@rajdakin Thanks for your hints. I investigated the code further and found that these differences only occurred when I used BOX64_DYNAREC_TEST=1. I saw this code:
When I removed FLAGS_ADJUST_TO11 & FLAGS_ADJUST_FROM11, python3.12 could run without any dynarec-interpreter differences even with BOX64_DYNAREC_TEST=1 enabled. Honestly, I don't know why we need FLAGS_ADJUST_* here, but I suspected they were doing something unexpected to the FLAGS.
A bit of context about what those ADJUST FLAG11 macros do:
One of the x86 flags, OF, is on bit 11 of xFlags. But on RV64, most (all?) operations that take an immediate use a signed 12-bit immediate. That means that if you try to use an immediate to set this bit, the value gets sign-extended (because bit 11 is the sign bit of a 12-bit signed value).
To avoid extra opcodes when setting this bit (which is set/unset often), it's moved to a different place inside the RV64 dynarec (one of the lower reserved bits, the F_OF2 macro). And it works fine (and quite fast). The only issue is when exchanging the flags register outside of the dynarec: those ADJUST macros swap/unswap this bit in place.
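To make the sign-extension problem concrete, here is a minimal C sketch. It is not box64's code: the helper names `sext12`, `of_from_bit11` and `of_to_bit11` are hypothetical stand-ins for the idea behind the ADJUST macros, and the internal position chosen for `F_OF2` (bit 5) is an assumption made only for illustration.

```c
#include <stdint.h>

#define F_OF   11  /* position of OF in the x86 flags register */
#define F_OF2  5   /* ASSUMPTION: illustrative internal position (a low reserved bit) */

/* Sign-extend a 12-bit immediate, as RV64 I-type instructions (ORI, ANDI, ...) do. */
static int64_t sext12(uint32_t imm12) {
    int64_t v = (int64_t)(imm12 & 0xFFF);
    return (v & 0x800) ? v - 0x1000 : v;
}

/* Hypothetical helper: x86 layout -> internal layout (OF moves from bit 11 to F_OF2). */
static uint64_t of_from_bit11(uint64_t flags) {
    uint64_t of = (flags >> F_OF) & 1;
    flags &= ~((1ULL << F_OF) | (1ULL << F_OF2));
    return flags | (of << F_OF2);
}

/* Hypothetical helper: internal layout -> x86 layout (OF moves back to bit 11). */
static uint64_t of_to_bit11(uint64_t flags) {
    uint64_t of = (flags >> F_OF2) & 1;
    flags &= ~((1ULL << F_OF) | (1ULL << F_OF2));
    return flags | (of << F_OF);
}
```

With these helpers, `sext12(0x800)` evaluates to -2048, so an OR with that immediate would set every bit from 11 upward, not just OF. By contrast, `1 << F_OF2` fits in a positive 12-bit immediate. Converting a flags value to the internal layout and back returns the original value, as long as the reserved bit was clear to begin with.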
Thanks for your attention to this issue. I now fully understand why we need these macros.
I tried to run python3.12 with box64 on an rv64 platform (prior issue #1652) and used BOX64_DYNAREC_TEST=1 to test the differences between dynarec and interpreter. I saw 4 differences about the cmp instruction:
The relative RIPs 0x3f001ea3f3 & 0x3f001ea36f are actually in libpython3.12.so, and objdump told me:
I guess this might be because cmp did not set the x64 FLAGS correctly, so the following jle & jg did not jump to the correct address. But why? I have investigated the code for about 1 day and still cannot find the reason.
I would appreciate it if anyone could help solve this problem or give some hints about it. Thanks in advance.
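For reference, the dependency of jle & jg on the flags set by cmp can be sketched in C as below. This only illustrates the architectural x86 semantics (jle is taken when ZF=1 or SF≠OF, jg when ZF=0 and SF=OF). It is not code from box64, the `cmp_flags` type and function names are hypothetical, and a 64-bit operand size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three flags that jle/jg look at. */
typedef struct { bool zf, sf, of; } cmp_flags;

/* Flags of a 64-bit `cmp a, b`: computes a - b and keeps only the flags. */
static cmp_flags cmp64(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t r = ua - ub;
    cmp_flags f;
    f.zf = (r == 0);
    f.sf = (r >> 63) & 1;
    /* Signed overflow: operands have different signs and the result's sign differs from a's. */
    f.of = (((ua ^ ub) & (ua ^ r)) >> 63) & 1;
    return f;
}

/* Branch conditions as defined by the x86 architecture. */
static bool jle_taken(cmp_flags f) { return f.zf || (f.sf != f.of); }
static bool jg_taken(cmp_flags f)  { return !f.zf && (f.sf == f.of); }
```

For example, comparing `INT64_MIN` with 1 produces SF=0 and OF=1, so jle is taken. If OF were lost or read from the wrong bit, the same comparison would look like SF=OF and jg would be taken instead, which is the kind of wrong jump described above.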