Closed Da1L8-X closed 1 month ago
Mmm, yeah, you are correct. I need to check that. I guess the unwind there doesn't detect that YMM1 is new and fetched at this instruction. Q8 should be "purged" at the JZ. The issue is probably that only the upper part of YMM1 is new; the lower part is already in the cache.
(I edited your post to put the logs in code marks, for readability)
As an additional point: before the vaddps there is a vxorps. Only the low half is cleared in the generated code, but according to x86 semantics the high half is also cleared; however, there is no register reading operation in box64. I don't know if this needs to be modified here.
8f6081: c5 f0 57 c9 vxorps %xmm1,%xmm1,%xmm1
8f6085: 0f 1f 00 nopl (%rax)
8f6088: c5 f4 58 08 vaddps (%rax),%ymm1,%ymm1
0x1008f6081: C5 F0 57 C9 VXORPS Gx, Vx, Ex
0xffff86332890: 2 emitted opcodes, inst=9, barrier=0 state=0/1(1), set=0/0, use=0, need=0/0, sm=0(0/0), pred=8, last_ip=0x1008f6060 Q1:XMM1 (Change: V1:->XMM1) ymm0=(0000/0000+0002-0000=0002)
3dc02c01 LDR Q1, [xEmu, 0xb0]
6e211c21 VEOR Q1, Q1, Q1
New Instruction x64:0x1008f6085, native:0xffff86332898
0x1008f6085: 0F 1F 00 NOP (multibyte)
0xffff86332898: 2 emitted opcodes, inst=10, barrier=0 state=0/1(1), set=0/0, use=0, need=0/0, sm=0(0/0), pred=9, last_ip=0x1008f6060 Q1:XMM1 ymm0=(0002/0002+0000-0000=0002)
Purge YMM mask=0002 --------
91068001 ADD x1, xEmu, 0x1a0
a9017c3f STP xZR, xZR, [x1, 0x10]
---------- Purge YMM
This part is correctly handled.
The high parts are often zeroed, so box64 uses a cache system to avoid writing those zeros every time. It's the ymm0 part of the state in the log: a simple mask that tracks which of the 16 upper YMM halves are zero.
And as you can see, the next opcode "purges" this information to memory with the STP xZR, xZR...
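To illustrate the idea of the mask described above, here is a minimal sketch in C. It is not box64's actual code: the type and function names (`emu_state_t`, `ymm_cache_t`, `mark_upper_zero`, `purge_upper_zero`) are hypothetical, and the layout of the state is an assumption. It only shows the general technique: remember "upper half is zero" as one bit per register, and write the zeros to memory later in a single purge step.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_YMM 16

/* Hypothetical emulated CPU state: the upper 128 bits of each YMM register. */
typedef struct {
    uint64_t ymm_upper[NUM_YMM][2];
} emu_state_t;

/* Hypothetical translation-time cache: bit i set means the upper half of
   YMMi is known to be zero but has not been written to emu_state_t yet. */
typedef struct {
    uint16_t zero_upper_mask;
} ymm_cache_t;

/* A VEX-encoded 128-bit write (e.g. vxorps xmm1, xmm1, xmm1) zeroes the
   upper half: record that in the mask instead of storing zeros right away. */
static void mark_upper_zero(ymm_cache_t *cache, int reg)
{
    cache->zero_upper_mask |= (uint16_t)(1u << reg);
}

/* Write every pending zero to memory and clear the mask.
   Returns the number of registers that were flushed. */
static int purge_upper_zero(ymm_cache_t *cache, emu_state_t *state)
{
    int flushed = 0;
    for (int reg = 0; reg < NUM_YMM; reg++) {
        if (cache->zero_upper_mask & (1u << reg)) {
            state->ymm_upper[reg][0] = 0;
            state->ymm_upper[reg][1] = 0;
            flushed++;
        }
    }
    cache->zero_upper_mask = 0;
    return flushed;
}

/* Replays the case from the log: a vxorps on XMM1, then a purge. */
static int demo_vxorps_then_purge(void)
{
    emu_state_t state = {0};
    ymm_cache_t cache = {0};

    /* Pretend the in-memory upper half of YMM1 holds stale data. */
    state.ymm_upper[1][0] = 0xdeadbeefULL;
    state.ymm_upper[1][1] = 0xcafef00dULL;

    mark_upper_zero(&cache, 1);
    assert(cache.zero_upper_mask == 0x0002);        /* same mask value as in the log */
    assert(state.ymm_upper[1][0] == 0xdeadbeefULL); /* memory is still stale here */

    int flushed = purge_upper_zero(&cache, &state);
    assert(state.ymm_upper[1][0] == 0 && state.ymm_upper[1][1] == 0);
    assert(cache.zero_upper_mask == 0);
    return flushed;
}
```

In this sketch, the mask value 0x0002 corresponds to YMM1, which matches the `Purge YMM mask=0002` line in the log. The point of deferring the store is that anything reading the upper half from memory before the purge would see stale data, which is why a purge has to happen before control flow leaves the cached region.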
I pushed something, can you check if it solved the issue on your side?
yes, it works, thx!
And thx for the ticket, that was some good debug info :D !
When I try to translate the following code:
I get the result:
So what I want to say is: if the JNZ doesn't store the YMM before branching back, Q8 will be loaded from the original address again!