Open huangshaobo opened 1 year ago
That's an AVX opcode. It's not supported yet in box64.
Is there any hope of support in the future?
Can you give some technical support, so that we can deal with this issue more easily?
Support for AVX will be added in a future version. I don't even know if this opcode comes from a lib or the program you want to emulate itself, so there isn't much I can add here.
This is from an app, cajViewer.
What more information should I give you?
I parsed this exception code with zydis tools, but I can't understand the instruction description. Does this help you?
I know what the opcode is. As I wrote before, AVX is not supported for now. It will be in the future, but not now.
Thanks. Do you have a roadmap for this feature? I'll spend some time on this as well, to help develop it.
The short-term objective is to get the next stable version out, so I'm checking that I haven't introduced regressions and that new functionalities are working fine.
Then, for the next development cycle, I first need to add support for SSE 4.2, and then I will work on AVX.
It's planned, it's just not for now, as there are other things that must come before.
Dear ptitSeb, we added demo code for the AVX instruction. Can you help review whether the modification is correct? The diff patch is attached. Thank you! avx_demo.zip
The attached file gives "file not found".
Dear ptitSeb, we added the attachment again. Looking forward to your reply, thanks!
download link: https://github.com/ptitSeb/box64/files/12367258/avx_demo.zip
Ok, thanks.
It's interesting. Looks like it's on the right path. But do note that this change will break all the SSE Dynarec code, which expects XMM registers to be 128 bits. I am still unsure how to approach this: using what you introduced (widening xmm regs to 256 bits), or leaving xmm as 128 bits and introducing ymm regs for the upper part. This alternative method might slow down the interpreter, but might help the dynarec. Of course, changing the width of xmm in the dynarec should be fairly trivial, with some special cases like the FXSAVE stuff...
Also, before AVX, SSE4.2 needs to be implemented.
Dear ptitSeb
Thank you for your reply
https://github.com/ptitSeb/box64/commit/5a52922cd2b9033f6f38ea1cbc78058cf6780cf5 In the commit above, I see that SSE4.1 support has been added to the code. All SSE4.1 instructions are supported, right?
I should add these 7 instructions for SSE4.2, right? PCMPESTRI/PCMPESTRM/PCMPISTRI/PCMPISTRM/PCMPGTQ/CRC32/POPCNT
How do I add support for SSE4.2 and AVX in the dynarec? Can you give guidance, such as some papers or specs, or should I look at the dynamic translation code first?
Looking forward to your reply, thanks !
I don't think CRC32 and POPCNT are part of SSE4.2. I think they use another bit. Also, POPCNT is already defined.
For SSE4.2, the Dynarec code will be very tricky to write. I haven't written any paper on how to write the Dynarec code :( sorry. You need to look at existing code and add to it. Focus on ARM64 first; RV64 will be easier because RVV is not supported yet, so the search loops will be unrolled.
(note that I plan to probably start working on SSE4.2 next week, something like that)
Also, note that for SSE4.2, changing the Dynarec first is not mandatory, because it will not change any structures (but it will need to be done to avoid losing too much performance on SSE4.2 programs).
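For the two simplest instructions in the list above, an interpreter-style reference could look like the sketch below. This is only an illustration in plain C: `vec128_t`, `ref_pcmpgtq` and `ref_crc32_u8` are hypothetical names, not box64's actual types or functions. The CRC32 step uses the CRC-32C polynomial in its bit-reflected form and, like the instruction, applies no implicit initial value or final inversion.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register seen as two signed 64-bit lanes. */
typedef struct { int64_t q[2]; } vec128_t;

/* PCMPGTQ dst, src: each 64-bit lane becomes all ones if dst > src (signed), else 0. */
static vec128_t ref_pcmpgtq(vec128_t dst, vec128_t src)
{
    vec128_t r;
    for (int i = 0; i < 2; ++i)
        r.q[i] = (dst.q[i] > src.q[i]) ? -1 : 0;
    return r;
}

/* CRC32 r32, r/m8: fold one byte into the accumulator (CRC-32C, reflected 0x82F63B78). */
static uint32_t ref_crc32_u8(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}
```

Feeding the ASCII string "123456789" through `ref_crc32_u8`, starting from 0xFFFFFFFF and inverting the result, should give the usual CRC-32C check value 0xE3069283, which makes a convenient self-test.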
I have also hit some AVX opcode errors when using box64 recently. I know that Dynarec code is much faster than the interpreter, so I read the Dynarec code. It's complicated for me. I noticed that the SSE translation uses the SIMD & FP registers Qx. I'm a programmer, but I'm not familiar with the ARM architecture, so I read the "Arm® Architecture Reference Manual"; it shows that a Qx register is a 128-bit vector register, which is OK for SSE. But for AVX, YMM is 256 bits. So can Qx be used for AVX translation, or must we use the SVE registers Zx? I use box64 on my phone, and its CPU doesn't have the SVE feature.
I haven't started designing AVX for the Dynarec yet. My plan is to make SVE optional, and have a (slower) fallback mechanism using NEON on ARM64, using a 2nd Qx register for the high part of the YMM regs.
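The "two 128-bit halves" idea can be sketched in portable C, independent of NEON. The types and function names below (`half128_t`, `ymm_pair_t`, `ref_vaddps_256`, `ref_vaddps_128`) are made up for illustration and are not box64 code. The sketch relies on two facts: VADDPS works lane by lane, so a 256-bit add is just two independent 128-bit adds, and a VEX-encoded 128-bit instruction zeroes the upper 128 bits of the destination YMM register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical types: one 128-bit half, and a YMM register stored as two halves. */
typedef struct { float f[4]; } half128_t;
typedef struct { half128_t lo; half128_t hi; } ymm_pair_t;

/* 4-lane float add, the work one 128-bit host register would do. */
static half128_t add_f32x4(half128_t a, half128_t b)
{
    half128_t r;
    for (int i = 0; i < 4; ++i)
        r.f[i] = a.f[i] + b.f[i];
    return r;
}

/* VADDPS ymm, ymm, ymm: two independent 128-bit adds, low half then high half. */
static ymm_pair_t ref_vaddps_256(ymm_pair_t a, ymm_pair_t b)
{
    ymm_pair_t r;
    r.lo = add_f32x4(a.lo, b.lo);
    r.hi = add_f32x4(a.hi, b.hi);
    return r;
}

/* VADDPS xmm, xmm, xmm (VEX.128): low half computed, high half zeroed. */
static ymm_pair_t ref_vaddps_128(ymm_pair_t a, ymm_pair_t b)
{
    ymm_pair_t r;
    r.lo = add_f32x4(a.lo, b.lo);
    memset(&r.hi, 0, sizeof r.hi);
    return r;
}
```

Instructions that move data across the 128-bit boundary (permutes, broadcasts, inserts) do not split this cleanly and would need dedicated handling.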
The SSE Dynarec code also uses the XMM registers in the structure x64emu_t. If we leave xmm as 128 bits and introduce ymm regs as the upper part, the AVX Dynarec code will use Qx as YMMx_lower and Qx+16 as YMMx_upper, like the pseudocode below shows:
typedef struct x64emu_s {
    // cpu
    reg64_t     regs[16];
    x64flags_t  eflags;
    reg64_t     ip;
    // sse & avx_low128
    union {
        sse_regs_t xmm[16];
        sse_regs_t ymml[16];    // ymm lower 128bits
    };
    sse_regs_t  ymmh[16];       // ymm higher 128bits, don't insert item between ymmh and ymml
    ...
} x64emu_t;

function() {
    // ldr Qx YMMxl
    VLDR128_U12(num_qx, xEmu, offsetof(x64emu_t, ymml[x]));         // ymml same as xmm
    // ldr Qx+16 YMMxh
    VLDR128_U12(num_qx + 16, xEmu, offsetof(x64emu_t, ymmh[x]));
}
Yeah, that's globally the idea, except I cannot use Qx+16 for the higher part and will need some dynamic allocation for that high part. XMM regs are also used for x87/mmx regs, and I sometimes need some free for intermediary code. But anyway, this is what I'm leaning towards for now. Actual implementation will not start before SSE 4.2 is done anyway (because AVX implies all SSE extensions are supported), and that hasn't started yet (but should soon).
Have you started the SSE 4.2 coding? I'll learn the SSE 4.2 instructions; once you have implemented one instruction, maybe I can do some coding work based on it. Anyway, if there is anything I can do to get SSE 4.2 and AVX done faster, I'll be glad to do it.
Not yet. End of week / next week probably (hopefully)
OK, got it. Looking forward to it.
Sorry to bother you again. As you said, YMM will need some dynamic allocation for the high part. After studying the Dynarec code in more detail, I noticed that there are only 32 NEON registers and 24 are already used; the remaining 8 registers are not enough for YMM (and they are used as scratch regs). How will you implement this dynamic allocation? Can I just use x87_purgecache and mmx_purgecache to store these caches in x64emu_t, emulate the YMM instructions with all 32 NEON registers, then load the x87 and mmx caches again after emulating the YMM instructions?
If that's not the right way, what's your plan after SSE4.2? Will you implement the AVX static translation (interpreter) or the dynamic translation first? Would it help if I did some interpreter development, so that you have more time for Dynarec development?
Yes, the idea will be to hopefully use the x87/mmx regs, but still have some kind of dynamic purge/alloc of regs available for YMM / x87 / mmx.
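One way to picture a "dynamic purge/alloc" scheme is a small pool of host registers that hold YMM high halves on demand and evict an older entry when the pool is full. The sketch below is purely hypothetical: the pool size, the round-robin eviction policy and all names (`hi_cache_t`, `hi_cache_get`) are assumptions for illustration and do not describe how box64 does or will do it.

```c
#include <assert.h>

#define HI_SLOTS 4      /* hypothetical number of host regs reserved for YMM high halves */
#define HI_FREE  (-1)   /* marks a slot that holds nothing */

typedef struct {
    int owner[HI_SLOTS];    /* guest YMM index cached in each slot, or HI_FREE */
    int next_victim;        /* round-robin eviction cursor */
    int spills;             /* number of evictions (write-backs) so far */
} hi_cache_t;

static void hi_cache_init(hi_cache_t *c)
{
    for (int i = 0; i < HI_SLOTS; ++i)
        c->owner[i] = HI_FREE;
    c->next_victim = 0;
    c->spills = 0;
}

/* Return the slot holding the high half of guest register `ymm`,
 * allocating a free slot or evicting one if necessary. */
static int hi_cache_get(hi_cache_t *c, int ymm)
{
    int slot = -1;
    assert(ymm >= 0 && ymm < 16);
    for (int i = 0; i < HI_SLOTS; ++i) {
        if (c->owner[i] == ymm)
            return i;                       /* already cached */
        if (c->owner[i] == HI_FREE && slot < 0)
            slot = i;                       /* remember first free slot */
    }
    if (slot < 0) {
        /* Pool full: a real dynarec would emit a store of the evicted half here. */
        slot = c->next_victim;
        c->next_victim = (c->next_victim + 1) % HI_SLOTS;
        c->spills++;
    }
    c->owner[slot] = ymm;
    return slot;
}
```

The same bookkeeping could in principle be shared with x87/mmx caching, which is the part the thread leaves open.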
Once SSE4.2 is done, I will probably start both the interpreter and the dynarec at the same time, just to set up the basic infrastructure, like handling of the new YMM reg size, VEX fetching and dispatching... this kind of thing. The interpreter code is always helpful, because it's some kind of reference. Also, some tests (so some x86_64 programs, using assembly or interpreter) would be nice to have, especially at the beginning.
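As an illustration of what "VEX fetching" involves, here is a minimal decoder for the 2-byte (C5) and 3-byte (C4) VEX prefixes in 64-bit mode. The field layout follows the Intel encoding, where R, X, B and vvvv are stored inverted. The struct and function names (`vex_t`, `decode_vex`) are hypothetical and this is not box64's decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded VEX fields (hypothetical layout); inverted bits are already flipped back. */
typedef struct {
    uint8_t r, x, b, w;   /* register extension bits and operand-size bit */
    uint8_t map;          /* opcode map: 1 = 0F, 2 = 0F 38, 3 = 0F 3A */
    uint8_t vvvv;         /* extra source register index */
    uint8_t l;            /* vector length: 0 = 128-bit, 1 = 256-bit */
    uint8_t pp;           /* implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
} vex_t;

/* Returns the number of prefix bytes consumed (2 or 3), or 0 if p does not
 * start with a complete VEX prefix. */
static size_t decode_vex(const uint8_t *p, size_t len, vex_t *out)
{
    if (len >= 2 && p[0] == 0xC5) {
        uint8_t b1 = p[1];
        out->r = ((b1 >> 7) & 1) ^ 1;
        out->x = 0;
        out->b = 0;
        out->w = 0;
        out->map = 1;                       /* 2-byte form always implies map 0F */
        out->vvvv = (uint8_t)(~(b1 >> 3) & 0xF);
        out->l = (b1 >> 2) & 1;
        out->pp = b1 & 3;
        return 2;
    }
    if (len >= 3 && p[0] == 0xC4) {
        uint8_t b1 = p[1], b2 = p[2];
        out->r = ((b1 >> 7) & 1) ^ 1;
        out->x = ((b1 >> 6) & 1) ^ 1;
        out->b = ((b1 >> 5) & 1) ^ 1;
        out->map = b1 & 0x1F;
        out->w = (b2 >> 7) & 1;
        out->vvvv = (uint8_t)(~(b2 >> 3) & 0xF);
        out->l = (b2 >> 2) & 1;
        out->pp = b2 & 3;
        return 3;
    }
    return 0;
}
```

For example, the bytes C5 F4 58 C2 (vaddps ymm0, ymm1, ymm2) decode to a 2-byte prefix with l = 1, vvvv = 1, pp = 0 and map = 1, after which the opcode byte 58 and the ModRM byte C2 follow.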