box64 lib/libicudata.so.55 27754|0x1023b9a74: Unimplemented Opcode error

huangshaobo commented 1 year ago

ptitSeb commented 1 year ago

That's an AVX opcode. It's not supported yet in box64.

huangshaobo commented 1 year ago

Is there any hope of support in the future?

huangshaobo commented 1 year ago

Could you give us some technical guidance, so that we can deal with this issue more easily?

ptitSeb commented 1 year ago

Support for AVX will be added in a future version. I don't even know if this opcode comes from a lib or the program you want to emulate itself, so there isn't much I can add here.

huangshaobo commented 1 year ago

This is from an app, cajViewer.

huangshaobo commented 1 year ago

What more information should I give you?

huangshaobo commented 1 year ago

I parsed this exception code with zydis tools, but I can't understand the instruction description. Does this help you?

ptitSeb commented 1 year ago

I know what the opcode is. As I wrote before, AVX is not supported for now. It will be in the future, but not now.

huangshaobo commented 1 year ago

Thanks. Do you have a roadmap for this feature? I'll spend some time on this as well, to help develop it.

ptitSeb commented 1 year ago

The short-term objective is to get the next stable version out, so I'm checking that I haven't introduced regressions and that new functionalities are working fine.

Then, for the next development cycle, I first need to add support for SSE 4.2 and then I will work on AVX.

It's planned, it's just not for now as there are other things that must come before.

huangshaobo commented 1 year ago

Dear ptitSeb, we have added demo code for the AVX instruction. Could you help review whether the modification is correct? The diff patch is attached. Thank you! avx_demo.zip

ptitSeb commented 1 year ago

The attached file is not found.

huangshaobo commented 1 year ago

avx_demo.zip

Dear ptitSeb, we have added the attachment again. Looking forward to your reply, thanks!

Download link: https://github.com/ptitSeb/box64/files/12367258/avx_demo.zip

ptitSeb commented 1 year ago

Ok, thanks.

It's interesting. Looks like it's on the right path. But do note that this change will break all the SSE Dynarec code, which expects XMM registers to be 128 bits. I am still unsure how to approach this: using what you introduced (widening xmm regs to 256 bits), or leaving xmm as 128 bits and introducing ymm regs for the upper part. This alternative method might slow down the interpreter, but might help the dynarec. Of course, changing the width of xmm in the dynarec should be fairly trivial, with some special cases like the FXSAVE stuff...

Also, before AVX, SSE4.2 needs to be implemented.
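To illustrate why FXSAVE is a special case for the two layouts weighed above: the FXSAVE image stores XMM0 to XMM15 as a packed block of 16-byte entries starting at byte offset 160. The sketch below is only an illustration under assumed types (`wide_reg_t`, `xmm_reg_t` and both function names are hypothetical, not box64 code). With widened 256-bit registers the low halves are no longer contiguous, so the save needs a strided copy, while a separate 128-bit array can be copied in one block.

```c
#include <stdint.h>
#include <string.h>

#define FXSAVE_SIZE       512  /* size of the FXSAVE image in bytes */
#define FXSAVE_XMM_OFFSET 160  /* XMM0 starts here, 16 bytes per register */

/* Hypothetical register types, for illustration only. */
typedef struct { uint8_t b[32]; } wide_reg_t; /* xmm widened to 256 bits */
typedef struct { uint8_t b[16]; } xmm_reg_t;  /* xmm kept at 128 bits    */

/* Layout A: registers widened to 256 bits. Only the low 16 bytes of each
 * register belong in the FXSAVE image, so the copy is strided. */
void fxsave_xmm_from_wide(uint8_t area[FXSAVE_SIZE], const wide_reg_t regs[16])
{
    for (int i = 0; i < 16; i++)
        memcpy(area + FXSAVE_XMM_OFFSET + 16 * i, regs[i].b, 16);
}

/* Layout B: xmm stays 128 bits (the upper halves live elsewhere), so the
 * whole register file matches the FXSAVE layout and one copy is enough. */
void fxsave_xmm_from_split(uint8_t area[FXSAVE_SIZE], const xmm_reg_t xmm[16])
{
    memcpy(area + FXSAVE_XMM_OFFSET, xmm, 16 * sizeof(xmm_reg_t));
}
```

Both functions produce the same image; the difference is only in how much the surrounding code has to know about the register width.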

huangshaobo commented 1 year ago

Dear ptitSeb

Thank you for your reply

https://github.com/ptitSeb/box64/commit/5a52922cd2b9033f6f38ea1cbc78058cf6780cf5 In the commit above, I see that SSE4.1 support has been added to the code. All SSE4.1 instructions are supported, right?
I should add these 7 SSE4.2 instructions, right? PCMPESTRI/PCMPESTRM/PCMPISTRI/PCMPISTRM/PCMPGTQ/CRC32/POPCNT
How should support for SSE4.2 and AVX be added to the dynarec? Can you give some guidance, such as papers or specs, or should I look at the dynamic translation code first?

Looking forward to your reply, thanks!
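For reference, two of the instructions listed above have simple semantics that can be written down as plain C, in the style an interpreter would use. This is a minimal sketch, not box64 code: the `xmm128_t` type and both function names are made up for illustration. PCMPGTQ does a signed 64-bit compare per lane, and CRC32 accumulates a CRC-32C (Castagnoli) value, shown here for the 8-bit operand form only.

```c
#include <stdint.h>

/* Hypothetical 128-bit register view: two signed 64-bit lanes. */
typedef struct { int64_t sq[2]; } xmm128_t;

/* PCMPGTQ dst, src: each lane of dst becomes all ones if dst > src
 * (signed compare), otherwise all zeroes. */
void pcmpgtq(xmm128_t *dst, const xmm128_t *src)
{
    for (int i = 0; i < 2; i++)
        dst->sq[i] = (dst->sq[i] > src->sq[i]) ? -1 : 0;
}

/* CRC32 r32, r/m8: fold one byte into the running CRC-32C value.
 * 0x82F63B78 is the bit-reflected Castagnoli polynomial. The instruction
 * itself applies no initial or final inversion; callers do that. */
uint32_t crc32c_u8(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}
```

A bit-by-bit loop like this is slow but easy to check, which is what matters for a reference implementation; a table-driven version could replace it later.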

ptitSeb commented 1 year ago

I don't think CRC32 and POPCNT are part of SSE4.2. I think they use another bit. Also, POPCNT is already defined.

For SSE4.2, the Dynarec code will be very tricky to write. I haven't written any paper on how to write the Dynarec code :( sorry. You need to look at the existing code and add to it. Focus on ARM64 first; RV64 will be easier because RVV is not supported yet, so the search loops will be unrolled.

(note that I plan to probably start working on SSE4.2 next week, something like that)

ptitSeb commented 1 year ago

Also, note that for SSE4.2, changing the Dynarec first is not mandatory, because it will not change any structures (but it will need to be done to avoid losing too much performance on SSE4.2 programs).

zymyy789 commented 1 year ago

I have also been getting some AVX opcode errors when using box64 recently. I know that Dynarec code is much faster than the interpreter, so I read the Dynarec code; it's complicated for me. I noticed that the SSE translation uses the SIMD & FP registers Qx. I'm a programmer, but I'm not familiar with the ARM architecture, so I read the "Arm® Architecture Reference Manual"; it shows that the Qx register is a 128-bit vector register, which is OK for SSE. But for AVX, YMM is 256 bits. So can Qx be used for AVX translation, or must we use the SVE registers Zx? I use box64 on my phone, and its CPU doesn't have the SVE feature.

ptitSeb commented 1 year ago

I haven't started designing AVX for the Dynarec yet. My plan is to make SVE optional, and have a (slower) fallback mechanism using NEON on ARM64, so using a 2nd Qx register for the high part of the YMM regs.
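To make the "second register for the high part" idea concrete, here is a small portable C model (no NEON intrinsics; `half128_t`, `ymm_pair_t` and the function names are hypothetical and not taken from box64). A 256-bit operation is just the 128-bit operation applied to each half. The one rule that differs from legacy SSE is that a VEX-encoded 128-bit instruction zeroes bits 255:128 of its destination.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one YMM register as two 128-bit halves, the way a
 * NEON fallback would hold it in two Q registers. */
typedef struct { uint32_t ud[4]; } half128_t;
typedef struct { half128_t lo, hi; } ymm_pair_t;

/* 128-bit packed 32-bit add, the building block (what PADDD does). */
static void paddd128(half128_t *d, const half128_t *a, const half128_t *b)
{
    for (int i = 0; i < 4; i++)
        d->ud[i] = a->ud[i] + b->ud[i];
}

/* VPADDD ymm, ymm, ymm: the same 128-bit operation on each half. */
void vpaddd256(ymm_pair_t *d, const ymm_pair_t *a, const ymm_pair_t *b)
{
    paddd128(&d->lo, &a->lo, &b->lo);
    paddd128(&d->hi, &a->hi, &b->hi);
}

/* VPADDD xmm, xmm, xmm: VEX.128 forms zero the upper half of the
 * destination, unlike the legacy SSE encoding which leaves it alone. */
void vpaddd128(ymm_pair_t *d, const ymm_pair_t *a, const ymm_pair_t *b)
{
    paddd128(&d->lo, &a->lo, &b->lo);
    memset(&d->hi, 0, sizeof d->hi);
}
```

Instructions that move data across the 128-bit boundary (permutes, for example) would not split this cleanly and would need their own handling.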

zymyy789 commented 1 year ago

The SSE Dynarec code uses the XMM registers in the x64emu_t structure too. If we leave xmm as 128 bits and introduce ymm regs for the upper part, the AVX Dynarec code will use Qx as YMMx_lower and Qx+16 as YMMx_upper, as the pseudocode below shows:

    typedef struct x64emu_s {
        // cpu
        reg64_t regs[16];
        x64flags_t eflags;
        reg64_t ip;
        // sse & avx_low128
        union {
            sse_regs_t xmm[16];
            sse_regs_t ymml[16]; // ymm lower 128bits
        };
        sse_regs_t ymmh[16]; // ymm higher 128bits, don't insert item between ymmh and ymml
        ...
    } x64emu_t;

    function() {
        // ldr Qx YMMxl
        VLDR128_U12(num_qx, xEmu, offsetof(x64emu_t, ymml[x])); // ymml same as xmm
        // ldr Qx+16 YMMxh
        VLDR128_U12(num_qx + 16, xEmu, offsetof(x64emu_t, ymmh[x]));
    }

ptitSeb commented 1 year ago

Yeah, that's globally the idea, except I cannot use Qx+16 for the higher part and will need some dynamic allocation for that high part. XMM regs are also used for x87/mmx regs, and I need some free for intermediary code sometimes. But anyway, this is what I'm leaning towards for now. Actual implementation will not start before SSE 4.2 is done anyway (because AVX implies all SSE extensions are supported), and this hasn't started yet (but should soon).
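One possible shape for the dynamic allocation mentioned above is a small cache that maps guest YMM high halves onto a limited pool of host registers and evicts when the pool is full. This is purely a sketch under assumptions: the pool size of 8, the round-robin eviction policy and all names (`hi_cache_t`, `hi_cache_get`, ...) are hypothetical, and it only counts write-backs where a real dynarec would emit a store to the emulator structure.

```c
#define POOL_SIZE 8    /* assumption: 8 host vector registers are available */
#define NO_OWNER  (-1)

/* Hypothetical cache: which guest YMM high half lives in which pool slot. */
typedef struct {
    int owner[POOL_SIZE]; /* guest register index, or NO_OWNER if free */
    int next_victim;      /* round-robin eviction cursor */
    int spills;           /* write-backs that would have been emitted */
} hi_cache_t;

void hi_cache_init(hi_cache_t *c)
{
    for (int i = 0; i < POOL_SIZE; i++)
        c->owner[i] = NO_OWNER;
    c->next_victim = 0;
    c->spills = 0;
}

/* Return the pool slot holding the high half of guest register `guest`,
 * allocating a free slot or evicting an occupied one if needed. */
int hi_cache_get(hi_cache_t *c, int guest)
{
    int slot = -1;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (c->owner[i] == guest)
            return i;                       /* already cached */
        if (c->owner[i] == NO_OWNER && slot < 0)
            slot = i;                       /* remember first free slot */
    }
    if (slot < 0) {                         /* pool full: evict one */
        slot = c->next_victim;
        c->next_victim = (c->next_victim + 1) % POOL_SIZE;
        c->spills++;                        /* old value is written back */
    }
    c->owner[slot] = guest;
    return slot;
}

/* Write back and release everything, e.g. at the end of a block. */
void hi_cache_purge(hi_cache_t *c)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (c->owner[i] != NO_OWNER) {
            c->spills++;
            c->owner[i] = NO_OWNER;
        }
    }
    c->next_victim = 0;
}
```

The same pool could in principle be shared with the x87/mmx caches, in which case the owner entries would need a tag saying which kind of guest register they hold.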

zymyy789 commented 1 year ago

Have you started the SSE 4.2 coding? I'll learn the SSE 4.2 instructions. If you have implemented an instruction, maybe I can do some coding work based on it. Anyway, if there is anything I can do to get SSE 4.2 and AVX done faster, I'll be glad to do it.

ptitSeb commented 1 year ago

Not yet. End of week / next week probably (hopefully)

zymyy789 commented 1 year ago

OK, got it. Looking forward to it.

zymyy789 commented 1 year ago

Sorry to bother you again. As you said, YMM will need some dynamic allocation for the high part. After studying the Dynarec code in more detail, I noticed that there are only 32 NEON registers and 24 are already used; the remaining 8 registers are not enough for ymm (and they are used as scratch regs). Since dynamic allocation will be needed, how will you implement this feature? Can I just use x87_purgecache and mmx_purgecache to store these caches in x64emu_t, emulate the ymm instructions with all 32 NEON registers, and then load the x87 and mmx caches again after emulating the ymm instructions?

If that's not the right way, what's your plan after SSE4.2? Will you implement the AVX static translation (interpreter) or the dynamic translation first? Would it help if I did some interpreter development, to give you more time for Dynarec development?

ptitSeb commented 1 year ago

Yes, the idea will be to hopefully use the x87/mmx regs, but still have some kind of dynamic purge/alloc of regs available for YMM / x87 / mmx.

Once SSE4.2 is done, I will probably start both the interpreter and the dynarec at the same time, just to set up the basic infrastructure, like handling of the new YMM reg size, VEX fetching and dispatching... this kind of thing. The interpreter code is always helpful, because it's some kind of reference. Also, some tests (so some x86_64 programs, using assembly or interpreter) would also be nice to have, especially at the beginning.
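As a rough idea of what the "VEX fetching" step involves, the sketch below decodes the two VEX prefix forms as documented in the Intel SDM: the 2-byte form starts with 0xC5, the 3-byte form with 0xC4, and the R/X/B and vvvv fields are stored inverted. The `vex_t` structure and `vex_decode` function are hypothetical names for illustration, not box64's implementation. It also assumes 64-bit mode, where C4/C5 are always VEX prefixes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded VEX prefix, fields already un-inverted. */
typedef struct {
    uint8_t r, x, b; /* register-extension bits, like REX.R/X/B */
    uint8_t w;       /* VEX.W */
    uint8_t map;     /* opcode map: 1 = 0F, 2 = 0F 38, 3 = 0F 3A */
    uint8_t vvvv;    /* extra source register index (0..15) */
    uint8_t l;       /* vector length: 0 = 128-bit, 1 = 256-bit */
    uint8_t pp;      /* implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
} vex_t;

/* Decode a VEX prefix at p. Returns the number of prefix bytes consumed
 * (2 or 3), or 0 if p does not start with a complete VEX prefix. */
size_t vex_decode(const uint8_t *p, size_t len, vex_t *v)
{
    if (len >= 2 && p[0] == 0xC5) {       /* 2-byte form */
        v->r    = ((p[1] >> 7) & 1) ^ 1;
        v->x    = 0;                      /* implied by the short form */
        v->b    = 0;
        v->w    = 0;
        v->map  = 1;                      /* always the 0F map */
        v->vvvv = (uint8_t)(~(p[1] >> 3) & 0xF);
        v->l    = (p[1] >> 2) & 1;
        v->pp   = p[1] & 3;
        return 2;
    }
    if (len >= 3 && p[0] == 0xC4) {       /* 3-byte form */
        v->r    = ((p[1] >> 7) & 1) ^ 1;
        v->x    = ((p[1] >> 6) & 1) ^ 1;
        v->b    = ((p[1] >> 5) & 1) ^ 1;
        v->map  = p[1] & 0x1F;
        v->w    = (p[2] >> 7) & 1;
        v->vvvv = (uint8_t)(~(p[2] >> 3) & 0xF);
        v->l    = (p[2] >> 2) & 1;
        v->pp   = p[2] & 3;
        return 3;
    }
    return 0;
}
```

For example, the bytes `C5 F5 FE C2` (VPADDD ymm0, ymm1, ymm2) decode to map 1, vvvv 1, L 1 and pp 1, after which dispatch would continue on the opcode byte `FE` and the ModRM byte `C2`.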

ptitSeb / box64

box64 lib/libicudata.so.55 27754|0x1023b9a74: Unimplemented Opcode error #882