wheremyfoodat opened 1 year ago
You also don't need to preserve rax/rcx/rdx most of the time. The calling conventions of both the MS ABI (Windows) and Sys-V (Linux, Mac, BSD, and everything not Windows) treat them as volatile.
One of my introduction-to-x86 PDFs in the emudev devnull channel explains this stuff. Of course, you should respect the ABI of whatever system you're running on, and the ABIs do have some stack-related requirements (such as keeping rsp 16-byte aligned at call sites, plus 32 bytes of shadow space on Windows) that may break if you remove the pushes and pops.
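To make the volatile/non-volatile split concrete, here is a small C++ sketch of how a JIT prologue generator might decide which general-purpose registers it actually has to save. The `Reg`/`Abi` enums and `isCalleeSaved` are hypothetical names, not part of the project, and the sketch covers GPRs only (Windows x64 additionally treats xmm6-xmm15 as non-volatile).

```cpp
// x86-64 general-purpose registers in hardware encoding order (rax = 0 ... r15 = 15).
enum class Reg { rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi,
                 r8, r9, r10, r11, r12, r13, r14, r15 };

enum class Abi { SysV, Win64 };

// Hypothetical helper: true if a function must hand this register back unchanged
// (callee-saved / non-volatile), i.e. a JIT block that uses it must save and restore it.
// Volatile registers (rax, rcx, rdx, r8-r11 on both ABIs) can be clobbered freely.
// rsp is reported as callee-saved because it must be restored, although a
// prologue restores it by balancing the stack rather than by pushing it.
constexpr bool isCalleeSaved(Reg r, Abi abi) {
    switch (r) {
        case Reg::rbx: case Reg::rbp: case Reg::rsp:
        case Reg::r12: case Reg::r13: case Reg::r14: case Reg::r15:
            return true;               // non-volatile on both System V and Windows x64
        case Reg::rsi: case Reg::rdi:
            return abi == Abi::Win64;  // non-volatile on Windows x64 only
        default:
            return false;              // rax, rcx, rdx, r8-r11
    }
}
```

In a prologue generator, this would mean pushing only the registers the block really uses for which `isCalleeSaved` returns true, while still keeping rsp 16-byte aligned before any call.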
Thank you very much for the very detailed analysis!! :heart_eyes: This will help me enormously in improving the code :smile:
Hello. I saw your project on GitHub and Discord and would like to provide some further feedback to improve the code.
Performance improvements
`xor r64, r64`, where r64 is a 64-bit register, is an anti-pattern. 64-bit operations are inherently more bloated, because encoding them requires an extra REX.W prefix byte. Thankfully, writing to a 32-bit register zeroes out the top 32 bits of the whole 64-bit register (the same is NOT true for 8- and 16-bit register writes). So `xor eax, eax`, for example, is a more efficient version of `xor rax, rax`, `mov eax, 0` and so on (though `mov` is sometimes preferred over `xor` when the flags need to be preserved, since `xor` modifies them).

CLS
It would be better to use a memset written in SSE or AVX (when available), which lets you write 16 or 32 bytes at a time, and maybe even partially unroll the clear loop for maximum efficiency. If you'd like an example, I have some in my MIPS->x64 recompiler for PCSX-Redux: https://github.com/grumpycoders/pcsx-redux/blob/1e18f27c13a19e248947afe549ac63f984df1093/src/core/DynaRec_x64/recompiler.cc#L254

If you want help learning about SIMD in general, I have various code snippets that could help, though most are fairly complicated, so it would be best to discuss them over text. On a side note, one of the reasons I really recommend Xbyak is that it helps a lot with things like this: it has tools for detecting whether, e.g., AVX is available, without me having to write any code for it.

You can use `mov byte ptr [&context.V[ins.x]], ins.kk` directly, since kk is a compile-time constant. The same goes for `add` and other similar instructions.

For the shifts, `setc` is the clearer choice, since you're setting vf to the carry out of the shift. The two opcodes are equivalent, however.

That's it for now. Have a nice day.
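To put numbers on the `xor` point: `xor eax, eax` encodes as `31 C0` (2 bytes), `xor rax, rax` as `48 31 C0` (3 bytes), and `mov eax, 0` as `B8 00 00 00 00` (5 bytes). Below is a minimal sketch of a raw-byte emitter for the 32-bit zeroing idiom, assuming the JIT writes machine code into a `std::vector<uint8_t>` (an assumption; the project may use Xbyak or its own buffer). `emitZeroReg` and `emitZeroReg64` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: append `xor r32, r32` for register number `reg`
// (0 = eax ... 7 = edi, 8 = r8d ... 15 = r15d). Writing a 32-bit register
// zero-extends into the full 64-bit register, so this clears all of it.
void emitZeroReg(std::vector<uint8_t>& code, unsigned reg) {
    assert(reg < 16 && "x86-64 has 16 general-purpose registers");
    if (reg >= 8) {
        code.push_back(0x45);  // REX.R + REX.B: both ModRM fields refer to r8-r15
    }
    const unsigned low = reg & 7;
    code.push_back(0x31);      // opcode: XOR r/m32, r32
    code.push_back(static_cast<uint8_t>(0xC0 | (low << 3) | low));  // ModRM: mod=11, reg=src, rm=dst
}

// The 64-bit form for comparison: it always needs a REX prefix with W set,
// so it is one byte longer for rax..rdi and clears nothing extra.
void emitZeroReg64(std::vector<uint8_t>& code, unsigned reg) {
    assert(reg < 16 && "x86-64 has 16 general-purpose registers");
    code.push_back(reg >= 8 ? 0x4D : 0x48);  // REX.W (+ R and B for r8-r15)
    const unsigned low = reg & 7;
    code.push_back(0x31);
    code.push_back(static_cast<uint8_t>(0xC0 | (low << 3) | low));
}
```

For r8-r15 both forms need a REX prefix anyway, so the saving only shows up for the eight legacy registers, which are the ones a JIT tends to use most.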
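For the CLS suggestion, here is a hedged C++ sketch using SSE2 intrinsics (`_mm_setzero_si128` and `_mm_storeu_si128` from `<emmintrin.h>`) that clears 16 bytes per store and unrolls four stores per loop iteration. `clearFramebuffer` and the 64x32, one-byte-per-pixel framebuffer mentioned below are assumptions, not the project's actual layout. An AVX variant would do the same with `_mm256_setzero_si256`/`_mm256_storeu_si256` and 32-byte strides, but it needs a runtime CPU check before it is safe to call. It's also worth benchmarking against plain `std::memset`, which many C libraries already implement with SIMD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical CLS helper: zero `size` bytes starting at `buf`.
// Uses 16-byte SSE2 stores unrolled 4x (64 bytes per iteration) and
// falls back to memset for the tail, or for everything on non-SSE2 targets.
void clearFramebuffer(uint8_t* buf, std::size_t size) {
    assert(buf != nullptr || size == 0);
    std::size_t i = 0;
#if defined(__SSE2__) || defined(_M_X64)
    const __m128i zero = _mm_setzero_si128();
    for (; i + 64 <= size; i += 64) {
        // Unaligned stores, so buf does not need 16-byte alignment.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(buf + i), zero);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(buf + i + 16), zero);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(buf + i + 32), zero);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(buf + i + 48), zero);
    }
#endif
    std::memset(buf + i, 0, size - i);  // remaining bytes (size - i may be 0)
}
```

On a 2048-byte (64x32) buffer, `clearFramebuffer(fb, 64 * 32)` runs 32 iterations of the unrolled loop and leaves nothing for the `memset` tail.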
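The immediate-operand point can be sketched the same way. This assumes the code is a CHIP-8 recompiler (the `V`, `kk`, `vf` and CLS names suggest so), that a pointer to the context struct is held in rbp, and that V0..VF sit at a small offset from it; `kVOffset`, `emitStoreImm8`, `emitAddImm8`, `emitLdVxByte` and `emitAddVxByte` are all hypothetical. A base-register-plus-disp8 operand is used here instead of the absolute `[&context.V[ins.x]]` because x86-64 cannot encode an arbitrary 64-bit address in a `mov`/`add` memory operand with an immediate; that form only works if the address fits a sign-extended 32-bit displacement or is reachable RIP-relative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout: the JIT keeps a pointer to the CHIP-8 context in rbp,
// and V[0]..V[15] start kVOffset bytes into it (small enough for a disp8).
constexpr int kVOffset = 0;

// mov byte ptr [rbp + disp8], imm8  ->  C6 /0 ib
// ModRM 0x45 = mod 01 (disp8 follows), reg 000 (/0), rm 101 (rbp).
void emitStoreImm8(std::vector<uint8_t>& code, int8_t disp, uint8_t imm) {
    code.insert(code.end(), {0xC6, 0x45, static_cast<uint8_t>(disp), imm});
}

// add byte ptr [rbp + disp8], imm8  ->  80 /0 ib (same ModRM byte).
void emitAddImm8(std::vector<uint8_t>& code, int8_t disp, uint8_t imm) {
    code.insert(code.end(), {0x80, 0x45, static_cast<uint8_t>(disp), imm});
}

// 6xkk (LD Vx, byte): kk is known when the block is compiled, so it is
// baked into the instruction instead of going through a scratch register.
void emitLdVxByte(std::vector<uint8_t>& code, unsigned x, uint8_t kk) {
    assert(x < 16);
    emitStoreImm8(code, static_cast<int8_t>(kVOffset + x), kk);
}

// 7xkk (ADD Vx, byte): CHIP-8 leaves VF alone here, so a memory-destination
// add with an immediate is enough (the x86 flags it sets are simply ignored).
void emitAddVxByte(std::vector<uint8_t>& code, unsigned x, uint8_t kk) {
    assert(x < 16);
    emitAddImm8(code, static_cast<int8_t>(kVOffset + x), kk);
}
```

Each handler becomes a single 4-byte instruction and needs no scratch register, which also leaves more registers free for the rest of the block.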
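For the shift/`setc` point, here is a plain C++ reference model of what a `shr`/`shl` by 1 followed by `setc` should leave in Vx and vf, which could be handy for cross-checking JIT output against an interpreter. It assumes the shift operates on Vx in place (the 8xy6/8xyE Vy quirk is ignored), and `ShiftResult`, `shr1` and `shl1` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical reference model for cross-checking the JIT's shift handlers.
struct ShiftResult {
    uint8_t value;  // new Vx
    uint8_t carry;  // bit shifted out; this is what `setc` stores into vf
};

// Models `shr r/m8, 1`: CF receives the old bit 0.
constexpr ShiftResult shr1(uint8_t v) {
    return {static_cast<uint8_t>(v >> 1), static_cast<uint8_t>(v & 1)};
}

// Models `shl r/m8, 1`: CF receives the old bit 7.
constexpr ShiftResult shl1(uint8_t v) {
    return {static_cast<uint8_t>(v << 1), static_cast<uint8_t>(v >> 7)};
}

// Encoding note: setc, setb and setnae are three mnemonics for the same
// instruction (0F 92 /0), so picking `setc` only changes readability.
```

Because everything is `constexpr`, the expected values can be pinned down with `static_assert` at compile time.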