Open ljmf00-wekaio opened 1 year ago
If I'm not missing something, GCC's output for this could be optimized further to:
test(bool, bool):
xor rdi, rsi
xor al, 1
ret
As xor takes fewer uops than sete, and bool is guaranteed to be 0 or 1.
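The reasoning above relies on bool holding only 0 or 1, so that an equality test can be rewritten as two XORs. A minimal sketch of that identity follows. It assumes that test compares its two arguments for equality, which is inferred from the mention of sete because the source of the function is not shown here. The names test_cmp and test_xor are hypothetical.

```cpp
// Assumption: the original test(bool, bool) is an equality comparison.
// This is inferred from the use of sete; the issue does not show its body.
constexpr bool test_cmp(bool a, bool b) {
    return a == b;
}

// XOR form: with operands restricted to 0 or 1, a ^ b is 1 exactly when
// the operands differ, so flipping the low bit gives 1 when they are equal.
constexpr bool test_xor(bool a, bool b) {
    return (a ^ b) ^ 1;
}

// Exhaustive compile-time check over all four input combinations.
static_assert(test_cmp(false, false) == test_xor(false, false), "0,0");
static_assert(test_cmp(false, true)  == test_xor(false, true),  "0,1");
static_assert(test_cmp(true,  false) == test_xor(true,  false), "1,0");
static_assert(test_cmp(true,  true)  == test_xor(true,  true),  "1,1");
```

The check only shows that the two forms agree at the source level. Whether a compiler emits the XOR sequence for either form depends on the target and optimization level.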
I discovered an interesting result when compiling
and comparing the generated assembly with:
They are supposed to be semantically the same under x86_64 semantics and calling conventions that pass arguments in registers or word-aligned memory. However, the compiler does not seem to optimize the function prologue. I assume this is to avoid interfering with the architecture's calling convention, but in some cases the compiler could still optimize the code or combine instructions without violating it. See this comparison of GCC's x86_64 codegen: https://godbolt.org/z/sxnvnv93P