Open asiekierka opened 1 year ago
Unless it is trying to avoid clobbering flags (something you worry about when rearranging instructions to optimize for an out-of-order superscalar processor, but not likely something a DOS compiler would worry about), the code generator should never emit a MOV instruction to clear a register (i.e., set it to 0). It should instead always XOR the register with itself. Therefore, the final optimized code should be:
xor %ah, %ah
mov $0xba, %al
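For reference, here is a minimal C sketch of the encoded lengths involved, assuming the standard 16-bit x86 opcode encodings. The array and function names are made up for illustration and are not part of any compiler. It only compares instruction sizes and says nothing about cycle counts on any particular CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Machine-code bytes for clearing a register, AT&T syntax in the comments.
   The names below are hypothetical and exist only for this illustration. */
const unsigned char mov_imm0_ax[] = { 0xB8, 0x00, 0x00 }; /* mov $0, %ax   */
const unsigned char xor_ax_ax[]   = { 0x31, 0xC0 };       /* xor %ax, %ax  */
const unsigned char mov_imm0_ah[] = { 0xB4, 0x00 };       /* mov $0, %ah   */
const unsigned char xor_ah_ah[]   = { 0x30, 0xE4 };       /* xor %ah, %ah  */

/* Bytes saved by XOR over MOV when clearing a 16-bit register. */
size_t bytes_saved_16bit(void)
{
    return sizeof mov_imm0_ax - sizeof xor_ax_ax;
}

/* Bytes saved by XOR over MOV when clearing an 8-bit register. */
size_t bytes_saved_8bit(void)
{
    return sizeof mov_imm0_ah - sizeof xor_ah_ah;
}
```

With these encodings the XOR form is one byte shorter for a 16-bit register, while for an 8-bit register such as %ah both forms are two bytes long.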
Emitting a shift left or right by 8 where a MOV would do (or even an XCHG, especially if the accumulator is one of the operands, where the special 1-byte encoding of XCHG can be used, which can be a significant performance win on CPUs constrained by prefetching like the 8088 and 386SX) is probably a missed optimization opportunity in multiple places besides this one.
x *= 256
is equal to:
x <<= 8
but should be generated as a MOV that copies the low byte into the high byte, followed by clearing the low byte. The inverse should occur for division by 256.
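To make the equivalence concrete, here is a small C sketch that models a 16-bit register as two separate bytes and checks the byte-move version against the shift, multiply and divide for every 16-bit value. The `reg16` type and all function names are hypothetical, and the sketch assumes x is unsigned; signed division by 256 rounds differently from an arithmetic shift, so it would need extra care.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 16-bit register split into halves (AX = AH:AL). */
typedef struct { uint8_t lo, hi; } reg16;

reg16 reg_from(uint16_t x)
{
    reg16 r = { (uint8_t)(x % 256u), (uint8_t)(x / 256u) };
    return r;
}

uint16_t reg_value(reg16 r)
{
    return (uint16_t)(r.hi * 256u + r.lo);
}

/* x *= 256 using byte moves only. */
uint16_t mul256_by_bytes(uint16_t x)
{
    reg16 r = reg_from(x);
    r.hi = r.lo;   /* mov %al, %ah */
    r.lo = 0;      /* xor %al, %al */
    return reg_value(r);
}

/* x /= 256 (unsigned) using byte moves only. */
uint16_t div256_by_bytes(uint16_t x)
{
    reg16 r = reg_from(x);
    r.lo = r.hi;   /* mov %ah, %al */
    r.hi = 0;      /* xor %ah, %ah */
    return reg_value(r);
}

/* Exhaustively compare the byte-move versions with shift/multiply/divide. */
void check_byte_moves(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        uint16_t x = (uint16_t)v;
        assert(mul256_by_bytes(x) == (uint16_t)(x << 8));
        assert(mul256_by_bytes(x) == (uint16_t)(x * 256u));
        assert(div256_by_bytes(x) == (uint16_t)(x >> 8));
        assert(div256_by_bytes(x) == (uint16_t)(x / 256u));
    }
}
```

For example, `mul256_by_bytes(0x00BA)` gives `0xBA00` and `div256_by_bytes(0xBA12)` gives `0x00BA`.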
I'd assume that turning the mov 0x0,%ah into a xor would be covered by a separate optimization rule; on the specific 8086 variant I'm targeting, I think there is no actual performance difference, so it didn't occur to me to bring it up. Sorry!
In my tests, the following code (changed a little for a minimal test case):
compiled to the following code under -O2 -mcmodel=medium (as well as -Os -mcmodel=medium and -O3 -mcmodel=medium):
I think that SHL/OR pair could be replaced with two MOVs, given that we can actually do so for AX/BX/CX/DX:
which hopefully the compiler could optimize further to: