Due to the current work on Elf64 support, which might need to swap endianness, I tested a code example that swaps the byte order of a 64-bit value. (I extended the example with a 32-bit and a 16-bit version.)
Additionally, I added endian conversion code (htons/htonl) and the SWAPNC_16/SWAPNC_32/SWAPNC_64 macros from the OW source.

bswap_demo.c.txt
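For readers without the attachment, here is a minimal sketch of the kind of shift-and-mask byte swap being discussed. The names `swap16`, `swap32` and `swap64` are made up for illustration; this is not the content of bswap_demo.c.txt and not the definition of the SWAPNC_* macros in the OW source.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not taken from
 * bswap_demo.c.txt or from the OW SWAPNC_* macros. */

/* Exchange the two bytes of a 16-bit value. */
static inline uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

/* Reverse the eight bytes of a 64-bit value. */
static inline uint64_t swap64(uint64_t v)
{
    return ((v & 0x00000000000000FFull) << 56) |
           ((v & 0x000000000000FF00ull) << 40) |
           ((v & 0x0000000000FF0000ull) << 24) |
           ((v & 0x00000000FF000000ull) <<  8) |
           ((v & 0x000000FF00000000ull) >>  8) |
           ((v & 0x0000FF0000000000ull) >> 24) |
           ((v & 0x00FF000000000000ull) >> 40) |
           ((v & 0xFF00000000000000ull) >> 56);
}
```

A compiler that recognizes these patterns can collapse each function into one or two instructions; one that does not has to emit every shift, mask and OR.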
Such a code sequence is not detected by OW. The code generated by `clang -m32` is much nicer (`-O2`, `-O3`, `-Os`): a `rol` for 16-bit, one `bswap` for 32-bit and two `bswap` for 64-bit (`bswap` is available since the 486).

The code generated by OW does not use `rol` or `bswap` (wcc386 was run with `-6r`).

OpenWatcom looks really bad in the examples using the `SWAPNC_*` macros from the OW source (21 / 47 / 274 bytes with 7 subfunction calls, args in registers) compared to clang (10 / 7 / 13 bytes, args on stack).

Unfortunately, I have no idea how the target code is selected in the OW code generator.
There might be other examples that OW handles better, but I don't know of any.
Such code sequences are probably rare, but the OW code is really huge (args in registers) compared to the clang-generated code (args on stack).
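One way to read the clang result (a `rol` for 16-bit, one `bswap` for 32-bit, two `bswap` for 64-bit) is sketched below. The function names are hypothetical, and `__builtin_bswap32` is a GCC/clang builtin. I am not assuming that OW offers an equivalent, so other compilers take the plain shift-and-mask fallback.

```c
#include <stdint.h>

/* Hypothetical names; an illustration of the decomposition only,
 * not the code clang or OW actually uses internally. */

/* 16-bit: exchanging two bytes is the same as rotating by 8 bits,
 * which is why a single rol is enough. */
static inline uint16_t rot_swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* 32-bit: one byte reversal. Uses the GCC/clang builtin when present,
 * otherwise falls back to shifts and masks. */
static inline uint32_t one_swap32(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);
#else
    return (v << 24) | ((v & 0xFF00u) << 8) | ((v >> 8) & 0xFF00u) | (v >> 24);
#endif
}

/* 64-bit on a 32-bit target: reverse each 32-bit half, then
 * exchange the halves, i.e. two 32-bit swaps in total. */
static inline uint64_t two_swap64(uint64_t v)
{
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (uint32_t)(v >> 32);
    return ((uint64_t)one_swap32(lo) << 32) | one_swap32(hi);
}
```

In a 32-bit register pair the exchange of the halves costs little or nothing, which fits the 13 bytes clang needs for the 64-bit case.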