open-watcom / open-watcom-v2

Open Watcom V2.0 - Source code repository, Wiki, Latest Binary build, Archived builds including all installers for download.

cg: OW does not detect a code sequence for a byte swap (endian change) #1321

Open winspool opened 3 months ago

winspool commented 3 months ago

Due to the current work on ELF64 support, which might need to swap endianness, I tested a code example that swaps the byte order of a 64-bit value. (I extended the example with 32-bit and 16-bit versions.)

Additionally, I added endian-conversion code (htons/htonl) and the SWAPNC_16/SWAPNC_32/SWAPNC_64 macros from the OW source. Attachment: bswap_demo.c.txt
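The attachment is not reproduced here. As a minimal sketch of the two source patterns being compared, assuming a shift-then-mask form for my_htons/my_htonl and a mask-then-shift form for the macros (both inferred from the wdis listing further down, not copied from bswap_demo.c or the OW source), the test functions might look like this. The SKETCH_SWAPNC_* names are hypothetical stand-ins for the real SWAPNC_* macros.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the SWAPNC_* macros: mask first, then shift. */
#define SKETCH_SWAPNC_16(x) \
    ((uint16_t)((((x) & 0x00ffu) << 8) | (((x) & 0xff00u) >> 8)))
#define SKETCH_SWAPNC_32(x) \
    ((uint32_t)((((x) & 0x000000ffu) << 24) | (((x) & 0x0000ff00u) << 8) | \
                (((x) & 0x00ff0000u) >> 8)  | (((x) & 0xff000000u) >> 24)))

/* htons-style swap: shift first, then mask each byte. */
uint16_t my_htons(uint16_t v)
{
    return (uint16_t)(((v >> 8) & 0xffu) | ((v & 0xffu) << 8));
}

/* htonl-style swap: shift first, then mask each byte. */
uint32_t my_htonl(uint32_t v)
{
    return ((v >> 24) & 0xffu)
         | (((v >> 16) & 0xffu) << 8)
         | (((v >> 8) & 0xffu) << 16)
         | ((v & 0xffu) << 24);
}

/* Wrappers so each macro expansion gets its own routine in the listing. */
uint16_t use_SWAPNC_16(uint16_t v) { return SKETCH_SWAPNC_16(v); }
uint32_t use_SWAPNC_32(uint32_t v) { return SKETCH_SWAPNC_32(v); }
```

Both forms compute the same result; they differ only in the order of the shift and mask operations, which is what an optimizer has to see through to emit a single rol or bswap.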

OW does not recognize such a code sequence as a byte swap. The code generated by clang -m32 is much nicer at -O2, -O3 and -Os: a rol for 16-bit, one bswap for 32-bit and two bswap for 64-bit (bswap has been available since the 486).

bswap_demo_clang32-Os.o:     file format elf32-i386
Disassembly of section .text:
00000000 <my_htons>:
   0:   0f b7 44 24 04          movzwl 0x4(%esp),%eax
   5:   66 c1 c0 08             rol    $0x8,%ax
   9:   c3                      ret

0000000a <my_htonl>:
   a:   8b 44 24 04             mov    0x4(%esp),%eax
   e:   0f c8                   bswap  %eax
  10:   c3                      ret

00000011 <use_SWAPNC_16>:
  11:   0f b7 44 24 04          movzwl 0x4(%esp),%eax
  16:   66 c1 c0 08             rol    $0x8,%ax
  1a:   c3                      ret

0000001b <use_SWAPNC_32>:
  1b:   8b 44 24 04             mov    0x4(%esp),%eax
  1f:   0f c8                   bswap  %eax
  21:   c3                      ret

00000022 <use_SWAPNC_64>:
  22:   8b 54 24 04             mov    0x4(%esp),%edx
  26:   8b 44 24 08             mov    0x8(%esp),%eax
  2a:   0f c8                   bswap  %eax
  2c:   0f ca                   bswap  %edx
  2e:   c3                      ret
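
The 64-bit routine above needs only two bswap instructions because a 64-bit byte swap decomposes into two independent 32-bit swaps whose results trade places. A minimal C sketch of that decomposition (the names swap32 and swap64_via_halves are hypothetical, chosen only for this illustration):

```c
#include <stdint.h>

/* Plain shift-and-mask 32-bit byte swap (what one bswap does). */
static uint32_t swap32(uint32_t v)
{
    return ((v >> 24) & 0xffu)
         | (((v >> 16) & 0xffu) << 8)
         | (((v >> 8) & 0xffu) << 16)
         | ((v & 0xffu) << 24);
}

/* Swap each 32-bit half, then exchange the halves. */
uint64_t swap64_via_halves(uint64_t v)
{
    uint32_t lo = (uint32_t)v;          /* low half of the input  */
    uint32_t hi = (uint32_t)(v >> 32);  /* high half of the input */

    /* The swapped low half becomes the new high half and vice versa. */
    return ((uint64_t)swap32(lo) << 32) | swap32(hi);
}
```

In the clang listing the exchange of halves costs nothing: the low input word is loaded straight into edx and the high input word into eax, so only the two bswap instructions remain.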

The code generated by OW does not use rol or bswap (wcc386 is invoked with -6r):

$ wdis bswap_demo_owcc-Os.o 
0000                my_htons_:
0000  52                    push        edx
0001  0F B7 D0              movzx       edx,ax
0004  30 E4                 xor         ah,ah
0006  C1 FA 08              sar         edx,0x08
0009  0F B7 C0              movzx       eax,ax
000C  81 E2 FF 00 00 00     and         edx,0x000000ff
0012  C1 E0 08              shl         eax,0x08
0015                L$1:
0015  09 C2                 or          edx,eax
0017  89 D0                 mov         eax,edx
0019  5A                    pop         edx
001A  C3                    ret
Routine Size: 27 bytes,    Routine Base: _TEXT + 0000

001B                my_htonl_:
001B  51                    push        ecx
001C  52                    push        edx
001D  89 C2                 mov         edx,eax
001F  89 C1                 mov         ecx,eax
0021  C1 EA 10              shr         edx,0x10
0024  C1 E9 18              shr         ecx,0x18
0027  81 E2 FF 00 00 00     and         edx,0x000000ff
002D  81 E1 FF 00 00 00     and         ecx,0x000000ff
0033  C1 E2 08              shl         edx,0x08
0036  09 D1                 or          ecx,edx
0038  89 C2                 mov         edx,eax
003A  C1 EA 08              shr         edx,0x08
003D  81 E2 FF 00 00 00     and         edx,0x000000ff
0043  25 FF 00 00 00        and         eax,0x000000ff
0048  C1 E2 10              shl         edx,0x10
004B  C1 E0 18              shl         eax,0x18
004E                L$2:
004E  09 CA                 or          edx,ecx
0050  09 D0                 or          eax,edx
0052  5A                    pop         edx
0053  59                    pop         ecx
0054  C3                    ret
Routine Size: 58 bytes,    Routine Base: _TEXT + 001B

0055                use_SWAPNC_16_:
0055  52                    push        edx
0056  89 C2                 mov         edx,eax
0058  30 E6                 xor         dh,ah
005A  30 C0                 xor         al,al
005C  0F B7 D2              movzx       edx,dx
005F  0F B7 C0              movzx       eax,ax
0062  C1 E2 08              shl         edx,0x08
0065  C1 E8 08              shr         eax,0x08
0068  EB AB                 jmp         L$1
Routine Size: 21 bytes,    Routine Base: _TEXT + 0055

006A                use_SWAPNC_32_:
006A  51                    push        ecx
006B  52                    push        edx
006C  89 C1                 mov         ecx,eax
006E  89 C2                 mov         edx,eax
0070  81 E1 FF 00 00 00     and         ecx,0x000000ff
0076  81 E2 00 FF 00 00     and         edx,0x0000ff00
007C  C1 E1 18              shl         ecx,0x18
007F  C1 E2 08              shl         edx,0x08
0082  09 CA                 or          edx,ecx
0084  89 C1                 mov         ecx,eax
0086  81 E1 00 00 FF 00     and         ecx,0x00ff0000
008C  25 00 00 00 FF        and         eax,0xff000000
0091  C1 E9 08              shr         ecx,0x08
0094  C1 E8 18              shr         eax,0x18
0097  EB B5                 jmp         L$2
Routine Size: 47 bytes,    Routine Base: _TEXT + 006A

0099                use_SWAPNC_64_:
...
01AA  C3                ret
Routine Size: 274 bytes,    Routine Base: _TEXT + 0099

(Other functions skipped)

OpenWatcom looks really bad in the examples using the SWAPNC_* macros from the OW source: 21 / 47 / 274 bytes, with 7 subfunction calls and args in registers, compared to clang's 10 / 7 / 13 bytes with args on the stack.

Unfortunately, I have no idea how the target code is selected in the OW code generator.

There might be other examples that OW handles better, but I don't know of any.

Such a code sequence is probably not used often, but the OW code is really huge (args in registers) compared to the clang-generated code (args on the stack).

winspool commented 3 months ago

Might adding intrinsics for byte swap (16/32/64 bits) be a simple way to handle such cases?
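
As a rough illustration of what such an intrinsic could look like from user code today, here is a minimal sketch with a hypothetical name, sketch_bswap32. The Open Watcom branch is an assumption: it uses the existing `#pragma aux` inline-assembly mechanism, has not been tested here, and needs a 486 or later CPU target. The GCC/clang branch uses the existing `__builtin_bswap32` builtin, and the last branch is plain portable C.

```c
#include <stdint.h>

#if defined(__WATCOMC__) && defined(__386__)
/* Assumption: 32-bit Open Watcom target with the CPU level set to 486+. */
uint32_t sketch_bswap32(uint32_t v);
#pragma aux sketch_bswap32 = \
    "bswap eax"              \
    parm [eax] value [eax] modify exact [eax];
#elif defined(__GNUC__)
/* GCC and clang provide a byte-swap builtin that maps to bswap on x86. */
static inline uint32_t sketch_bswap32(uint32_t v)
{
    return __builtin_bswap32(v);
}
#else
/* Portable fallback: shift and mask each byte into place. */
static uint32_t sketch_bswap32(uint32_t v)
{
    return ((v >> 24) & 0xffu)
         | (((v >> 16) & 0xffu) << 8)
         | (((v >> 8) & 0xffu) << 16)
         | ((v & 0xffu) << 24);
}
#endif
```

A real compiler intrinsic would go further than this workaround, since the code generator could also fold constants and pick registers freely, but the sketch shows the interface such a feature would need to cover.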

jmalak commented 3 months ago

I don't understand what you are reporting.