open-watcom / open-watcom-v2

Open Watcom V2.0 - Source code repository, Wiki, Latest Binary build, Archived builds including all installers for download.

owcc -O2 (-O3) with -mtune=i686 does not create cmovl or cmovg (wcc386 gets -onatxhl+ -6r) #1308

Status: open. winspool opened this issue 2 months ago.

winspool commented 2 months ago

I tried a code example from a YouTube video and found that OW does not recognize the branchless code, and that it always emits a branch for the other examples.

The original source had a branchless version and a version with "if". I added a commonly used style based on the conditional operator:

/* original example: branchless code */
int smaller_branchless(int a, int b)
{
    return a* (a<b)  +  b*(b<=a);
}

/* my extension: A very common usage with a conditional */
int smaller_cond(int a, int b)
{
    return (a < b) ? a : b;
}

/* original example: with if */
int smaller_if(int a, int b)
{
    if (a < b)
        return a;
    else
        return b;
}
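The three variants should return the same value for every pair of int arguments, which is what allows clang, gcc, icc and MSVC to emit a single instruction sequence for all of them. A minimal sketch that checks this on a grid of values, including the INT_MIN/INT_MAX extremes, could look like this; the helper check_variants_agree is a hypothetical name, and the three functions are copied from above so the block compiles on its own.

```c
#include <assert.h>
#include <limits.h>

/* Copies of the three variants from the issue. */
int smaller_branchless(int a, int b)
{
    /* (a<b) and (b<=a) are each 0 or 1 and exactly one of them is 1,
       so each product is either the operand or 0: no overflow. */
    return a * (a < b) + b * (b <= a);
}

int smaller_cond(int a, int b)
{
    return (a < b) ? a : b;
}

int smaller_if(int a, int b)
{
    if (a < b)
        return a;
    else
        return b;
}

/* Hypothetical helper: asserts that all three variants agree on
   every pair drawn from a small set of boundary values. */
void check_variants_agree(void)
{
    static const int vals[] = { INT_MIN, INT_MIN + 1, -2, -1, 0, 1, 2,
                                INT_MAX - 1, INT_MAX };
    const int n = (int)(sizeof vals / sizeof vals[0]);

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int a = vals[i], b = vals[j];
            int expected = smaller_if(a, b);
            assert(smaller_cond(a, b) == expected);
            assert(smaller_branchless(a, b) == expected);
        }
    }
}
```

Calling check_variants_agree() from main() exercises all 81 pairs; a failed assertion would mean the variants are not interchangeable.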

clang (-O2/-O3) recognizes the branchless version and generates the same code for all three variants:

   0:   8b 44 24 08             mov    0x8(%esp),%eax
   4:   8b 4c 24 04             mov    0x4(%esp),%ecx
   8:   39 c1                   cmp    %eax,%ecx
   a:   0f 4c c1                cmovl  %ecx,%eax
   d:   c3                      ret
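Because AT&T syntax puts the source operand first, it is easy to misread which register ends up holding the result. The following sketch models the clang sequence register by register; it is an illustration written for this issue, not compiler output, and clang_cmovl_model and check_clang_model are hypothetical names. EAX starts as b, and cmovl overwrites it with a only when a < b.

```c
#include <assert.h>

/* Hypothetical C model of the clang sequence (AT&T syntax:
   source operand first, destination last). Not compiler output. */
int clang_cmovl_model(int a, int b)
{
    int eax = b;          /* mov 0x8(%esp),%eax : second argument */
    int ecx = a;          /* mov 0x4(%esp),%ecx : first argument  */
    int less = ecx < eax; /* cmp %eax,%ecx : flags from ecx - eax; "l" = signed less */
    if (less)             /* cmovl %ecx,%eax : in hardware a conditional move, no jump */
        eax = ecx;
    return eax;           /* ret : result in eax */
}

/* Hypothetical self-check against the plain conditional expression. */
void check_clang_model(void)
{
    for (int a = -3; a <= 3; a++)
        for (int b = -3; b <= 3; b++)
            assert(clang_cmovl_model(a, b) == ((a < b) ? a : b));
}
```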

gcc (-O2/-O3) generates slightly longer code for the branchless version, but identical code for the conditional and the "if" versions:

  20:   8b 44 24 08             mov    0x8(%esp),%eax
  24:   8b 54 24 04             mov    0x4(%esp),%edx
  28:   39 d0                   cmp    %edx,%eax
  2a:   0f 4f c2                cmovg  %edx,%eax
  2d:   c3                      ret
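gcc compares in the opposite direction, which is why both cmovl and cmovg appear in the title. A similar hypothetical model (gcc_cmovg_model and check_gcc_model are illustrative names, not compiler output) shows that cmovg on b - a selects the same value as cmovl on a - b: a is chosen exactly when b > a, that is, when a < b.

```c
#include <assert.h>

/* Hypothetical C model of the gcc sequence (AT&T syntax). */
int gcc_cmovg_model(int a, int b)
{
    int eax = b;             /* mov 0x8(%esp),%eax : second argument */
    int edx = a;             /* mov 0x4(%esp),%edx : first argument  */
    int greater = eax > edx; /* cmp %edx,%eax : flags from eax - edx; "g" = signed greater */
    if (greater)             /* cmovg %edx,%eax : conditional move, no jump */
        eax = edx;
    return eax;              /* ret */
}

/* Hypothetical self-check against the plain conditional expression. */
void check_gcc_model(void)
{
    for (int a = -3; a <= 3; a++)
        for (int b = -3; b <= 3; b++)
            assert(gcc_cmovg_model(a, b) == ((a < b) ? a : b));
}
```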

Open Watcom has a small advantage here because of the selected register calling convention (-6r): the arguments arrive in EAX and EDX, so they do not have to be loaded from the stack. However, it does not recognize what the branchless code is doing, and for both the conditional and the "if" examples, the code produced by OW (owcc -mtune=i686 with -O2 or -O3) always contains a branch:

0020  39 D0             cmp     eax,edx
0022  7D 01             jge     L$1
0024  C3                ret
0025                L$1:
0025  89 D0             mov     eax,edx
0027  C3                ret
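The Open Watcom listing uses Intel syntax (destination first) and, with -6r, receives a in EAX and b in EDX. The hypothetical model below (ow_branch_model and check_ow_model are illustrative names) mirrors its control flow with an explicit goto, to show that it computes the same result as the cmov versions; the difference is that the selection is made by a conditional jump (jge) rather than a conditional move.

```c
#include <assert.h>

/* Hypothetical C model of the Open Watcom sequence (Intel syntax:
   destination first). With -6r, a arrives in EAX and b in EDX. */
int ow_branch_model(int a, int b)
{
    int eax = a;
    int edx = b;
    if (eax >= edx)   /* cmp eax,edx ; jge L$1 : a real conditional jump */
        goto L1;
    return eax;       /* ret : fall-through path returns a */
L1:
    eax = edx;        /* mov eax,edx */
    return eax;       /* ret : returns b */
}

/* Hypothetical self-check against the plain conditional expression. */
void check_ow_model(void)
{
    for (int a = -3; a <= 3; a++)
        for (int b = -3; b <= 3; b++)
            assert(ow_branch_model(a, b) == ((a < b) ? a : b));
}
```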

I expect the cost of the branching code produced by OW to be smaller on recent CPUs (speculative execution, register renaming, branch prediction, ...) than on the Pentium Pro generation, when the "cmov" family of instructions first appeared. Still, when a branch can be avoided this easily, it should be, as clang and gcc do.
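Until the optimizer handles this, one source-level workaround (not proposed in this thread) is to turn the comparison into an all-ones/all-zeros mask and select with bitwise operations. This is a sketch under two assumptions: two's-complement int, which every x86 compiler uses, and that wcc386 evaluates the comparison itself without a branch (for example with setl), which has not been checked here, so the generated code should be inspected before relying on it. smaller_mask and check_smaller_mask are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical branch-free minimum built from a mask.
   Assumes two's complement: -(a < b) is -1 (all bits set) or 0. */
int smaller_mask(int a, int b)
{
    int mask = -(a < b);               /* -1 when a < b, else 0 */
    return (a & mask) | (b & ~mask);   /* a when mask is -1, otherwise b */
}

/* Hypothetical self-check against the conditional version. */
void check_smaller_mask(void)
{
    static const int vals[] = { INT_MIN, -1, 0, 1, INT_MAX };
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 5; j++) {
            int a = vals[i], b = vals[j];
            assert(smaller_mask(a, b) == ((a < b) ? a : b));
        }
}
```

The mask trick trades the conditional move for an extra AND/OR/NOT, so on CPUs that support cmov it is usually no better than what clang and gcc already emit; it only helps if a compiler would otherwise branch.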

winspool commented 2 months ago

I used Compiler Explorer and verified that icc (2021.10) and MSVC (17.10) also generate similar branchless code for the conditional and the "if" examples.

https://msvc.godbolt.org/z/YYdv41x97

jmalak commented 2 months ago

Thanks for your info.