hercules-390 / hyperion


Assembly code in machdep.h is rarely used when Hyperion is compiled for 32-bit UNIX-like systems #252

Open srorso opened 6 years ago

srorso commented 6 years ago

The 32-bit assembly code for UNIX-like systems in machdep.h is not generated when -march=native is included as a gcc command line option and the target system is 32-bit. Hercules normally includes -march=native. The build then falls back to C code that does not appear to include locking.

Background

The 32-bit assembly code in machdep.h is generated for UNIX-like systems only when any of the following processor-specific preprocessor macros are defined.

   __i686__
   __pentiumpro__
   __pentium4__
   __athlon__
   __athlon
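The effect of such a guard can be seen with a small sketch. The macro names tested are the ones listed above; CPU_MACRO_SEEN and cpu_macro_seen() are hypothetical names used only for illustration, and the exact expression in machdep.h may differ.

```c
/* Sketch only: mirrors the kind of test that gates the 32-bit assembly.
   CPU_MACRO_SEEN and cpu_macro_seen() are illustrative names and are
   not taken from machdep.h. */
#if defined(__i686__) || defined(__pentiumpro__) || \
    defined(__pentium4__) || defined(__athlon__) || defined(__athlon)
#define CPU_MACRO_SEEN 1   /* the assembly path would be compiled */
#else
#define CPU_MACRO_SEEN 0   /* the C fallback would be compiled    */
#endif

/* Returns 1 when at least one of the processor macros was defined
   for this compilation, 0 otherwise. */
static int cpu_macro_seen(void)
{
    return CPU_MACRO_SEEN;
}
```

Compiling this with different -m32 and -march= combinations and printing the result of cpu_macro_seen() shows which path a given set of options would select.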

The code was added to machdep.h about ten years ago and has not been changed much.

When Hercules is compiled for a 32-bit target UNIX-like system (-m32 specified or defaulted) and with -march=native, none of the needed preprocessor macros are defined by gcc, or by clang for that matter. The option -march=native became a default for 32-bit and 64-bit builds of Hercules about five years ago. The CMake scripts continue this default.

Reading gcc documentation, -march=native sounds like an excellent default. And having read the gcc code that probes the processor for -march=native, gcc does a very thorough job of identifying x86 processor capabilities. The same gcc probe code is used for -mtune=native, but fewer of the results are used by gcc to generate machine code.

The coding in machdep.h for 64-bit UNIX-like targets and for all Windows targets is not affected by this issue.

Proposed Change

Use gcc/clang atomic intrinsics in machdep.h where such intrinsics are supported by the compiler in use. These were first documented in gcc 4.1.2 and should be included in clang, which reports itself as compatible with gcc 4.2.1. The intrinsics are documented here (gcc 4.1.2):

https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html#Atomic-Builtins
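As a rough illustration of the proposal, a compare-and-swap helper built on the intrinsics might look like the sketch below. The helper names cas_u32 and cas_u64 and their return convention (0 when the swap happened, 1 when it did not, with *old updated to the value actually observed) are assumptions made for this example and are not the machdep.h interface.

```c
#include <stdint.h>

/* Hypothetical helper: compare *ptr with *old and store new_val if equal.
   Returns 0 if the swap was done, 1 if not. On return *old holds the
   value that was observed in *ptr. */
static inline int cas_u32(uint32_t *old, uint32_t new_val,
                          volatile uint32_t *ptr)
{
    uint32_t expected = *old;
    uint32_t seen = __sync_val_compare_and_swap(ptr, expected, new_val);
    *old = seen;
    return seen != expected;
}

/* Same idea for 64-bit operands. On a 32-bit x86 target this needs a
   processor with an 8-byte compare and exchange instruction. */
static inline int cas_u64(uint64_t *old, uint64_t new_val,
                          volatile uint64_t *ptr)
{
    uint64_t expected = *old;
    uint64_t seen = __sync_val_compare_and_swap(ptr, expected, new_val);
    *old = seen;
    return seen != expected;
}
```

The same source would serve 32-bit and 64-bit targets, leaving instruction selection to the compiler.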

Advantages:

Disadvantages:

One Clang Oddity

Changes to CMake build scripts:

1) Test for the ability of the C compiler to use the atomic intrinsic for compare and swap. If the compiler accepts the atomic intrinsic, set a preprocessor macro to indicate this.

2) If the atomic intrinsic is rejected, run a compile that uses the current in-line assembly code appropriate to the bitness of the target system. If this is accepted, set a preprocessor macro to indicate this.
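The source handed to the compiler for the first test could be as small as the following sketch. The function name probe_compare_and_swap is hypothetical; a real probe would call it from main() so that the build script also checks that the program links.

```c
/* Probe sketch: compiles and links only if the compiler accepts the
   compare-and-swap intrinsic for a 64-bit operand.
   probe_compare_and_swap is an illustrative name. */
static int probe_compare_and_swap(void)
{
    long long val = 2;

    /* Swap 2 -> 5; the intrinsic returns nonzero when the swap is done. */
    if (!__sync_bool_compare_and_swap(&val, 2LL, 5LL))
        return 1;

    /* Success is reported as 0, the usual exit status convention. */
    return val == 5 ? 0 : 1;
}
```

Using a 64-bit operand in the probe is a deliberate choice here: it is the case most likely to fail on an older 32-bit target, so a pass says more than a 32-bit probe would.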

Changes to GNU-Autotools scripts (configure.ac)

1) Test the gcc compiler version for 4.1.2 or later. If true, then set a preprocessor macro to indicate the availability of atomic intrinsics.
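The version test can also be expressed with the compiler's own predefined macros, as in this sketch. GCC_HAS_SYNC_BUILTINS and sync_builtins_by_version() are hypothetical names; __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ are the standard gcc version macros, which clang also defines.

```c
/* Sketch: derive availability of the atomic intrinsics from the
   reported gcc version (4.1.2 or later).
   GCC_HAS_SYNC_BUILTINS is an illustrative name. */
#if defined(__GNUC__) &&                                        \
    (__GNUC__ > 4 ||                                            \
     (__GNUC__ == 4 &&                                          \
      (__GNUC_MINOR__ > 1 ||                                    \
       (__GNUC_MINOR__ == 1 && __GNUC_PATCHLEVEL__ >= 2))))
#define GCC_HAS_SYNC_BUILTINS 1
#else
#define GCC_HAS_SYNC_BUILTINS 0
#endif

/* Returns 1 when the version test passes, 0 otherwise. */
static int sync_builtins_by_version(void)
{
    return GCC_HAS_SYNC_BUILTINS;
}
```

A version test is weaker than a compile test, because it assumes rather than verifies that the target processor supports the operand sizes needed.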

Changes to machdep.h coding for UNIX-like targets

1) Test for the availability of atomic intrinsics. If available, use atomic intrinsics for both 32-bit and 64-bit targets. It may be possible to replace the current static inline functions with a #define for each type requiring locked access.

2) If atomic intrinsics are not available but the compiler accepted the in-line assembler code, use the assembler code.

3) Fall back to C code.
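The three-step selection above might be laid out as in the following sketch. HAVE_SYNC_BUILTINS and HAVE_INLINE_ASM_CAS are hypothetical macros that the build scripts would set, cas32 is an illustrative name, and the assembly shown is a generic x86 stand-in written for this example, not the code currently in machdep.h.

```c
#include <stdint.h>

/* Returns 1 if *ptr equalled old_val and was replaced by new_val,
   0 otherwise. All names here are illustrative. */
#if defined(HAVE_SYNC_BUILTINS)

/* 1) Atomic intrinsics, same source for 32-bit and 64-bit targets. */
static inline int cas32(volatile uint32_t *ptr,
                        uint32_t old_val, uint32_t new_val)
{
    return __sync_bool_compare_and_swap(ptr, old_val, new_val) ? 1 : 0;
}

#elif defined(HAVE_INLINE_ASM_CAS)

/* 2) In-line assembly (x86 only; stand-in for the existing code). */
static inline int cas32(volatile uint32_t *ptr,
                        uint32_t old_val, uint32_t new_val)
{
    unsigned char ok;
    __asm__ __volatile__("lock; cmpxchgl %3, %1\n\tsete %0"
                         : "=q"(ok), "+m"(*ptr), "+a"(old_val)
                         : "r"(new_val)
                         : "memory", "cc");
    return ok;
}

#else

/* 3) Plain C fallback. NOT atomic: another thread can change *ptr
      between the comparison and the store. */
static inline int cas32(volatile uint32_t *ptr,
                        uint32_t old_val, uint32_t new_val)
{
    if (*ptr != old_val)
        return 0;
    *ptr = new_val;
    return 1;
}

#endif
```

With neither macro defined the sketch compiles the plain C path, which matches the behavior described in this issue for 32-bit builds with -march=native.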

Additional information

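To view the preprocessor macros defined by gcc or clang that are relevant to this discussion, use the following command line. If you don't wish to create foo.h, use -x c /dev/null as the input instead; the -x c is needed because /dev/null has no file extension from which the compiler can infer the language.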

touch foo.h
gcc -E  -fPIC -dM foo.h -m32 -march=native     \
             | grep -e '\(pentium\)\|\(PIC\)\|\(86[ _]\)\|\(amd\)'

Adjust -m32 and -march= as you see fit.

Sample program to view generated code

The following program can be used to verify that the assembly code generated by the atomic intrinsics is very similar to that included in machdep.h.

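#include <stdbool.h>

/* Macro form of the intrinsic with the argument order ( old, new, ptr ) */
#define swap( x, y, z) __sync_bool_compare_and_swap( z, x, y )

int main( void ) {
   long long old, new, val;   /* 64-bit operands */
   bool rc;
   old = 2;
   new = 5;
   val = 2;

   /* val equals old, so val becomes 5 and rc is true */
   rc = __sync_bool_compare_and_swap( &val, old, new );

   /* val is now 5 and no longer equals old, so rc is false */
   rc = swap( old, new, &val );

   return rc;
}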