Add support for IEEE underflow control

topherocity commented 2 years ago

Hello all! First of all, thanks so much for the work you've done to provide first-class support for what seems to be the direction Apple will be heading with their hardware for the foreseeable future.

As scientific Fortran users, we rely in some of our numerical codes on the ieee_set_underflow_mode procedure (from the ieee_arithmetic intrinsic module), which lets us flush denormalized numbers to zero.

This procedure is not currently implemented for AArch64:

https://github.com/iains/gcc-darwin-arm64/blob/fb3177f9479b0b25a3699c43bf378dd17846d733/libgfortran/config/fpu-aarch64.h#L300-L307

This is something that is certainly possible, as illustrated by the following code snippet (please forgive my poor C style):

#include <stdio.h>
#include <float.h>

#define FE_MAP_FZ 0x1000000

/* gradual != 0: gradual underflow; gradual == 0: flush to zero.  */
void set_fpu_underflow_mode (int gradual)
{
  unsigned int fpcr;
  fpcr = __builtin_aarch64_get_fpcr();
  if (gradual)
    fpcr &= ~FE_MAP_FZ;
  else
    fpcr |= FE_MAP_FZ;

  __builtin_aarch64_set_fpcr(fpcr);
}

int main(void) {
   /* Request abrupt underflow (flush to zero).  */
   set_fpu_underflow_mode(0);

   float test_number = 1e-37;
   while(test_number > 0) {
      test_number = test_number * 0.5;
      printf("%e", test_number);
      if (test_number < FLT_MIN) printf(" - denormalized");
      printf("\n");
   }
   return 0;
}

which returns

5.000000e-38
2.500000e-38
1.250000e-38
0.000000e+00 - denormalized
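
Note that the `test_number < FLT_MIN` check in the snippet also labels the final exact zero as "denormalized". A minimal sketch of a stricter check, assuming only the standard C99 `fpclassify` macro from `<math.h>`; the helper names `is_subnormal` and `describe` are hypothetical and not part of libgfortran. It does not touch the FPCR, so it compiles on any platform:

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: 1 if x is a nonzero subnormal value, 0 otherwise.
   Unlike (x < FLT_MIN), this does not count an exact zero.  */
static int is_subnormal(float x)
{
  return fpclassify(x) == FP_SUBNORMAL;
}

/* Hypothetical helper: short label for the IEEE class of x.  */
static const char *describe(float x)
{
  switch (fpclassify(x)) {
    case FP_ZERO:      return "zero";
    case FP_SUBNORMAL: return "subnormal";
    case FP_NORMAL:    return "normal";
    default:           return "other";  /* infinity or NaN */
  }
}
```

With such a check in the loop above, a run in flush-to-zero mode should never report a subnormal value: the sequence would go straight from the last normal number to zero.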

I would be happy to create a pull request to provide support for underflow control - is there a reason I'm missing why this is currently unsupported? Would a pull request be welcome?

iains commented 2 years ago

I'll leave @fxcoudert to comment on this :)

fxcoudert commented 2 years ago

Thanks @topherocity. Some comments:

is there a reason I'm missing why this is currently unsupported?

Two reasons, but not good ones. First, I'm not an aarch64 expert at all, and I wrote the current IEEE code by looking up the logic in glibc, and glibc does not have subnormal / underflow control function calls (except on alpha). Second, I think underflow control is generally not widely used in Fortran codes. In fact, in gfortran it is only implemented on two processors: i386/x86_64 (through assembly) and alpha (through glibc).

Would a pull request be welcome?

Yes, definitely. I would welcome @jeffhammond's ideas/advice on this.

Here is a tentative patch. I would welcome ideas for Fortran codes that test it.

diff --git a/libgfortran/config/fpu-aarch64.h b/libgfortran/config/fpu-aarch64.h
index 4db1b6c4f6b..b909fa9f5dd 100644
--- a/libgfortran/config/fpu-aarch64.h
+++ b/libgfortran/config/fpu-aarch64.h
@@ -31,6 +31,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define FE_UPWARD     0x400000
 #define FE_DOWNWARD   0x800000
 #define FE_TOWARDZERO 0xc00000
+#define FE_MAP_FZ     0x1000000

 /* Exceptions */

@@ -301,22 +302,30 @@ support_fpu_rounding_mode (int mode __attribute__((unused)))
 int
 support_fpu_underflow_control (int kind __attribute__((unused)))
 {
-  /* Unsupported */
-  return 0;
+  /* Not supported for binary128.  */
+  return (kind == 4 || kind == 8) ? 1 : 0;
 }

 int
 get_fpu_underflow_mode (void)
 {
-  /* Unsupported */
-  return 0;
+  unsigned int fpcr = __builtin_aarch64_get_fpcr();
+
+  /* Return 0 for abrupt underflow (flush to zero), 1 for gradual underflow.  */
+  return (fpcr & FE_MAP_FZ) ? 0 : 1;
 }

 void
-set_fpu_underflow_mode (int gradual __attribute__((unused)))
+set_fpu_underflow_mode (int gradual)
 {
-  /* Unsupported */
+  unsigned int fpcr = __builtin_aarch64_get_fpcr();
+
+  if (gradual)
+    fpcr &= ~FE_MAP_FZ;
+  else
+    fpcr |= FE_MAP_FZ;
+
+  __builtin_aarch64_set_fpcr(fpcr);
 }

Edit: modified the patch, because underflow control is not supported for the binary128 floating-point type.

fxcoudert commented 2 years ago

The above (modified) patch regtests fine on aarch64-apple-darwin21. The (few) existing tests about underflow control in the gfortran testsuite do not fail.

iains commented 2 years ago

Shall I stick it into the next rebase?

iains commented 2 years ago

One small comment: I notice that

#define FPCR_RM_MASK 0x0c00000

does not include the new bit allocated to flush-to-zero.

FWIW, in terms of utility: I recall that this option was present on the very first IEEE 754 FP chips I used (from Weitek). There are various cases, certainly in audio DSP with soft-real-time guarantees, where the time penalty of processing denormalised numbers on some platforms made it completely unusable.

fxcoudert commented 2 years ago

#define FPCR_RM_MASK 0x0c00000

does not include the new bit allocated to the Flush-to-zero

That's actually what we want, I think, because the underflow mode is not a rounding mode, it's an entirely separate setting.
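
To make the separation concrete, here is a minimal sketch using the two constants quoted in this thread. The helper names are hypothetical, and the functions operate on a plain integer standing in for the FPCR value rather than on the real register, so the example is portable and is not the libgfortran code itself:

```c
/* Bit values quoted in the thread (AArch64 FPCR layout).  */
#define FPCR_RM_MASK 0x0c00000u  /* rounding-mode field, bits 22-23 */
#define FE_MAP_FZ    0x1000000u  /* flush-to-zero bit, bit 24 */

/* The two fields must not overlap (C11 compile-time check).  */
_Static_assert((FPCR_RM_MASK & FE_MAP_FZ) == 0, "FZ overlaps rounding mode");

/* Hypothetical helper: return fpcr with the underflow mode changed.
   gradual != 0 clears FZ, gradual == 0 sets it; other bits are kept.  */
static unsigned int with_underflow_mode(unsigned int fpcr, int gradual)
{
  return gradual ? (fpcr & ~FE_MAP_FZ) : (fpcr | FE_MAP_FZ);
}

/* Hypothetical helper: 1 for gradual underflow, 0 for flush to zero.  */
static int underflow_is_gradual(unsigned int fpcr)
{
  return (fpcr & FE_MAP_FZ) ? 0 : 1;
}

/* Hypothetical helper: extract only the rounding-mode bits.  */
static unsigned int rounding_bits(unsigned int fpcr)
{
  return fpcr & FPCR_RM_MASK;
}
```

Because the masks are disjoint, toggling the underflow mode leaves the rounding mode untouched, and code that masks with FPCR_RM_MASK never sees the flush-to-zero bit.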

iains commented 2 years ago

I think if we can cook the test case into something for the testsuite, then this could be closed as 'done'.

fxcoudert commented 2 years ago

We actually have some tests already, that I had written some time ago:

./ieee/ieee_5.f90
./ieee/large_4.f90
./ieee/large_1.f90
./ieee/underflow_1.f90
./ieee/ieee_8.f90

Unless a bug is reported, or more ideas for tests given, I think this can be closed.
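
One further test idea, sketched in C to match the snippet at the top of this thread (it would need porting to Fortran for the gfortran testsuite): count how many halvings it takes to go from the smallest normal number to zero. The function name `halvings_to_zero` is hypothetical. The sketch assumes IEEE binary32 `float` and round-to-nearest, and does not change the underflow mode itself:

```c
#include <float.h>

/* Hypothetical helper: number of times start (> 0) can be halved
   before the result compares equal to zero.  */
static int halvings_to_zero(float start)
{
  volatile float x = start;  /* volatile: force each step through a float */
  int count = 0;

  while (x > 0.0f) {
    x = x * 0.5f;
    count++;
  }
  return count;
}
```

With gradual underflow, `halvings_to_zero(FLT_MIN)` walks through every subnormal power of two and should return `FLT_MANT_DIG` (24 for binary32). With flush to zero enabled, the first halving already underflows, so the expected count is 1.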

iains commented 2 years ago

fixed

iains / gcc-darwin-arm64

Add support for IEEE underflow control #62
