Closed: topherocity closed this 2 years ago.
I'll leave it to @fxcoudert to comment on this :)
Thanks @topherocity. Some comments:
is there a reason I'm missing why this is currently unsupported?
Two reasons, though not good ones. First, I'm not an aarch64 expert at all, and I wrote the current IEEE code by looking up the logic in glibc, which has no subnormal/underflow control functions (except on Alpha). Second, I think underflow control is not widely used in Fortran codes; in fact, gfortran implements it on only two processors: i386/x86_64 (through assembly) and Alpha (through glibc).
Would a pull request be welcome?
Yes, definitely. I would welcome @jeffhammond's ideas/advice on this.
Tentative patch; I would welcome ideas for Fortran codes that test it.
diff --git a/libgfortran/config/fpu-aarch64.h b/libgfortran/config/fpu-aarch64.h
index 4db1b6c4f6b..b909fa9f5dd 100644
--- a/libgfortran/config/fpu-aarch64.h
+++ b/libgfortran/config/fpu-aarch64.h
@@ -31,6 +31,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 #define FE_UPWARD 0x400000
 #define FE_DOWNWARD 0x800000
 #define FE_TOWARDZERO 0xc00000
+#define FE_MAP_FZ 0x1000000
 /* Exceptions */
@@ -301,22 +302,30 @@ support_fpu_rounding_mode (int mode __attribute__((unused)))
 int
 support_fpu_underflow_control (int kind __attribute__((unused)))
 {
-  /* Unsupported */
-  return 0;
+  /* Not supported for binary128. */
+  return (kind == 4 || kind == 8) ? 1 : 0;
 }
 int
 get_fpu_underflow_mode (void)
 {
-  /* Unsupported */
-  return 0;
+  unsigned int fpcr = __builtin_aarch64_get_fpcr();
+
+  /* Return 0 for abrupt underflow (flush to zero), 1 for gradual underflow. */
+  return (fpcr & FE_MAP_FZ) ? 0 : 1;
 }
 void
-set_fpu_underflow_mode (int gradual __attribute__((unused)))
+set_fpu_underflow_mode (int gradual)
 {
-  /* Unsupported */
+  unsigned int fpcr = __builtin_aarch64_get_fpcr();
+
+  if (gradual)
+    fpcr &= ~FE_MAP_FZ;
+  else
+    fpcr |= FE_MAP_FZ;
+
+  __builtin_aarch64_set_fpcr(fpcr);
 }
Edit: modified the patch, because underflow control is not supported for the binary128 floating-point type.
The above (modified) patch regtests fine on aarch64-apple-darwin21, and the (few) existing underflow-control tests in the gfortran testsuite pass.
Shall I put it into the next rebase?
One small comment: I notice that
#define FPCR_RM_MASK 0x0c00000
does not include the new bit allocated to flush-to-zero.
(FWIW, in terms of utility: I recall that this option was present on the very first IEEE 754 FP chips I used, from Weitek. There are various cases, certainly in audio DSP with soft-real-time guarantees, where the time penalty of denormal-number processing on some platforms made it completely unusable.)
#define FPCR_RM_MASK 0x0c00000
does not include the new bit allocated to the Flush-to-zero
That's actually what we want, I think: the underflow mode is not a rounding mode but an entirely separate setting.
I think if we can turn the test case into something for the testsuite, this could be closed as 'done'.
We actually already have some tests, which I wrote some time ago:
./ieee/ieee_5.f90
./ieee/large_4.f90
./ieee/large_1.f90
./ieee/underflow_1.f90
./ieee/ieee_8.f90
Unless a bug is reported or more test ideas come up, I think this can be closed.
Fixed.
Hello all! First of all, thanks so much for the work you've done to provide first-class support for what seems to be the direction Apple will be heading in with its hardware for the foreseeable future.
As scientific Fortran users, we have used the ieee_set_underflow_mode procedure in some of our numerical codes to flush denormalized numbers to zero. This procedure is not currently implemented for AArch64:
https://github.com/iains/gcc-darwin-arm64/blob/fb3177f9479b0b25a3699c43bf378dd17846d733/libgfortran/config/fpu-aarch64.h#L300-L307
This is something that is certainly possible, as illustrated by the following code snippet (please forgive my poor C style):
which returns
I would be happy to create a pull request to provide support for underflow control. Is there a reason I'm missing why this is currently unsupported? Would a pull request be welcome?