jacobwilliams / roots-fortran

A modern Fortran library for finding the roots of continuous scalar functions of a single real variable, using derivative-free methods.
Other
35 stars 5 forks source link

Robust bisect #15

Open ivan-pi opened 2 years ago

ivan-pi commented 2 years ago

https://github.com/jacobwilliams/roots-fortran/blob/60524bc4f491a50ae1b2e538e1e486551812b9db/src/root_module.F90#L2524

The excerpt below is taken from Kahaner, Moler, & Nash, Numerical Methods and Software, 1989, Prentice-Hall, Inc.:

image

According to Herbie this rearrangement seems irrelevant in FP64, however I couldn't find an option to see what happens in lower precision: https://herbie.uwplse.org/demo/ebc6120a0224e43ec4e96ae4c79d8bf1537cfdf8.1.6/graph.html#history

ivan-pi commented 2 years ago

The compilers do interpret the instructions differently. Question is whether the error of ADD+MUL is lower than the error of SUB + FMA, and whether it even matters for > FP32?

(a + b)/2

gfortran -O2 -march=native

__root_MOD_bisect:
        vmovsd  xmm0, QWORD PTR [rdi]
        vaddsd  xmm0, xmm0, QWORD PTR [rsi]
        vmulsd  xmm0, xmm0, QWORD PTR .LC1[rip]
        ret
.LC1:
        .long   0
        .long   1071644672

ifx -O2 -xHost

.LCPI1_0:
        .quad   0x3fe0000000000000
root_mp_bisect_:
        vmovsd  xmm0, qword ptr [rdi]
        vaddsd  xmm0, xmm0, qword ptr [rsi]
        vmulsd  xmm0, xmm0, qword ptr [rip + .LCPI1_0]
        ret

a + 0.5*(b - a)

gfortran -O2 -march=native

__root_MOD_bisect:
        vmovsd  xmm1, QWORD PTR [rdi]
        vmovsd  xmm0, QWORD PTR [rsi]
        vsubsd  xmm0, xmm0, xmm1
        vfmadd132sd     xmm0, xmm1, QWORD PTR .LC1[rip]
        ret
.LC1:
        .long   0
        .long   1071644672

ifx -O2 -xHost

.LCPI1_0:
        .quad   0x3fe0000000000000
root_mp_bisect_:
        vmovsd  xmm1, qword ptr [rdi]
        vmovsd  xmm0, qword ptr [rsi]
        vsubsd  xmm0, xmm0, xmm1
        vfmadd132sd     xmm0, xmm1, qword ptr [rip + .LCPI1_0]
        ret
jacobwilliams commented 2 years ago

We can compile the lib and tests using the -D REAL32 option and see if it makes a difference?

ivan-pi commented 2 years ago

In the Boost C++ root finding library (/boost/math/tools/roots.hpp), they use

         delta = 0.5F * (guess - max);  // lucky that 0.5F has an exact binary representation...
         result = guess - delta;

in the fallback bisection step of Newton's method.

However in the actual bisection routine they do it the "naive" way:

   while(count && (0 == tol(min, max)))
   {
      T mid = (min + max) / 2;   // 2 gets correctly cast to the required precision
      T fmid = f(mid);
      ...

So it looks like they are indifferent to this issue.

In SciPy's zero methods (only double precision, e.g. https://github.com/scipy/scipy/blob/v1.9.0/scipy/optimize/Zeros/bisect.c) they use the "correction" approach consistently in all of the solvers:

    dm = xb - xa;
    solver_stats->iterations = 0;
    for (i=0; i<iter; i++) {
        solver_stats->iterations++;
        dm *= .5;         // only double supported anyways, else should be (T) 0.5;
        xm = xa + dm;
ivan-pi commented 2 years ago

The following paper is dedicated entirely to bisecting an interval:

Goualard, F. (2014). How do you compute the midpoint of an interval?. ACM Transactions on Mathematical Software (TOMS), 40(2), 1-25. https://doi.org/10.1145/2493882

A preprint is available here.