Q.to_float is not correct for denormal results

ocaml / Zarith

The Zarith library implements arithmetic and logical operations over arbitrary-precision integers and rational numbers. The implementation, based on GMP, is very efficient.

Other

231 stars 70 forks source link

Q.to_float is not correct for denormal results #29

Closed xavierleroy closed 4 years ago

xavierleroy commented 6 years ago

Consider the following example submitted by @mbarbin in https://github.com/ocaml/Zarith/commit/5463dbce327193fb5f21be179b0984b3ebb0350e#commitcomment-27324931:

(* q = 7.56181796669062E-309 *)
let q = Q.make (Z.of_string "756181796669062") (Z.pow (Z.of_int 10) (309+14));;

Printf.printf "%h\n" (Q.to_float q);;
(* 0x0.570020d1942p-1022 *)

Printf.printf "%h\n" 7.56181796669062E-309;;
(* 0x0.570020d1941ffp-1022 *)

The result of Q.to_float is not the FP number closest to the exact value.

@pascal-cuoq notes that the "round to odd" technique used in Q.to_float is incorrectly used in this particular case involving denormal FP numbers as the result.

xavierleroy commented 6 years ago

@pascal-cuoq you seem to understand the issue, so would you please propose a fix?

pascal-cuoq commented 6 years ago

I will try to offer something.

pascal-cuoq commented 6 years ago

I can do you a function with a slightly similar spec: it always produces the nearest float (modulo any remaining off-by-one error in the current draft):

diff --git a/q.ml b/q.ml
index e7dfd81..2f736e4 100644
--- a/q.ml
+++ b/q.ml
@@ -198,32 +198,51 @@ let to_float x =
   | NZERO ->
     let p = x.num and q = x.den in
     let np = Z.numbits p and nq = Z.numbits q in
-    if np <= 53 && nq <= 53 then
-      (* p and q convert to floats exactly; use FP division to get the
-         correctly-rounded result. *)
-      Int64.to_float (Z.to_int64 p) /. Int64.to_float (Z.to_int64 q)
-    else begin
-      (* |p| is in [2^(np-1), 2^np)
-         q is in [2^(nq-1), 2^nq)
-         hence |p/q| is in (2^(np-nq-1), 2^(np-nq+1)).
-         We define n such that |p/q*2^n| is in [2^54, 2^56).
-         >= 2^54 so that the round to odd technique applies.
-         < 2^56 so that the integral part is representable as an int64. *)
-      let n = 55 - (np - nq) in
-      (* Scaling p/q by 2^n *)
-      let (p', q') =
-        if n >= 0
-        then (Z.shift_left p n, q)
-        else (p, Z.shift_left q (-n)) in
-      (* Euclidean division of p' by q' *)
-      let (quo, rem) = Z.ediv_rem p' q' in
-      (* quo is the integral part of p/q*2^n
-         rem/q' is the fractional part. *)
-      (* Round quo to float *)
-      let f = Z.round_to_float quo (Z.sign rem = 0) in
-      (* Apply exponent *)
-      ldexp f (-n)
-    end
+    let negat, p =
+      if Z.lt p Z.zero then true, Z.neg p else false, p
+    in
+    (* p is in [2^(np-1), 2^np)
+       q is in [2^(nq-1), 2^nq)
+       We define n,p',q' such that p'/q'=p/q and |p/q*2^n| is in [1, 2). *)
+    let n = np - nq in
+    (* Scaling p/q by 2^n *)
+    let (p', q') =
+      if n >= 0
+      then p, Z.shift_left q n
+      else Z.shift_left p (-n), q
+    in
+    let (p', n) =
+      if Z.geq p' q'
+      then (p', n)
+      else (Z.shift_left p' 1, pred n)
+    in
+    let digits, n =
+      if n >= -1022
+      then 52, n
+      else if n <= -1022 - 52
+      then 0, n - 52
+      else 52 + (n + 1022), -1022
+    in
+    let p' = Z.shift_left p' digits in
+    (* Euclidean division of p' by q' *)
+    let (quo, rem) = Z.ediv_rem p' q' in
+    let rem2 = Z.shift_left rem 1 in
+    let quo' =
+      if Z.lt rem2 q'
+      then (* round down *)
+        quo
+      else if Z.gt rem2 q'
+      then (* round up *)
+        Z.succ quo
+      else (* round to even *)
+        let last_bit = Z.(land) quo Z.one in
+        Z.add quo last_bit
+    in
+    let f = Z.to_float quo' in
+    (* Apply exponent *)
+    let r = ldexp f (n - 52) in
+    if negat then -. r else r
+

 (* operations *)
 (* ---------- *)

I do not see any way to preserve the current “using the current floating-point rounding mode” aspect of the spec. It was cute the way it worked for normal results with the previous implementation, but keeping that aspect either means using a not-too-portable C function to detect the current rounding mode, or making a convoluted adjustment in floating-point arithmetic at the end.

pascal-cuoq commented 6 years ago

I found a cute way: I'll add a power of two in the range 2^53-2^63 before the conversion of the quotient to float, and I'll subtract it after the conversion, to ensure that the quotient is rounded directly to the correct number of bits.

antoinemine commented 6 years ago

@pascal-cuoq thanks for proposing some fixes. Is there a risk of double-rounding?

pascal-cuoq commented 6 years ago

With the cute solution there will be only one rounding, in the conversion from Z to float. All the other operations (adding the power of two, subtracting it, and applying the exponent) will be exact.

pascal-cuoq commented 6 years ago

The cute solution currently looks as follows, but isn't tested on any other input than that provided by @mbarbin. While I am at it, I have some questions:

why Int64.to_float (Z.to_int64 q) in the original code? Is that supposed to be more efficient than Z.to_float when the conversion is known to be exact? I will take advantage of this construct for offset if it is.
why Z.sign rem = 0 in the original code? Is that supposed to be more efficient than Z.equal rem Z.zero?

let to_float x =
  match classify x with
  | ZERO -> 0.0
  | INF  -> infinity
  | MINF -> neg_infinity
  | UNDEF -> nan
  | NZERO ->
    let p = x.num and q = x.den in
    let np = Z.numbits p and nq = Z.numbits q in
    if np <= 53 && nq <= 53 then
      (* p and q convert to floats exactly; use FP division to get the
         correctly-rounded result. *)
      Int64.to_float (Z.to_int64 p) /. Int64.to_float (Z.to_int64 q)
    else begin
      let negat =
        if Z.lt p Z.zero then Z.neg Z.one else Z.one
      in
      (* p is in [2^(np-1), 2^np)
         q is in [2^(nq-1), 2^nq)
         We define n,p',q' such that p'/q'=p/q and |p'/q'*2^n| is in [1, 2). *)
      let n = np - nq in
      (* Scaling p/q by 2^n *)
      let (p', q') =
        if n >= 0
        then p, Z.shift_left q n
        else Z.shift_left p (-n), q
      in
      let (p', n) =
        if Z.geq (Z.abs p') q'
        then (p', n)
        else (Z.shift_left p' 1, pred n)
      in

      let p' = Z.shift_left p' 55 in
      (* Euclidean division of p' by q' *)
      let (quo, rem) = Z.ediv_rem p' q' in
      let offset =
        if n = -1023
        then
          (* quo has the form:      1xxxx...
             we add:                10000...
             so as to end up with: 10xxxx... *)
          Z.shift_left negat 55
        else if n <= -1024
        then
          (* quo has the form:       1xxxx...
             we add:               1000000...
             so as to end up with: 101xxxx... *)
          Z.shift_left negat (1024 + 57 + n)
        else
          Z.zero
      in
      let quo = Z.add offset quo in
      let f = Z.round_to_float quo (Z.sign rem = 0) in
      let f = f -. (Z.to_float offset)
      in
      ldexp f (n - 55)
    end

mbarbin commented 6 years ago

Thank you Pascal for looking into this. I ran this new version through the same test that yielded my original report. Test passes for the non denormal part of the test. It also failed the denormal section of the test, with a different value exhibited:

(q 1.54011863885598E-309)
(error (
 (mismatch
  (expected 1.54011863885598E-309)
  (returned 1.5401186388559775E-309)))

expected=0x0.11b82c776dd8fp-1022
returned=0x0.11b82c776dd8ep-1022

I will try to publish the test I have so you can take advantage of it too locally to toy with it. It depends on the quickcheck generator Core.Float.gen. Thanks.

mbarbin commented 6 years ago

trefis published the most recent bignum's test suite that includes those new tests: https://github.com/janestreet/bignum/commit/290967e2716896ac5487f666830a937efdb1159f

If running expect_test does not work well with your local build environment I think one can rewrite just the section 75-86 in question as a 1 file binary:

open! Core

let pick_close_float_that_rounds_trip f =
  let str = sprintf "%.15g" f in
  if Float.equal (Float.of_string str) f
  then Bignum.of_string str
  else Bignum.of_float_dyadic f
;;

let () =
  Quickcheck.test ~sexp_of:[%sexp_of:float] Float.gen ~f:(fun x ->
    let x' =
      x
      |> pick_close_float_that_rounds_trip
      |> Bignum.to_float
    in
    if not (Float.(=) x x' || (Float.is_nan x && Float.is_nan x'))
    then
      raise_s [%message "mismatch" (x : float) (x' : float)])
;;

xavierleroy commented 6 years ago

Thanks a lot @pascal-cuoq for the idea. I'm still digesting it but it looks rather cute indeed!

pascal-cuoq commented 6 years ago

Sorry not to have more time for this tonight, here are the problems I have found while looking again at the attempt above:

I meant for the following correspondance between n and offset (assuming a positive p):

n=-1023   offset=1<<55
n=-1024   offset=1<<57
n=-1025   offset=1<<58
n=-1026   offset=1<<59
...

This means that the line Z.shift_left negat (1024 + 57 + n) should have been Z.shift_left negat (57 + (-1024 - n))

I had not read the implementation of Z.round_to_float carefully enough. It wants a Z.t that is representable as an Int64.t. Small enough denormals can cause larger integers than this, so better not use Z.round_to_float at all.
A problem remains if offset overflows the maximum power of two representable as a float, for ratios smaller than 2^-2000 or so.

Taking into account the first two issues, my current version follows. I was not able to test it yet, this is the first time I use Core and I stopped at:

~/Zarith $ ocamlopt -I /Users/pascal/tis-kernel-opam//4.05.0/lib/bignum -I /Users/pascal/tis-kernel-opam//4.05.0/lib/base -I /Users/pascal/tis-kernel-opam//4.05.0/lib/core_kernel/ -I /Users/pascal/tis-kernel-opam//4.05.0/lib/core t.ml /Users/pascal/tis-kernel-opam//4.05.0/lib/core/core.cmxa zarith.cmxa 
File "t.ml", line 6, characters 7-23:
Error: Unbound value Bignum.of_string

let to_float x =
  match classify x with
  | ZERO -> 0.0
  | INF  -> infinity
  | MINF -> neg_infinity
  | UNDEF -> nan
  | NZERO ->
    let p = x.num and q = x.den in
    let np = Z.numbits p and nq = Z.numbits q in
    if np <= 53 && nq <= 53 then
      (* p and q convert to floats exactly; use FP division to get the
         correctly-rounded result. *)
      Int64.to_float (Z.to_int64 p) /. Int64.to_float (Z.to_int64 q)
    else begin
      let negat =
        if Z.lt p Z.zero then Z.neg Z.one else Z.one
      in
      (* p is in [2^(np-1), 2^np)
         q is in [2^(nq-1), 2^nq)
         We define n,p',q' such that p'/q'*2^n=p/q and |p'/q'| is in [1, 2). *)
      let n = np - nq in
      (* Scaling p/q by 2^n *)
      let (p', q') =
        if n >= 0
        then p, Z.shift_left q n
        else Z.shift_left p (-n), q
      in
      let (p', n) =
        if Z.geq (Z.abs p') q'
        then (p', n)
        else (Z.shift_left p' 1, pred n)
      in

      let p' = Z.shift_left p' 55 in
      (* Euclidean division of p' by q' *)
      let (quo, rem) = Z.ediv_rem p' q' in
      let offset =
        if n = -1023
        then
          (* quo has the form:      1xxxx...
             we add:                10000...
             so as to end up with: 10xxxx... *)
          Z.shift_left negat 55
        else if n <= -1024
        then
          (* quo has the form:       1xxxx...
             we add:               1000000...
             so as to end up with: 101xxxx... *)
          Z.shift_left negat (57 + (-1024 - n))
        else
          Z.zero
      in
      let quo = Z.add offset quo in
      let quo =
        if Z.sign rem = 0
        then quo
        else Z.(lor) Z.one quo (* round to odd *)
      in
      let f = Z.to_float quo in
      let f = f -. (Z.to_float offset)
      in
      ldexp f (n - 55)
    end

mbarbin commented 6 years ago

@pascal-cuoq in former version of Core, Bignum needed to be brought to scope via a open Bignum.Std. Also, I realize you might not want to depend on ppx, nor want to recompile bignum to be able to test your function. Please try this alternative:

open! Core
open! Bignum.Std

let bignum_to_q x =
  Zarith.Q.make
    (Bignum.num_as_bigint x |> Bigint.to_zarith_bigint)
    (Bignum.den_as_bigint x |> Bigint.to_zarith_bigint)
;;

let pick_value_to_be_tested f =
  let str = sprintf "%.15g" f in
  if Float.equal (Float.of_string str) f
  then bignum_to_q (Bignum.of_string str)
  else Zarith.Q.of_float f
;;

let q_to_float q =
  (* Replace by the function to test. *)
  Zarith.Q.to_float q
;;

let () =
  Quickcheck.test ~sexp_of:Float.sexp_of_t Float.gen_finite ~f:(fun x ->
    let x' =
      x
      |> pick_value_to_be_tested
      |> q_to_float
    in
    if not (Float.(=) x x')
    then (
      Printf.eprintf "mismatch\nq=%.15g\nexpected=%h\nreturned=%h\n"
        x x x';
      exit 1))
;;

pascal-cuoq commented 6 years ago

I gave up on Core again and tested my function with:

let test f =
  let q = Q.of_float f in
  if Q.to_float q <> f
  then Printf.printf "%s %h\n" (Q.to_string q) f

let () =
  for i = 1 to 1000000000 do
    test (Int64.float_of_bits (Int64.of_int (i*3)));
    test (Int64.float_of_bits (Int64.of_int (i*7919)));
    test (Int64.float_of_bits (Int64.of_int (i*27644437)));
  done

That was only for to-nearest. As any method that starts from a float, it has no chance to find the remaining bug that I mentioned in my function. I am thinking of starting from a random rational, convert it to float, get the neighboring floats, convert all three to rationals, and check that the initial conversion was indeed the smallest above/largest below/nearest as appropriate.

For the remaining bug, I am thinking of returning let small = 1.0e-200 in (Z.to_float negat) *. small *. small when the exponent is small enough, because that will produce the correct result in all rounding modes, without requiring the inclusion in Zarith of functions to access the rounding mode.

pascal-cuoq commented 6 years ago

Here is a version in which I don't know to be any bug:

let to_float x =
  match classify x with
  | ZERO -> 0.0
  | INF  -> infinity
  | MINF -> neg_infinity
  | UNDEF -> nan
  | NZERO ->
    let p = x.num and q = x.den in
    let np = Z.numbits p and nq = Z.numbits q in
    if np <= 53 && nq <= 53 then
      (* p and q convert to floats exactly; use FP division to get the
         correctly-rounded result. *)
      Int64.to_float (Z.to_int64 p) /. Int64.to_float (Z.to_int64 q)
    else begin
      let negat =
        if Z.lt p Z.zero then -1 else 1
      in
      (* p is in [2^(np-1), 2^np)
         q is in [2^(nq-1), 2^nq)
         We define n,p',q' such that p'/q'*2^n=p/q and |p'/q'| is in [1, 2). *)
      let n = np - nq in
      (* Scaling p/q by 2^n *)
      let (p', q') =
        if n >= 0
        then p, Z.shift_left q n
        else Z.shift_left p (-n), q
      in
      let (p', n) =
        if Z.geq (Z.abs p') q'
        then (p', n)
        else (Z.shift_left p' 1, pred n)
      in

      let p' = Z.shift_left p' 54 in
      (* Euclidean division of p' by q' *)
      let (quo, rem) = Z.ediv_rem p' q' in
      if n <= -1080
      then ldexp (float_of_int negat) (-1080)
      else
        let offset =
          if n <= -1023
          then
            (* quo has the form:       1xxxx...
               we add:               1000000...
               so as to end up with: 101xxxx... *)
            Z.shift_left (Z.of_int negat) (55 + (-1023 - n))
          else
            Z.zero
        in
        let quo = (Z.lor) offset quo in
        let quo =
          if Z.sign rem = 0
          then quo
          else Z.(lor) Z.one quo (* round to odd *)
        in
        let f = Z.to_float quo in
        let f = f -. (Z.to_float offset)
        in
        ldexp f (n - 54)
    end

I simplified the logic for the computation of offset. I was worried about a quotient such as 0x1ffffffffffff being rounded up to the next binade—and thus the wrong precision—after being offset and passed to ldexp, but since it only happens for rationals that end up converted to a power of two, the off-by-one on the targeted binade is irrelevant.

I tested the function for round-to-nearest with the following code:

exception Large
exception Bad

let test q =
  let f0 = Q.to_float q in
  let repr = Int64.bits_of_float f0 in
  try
    if f0 <> f0 then raise Bad;
    let f1, f2 =
      if f0 = 0.
      then
        let f1 = Int64.float_of_bits 1L in
        let f2 = -. f1 in
        f1, f2
      else if abs_float f0 >= max_float
        then raise Large
      else begin
            Int64.float_of_bits (Int64.succ repr),
            Int64.float_of_bits (Int64.pred repr)
        end
    in
    let q0, q1, q2 =
      Q.of_float f0, Q.of_float f1, Q.of_float f2
    in
    let d0 = Q.abs (Q.sub q q0) in
    let d1 = Q.abs (Q.sub q q1) in
    let d2 = Q.abs (Q.sub q q2) in
    if Q.lt d1 d0 || Q.lt d2 d0
    then raise Bad;
    let even = Int64.equal 0L (Int64.logand 1L repr) in
    if (Q.equal d0 d1 || Q.equal d0 d2) && not even
    then raise Bad;
  with
  | Large -> ()
  | Bad ->
    Printf.printf "%s %h\n\n" (Q.to_string q) f0

let () = test (Q.of_string "23/7")

let () =
  for i = 1 to 10000 do
    let x = Z.of_int64 (Random.int64 0x7fffffffffffffffL) in
    let y = Z.of_int64 (Random.int64 0x7fffffffffffffffL) in
    for j = 950 to 1150 do
      test (Q.make x (Z.shift_left y j));
      test (Q.make y (Z.shift_left x j))
    done
  done
;;

mbarbin commented 6 years ago

The quickcheck yields:

    ("unexpectedly raised" (
      "random input"
      (value -1.43279037293961E-322)
      (error (
        (mismatch
          (x  -1.43279037293961E-322)
          (x' 2.2250738585071871E-308))
        "")))) |}]

I haven't had time to review whether this counter example makes sense.

I am sorry it is not easier to compile the quickcheck's approach because of the core dependency. Please feel free to PM me any error message you get, maybe I would be able to help. On the bright side people are working on jbuilder/dune so hopefully that should improve in the near future. Maybe you know another open source quickcheck library that does not depend on core that you would be willing to use instead? I do believe of the added value of quickcheck iterations, in addition to the more systematic testing range that you have.

Gbury commented 6 years ago

Well, there is https://github.com/c-cube/qcheck which does not depend on Core afaik.

pascal-cuoq commented 6 years ago

I guess I should have included negative numbers in my test generation, or not changed at the last minute an add by a (lor). I am now testing with:

exception Large
exception Bad

let test q =
  let f0 = Q.to_float q in
  let repr = Int64.bits_of_float f0 in
  try
    if f0 <> f0 then raise Bad;
    let f1, f2 =
      if f0 = 0.
      then
        let f1 = Int64.float_of_bits 1L in
        let f2 = -. f1 in
        f1, f2
      else if abs_float f0 >= max_float
        then raise Large
      else begin
            Int64.float_of_bits (Int64.succ repr),
            Int64.float_of_bits (Int64.pred repr)
        end
    in
    let q0, q1, q2 =
      Q.of_float f0, Q.of_float f1, Q.of_float f2
    in
    let d0 = Q.abs (Q.sub q q0) in
    let d1 = Q.abs (Q.sub q q1) in
    let d2 = Q.abs (Q.sub q q2) in
    if Q.lt d1 d0 || Q.lt d2 d0
    then raise Bad;
    let even = Int64.equal 0L (Int64.logand 1L repr) in
    if (Q.equal d0 d1 || Q.equal d0 d2) && not even
    then raise Bad;
  with
  | Large -> ()
  | Bad ->
    Printf.printf "%s %h\n\n" (Q.to_string q) f0

let () = test (Q.of_string "23/7")

let () =
  for i = 1 to 100000 do
    let x = Z.of_int64 (Random.int64 0x7fffffffffffffffL) in
    let y = Z.of_int64 (Random.int64 0x7fffffffffffffffL) in
    for j = 950 to 1150 do
      test (Q.make x (Z.shift_left y j));
      test (Q.make (Z.neg x) (Z.shift_left y j));
      test (Q.make y (Z.shift_left x j))
    done
  done
;;

This last bug should probably also be taken as evidence that I should not submit a pull request until I have written tests for the other rounding modes.

The current version of to_float is as follows:

let to_float x =
  match classify x with
  | ZERO -> 0.0
  | INF  -> infinity
  | MINF -> neg_infinity
  | UNDEF -> nan
  | NZERO ->
    let p = x.num and q = x.den in
    let np = Z.numbits p and nq = Z.numbits q in
    if np <= 53 && nq <= 53 then
      (* p and q convert to floats exactly; use FP division to get the
         correctly-rounded result. *)
      Int64.to_float (Z.to_int64 p) /. Int64.to_float (Z.to_int64 q)
    else begin
      let negat =
        if Z.lt p Z.zero then -1 else 1
      in
      (* p is in [2^(np-1), 2^np)
         q is in [2^(nq-1), 2^nq)
         We define n,p',q' such that p'/q'*2^n=p/q and |p'/q'| is in [1, 2). *)
      let n = np - nq in
      (* Scaling p/q by 2^n *)
      let (p', q') =
        if n >= 0
        then p, Z.shift_left q n
        else Z.shift_left p (-n), q
      in
      let (p', n) =
        if Z.geq (Z.abs p') q'
        then (p', n)
        else (Z.shift_left p' 1, pred n)
      in

      let p' = Z.shift_left p' 54 in
      (* Euclidean division of p' by q' *)
      let (quo, rem) = Z.ediv_rem p' q' in
      if n <= -1080
      then ldexp (float_of_int negat) (-1080)
      else
        let offset =
          if n <= -1023
          then
            (* quo has the form:       1xxxx...
               we add:               1000000...
               so as to end up with: 101xxxx... *)
            Z.shift_left (Z.of_int negat) (55 + (-1023 - n))
          else
            Z.zero
        in
        let quo = Z.add offset quo in
        let quo =
          if Z.sign rem = 0
          then quo
          else Z.(lor) Z.one quo (* round to odd *)
        in
        let f = Z.to_float quo in
        let f = f -. (Z.to_float offset)
        in
        ldexp f (n - 54)
    end

mbarbin commented 6 years ago

This version satisfies the quickcheck. I ran a bench compare on the values published there https://github.com/janestreet/bignum/blob/master/bench/bignum_bench.ml using core bench and this is basically fine:

┌──────────────────────────────────────────────────────────┬──────────┬───────────┐
│ Name                                                     │ Time/Run │   mWd/Run │
├──────────────────────────────────────────────────────────┼──────────┼───────────┤
│ [bignum_bench.ml:float] to_float (decimal)               │   1.15us │   336.00w │
│ [bignum_bench.ml:float] to_float (scientific)            │   7.73us │ 1_278.00w │
│ [bignum_bench.ml:float] to_float (fraction)              │   3.96us │   888.00w │
│ [bignum_bench.ml:float] to_float__with_fix (decimal)     │   1.18us │   336.00w │
│ [bignum_bench.ml:float] to_float__with_fix (scientific)  │  10.33us │ 1_794.00w │
│ [bignum_bench.ml:float] to_float__with_fix (fraction)    │   5.84us │ 1_012.00w │
└──────────────────────────────────────────────────────────┴──────────┴───────────┘

Understood about other rounding modes, I have only tested the default nearest half to even. Sorry I was not more helpful on the code itself, I just glanced at it and determined it would take me way too long to get up to speed on the topic. Thank you very much.

hhugo commented 4 years ago

@pascal-cuoq, should we push the fix forward and close the issue?

xavierleroy commented 4 years ago

I cannot say I have fully reviewed the new implementation, but based on my understanding of the code and the extensive testing, I'm in favor of merging.

However, the test in tests/tofloat.ml must be updated to include @pascal-cuoq's new test and to stop testing Q.to_float with rounding modes other than to nearest.

pascal-cuoq commented 4 years ago

@xavierleroy I have placed a version of my implementation with extra comments in a pull request.

Note that the original specification of Q.to_float is preserved:

val to_float: t -> float
(** Converts to a floating-point number, using the current
    floating-point rounding mode…

For the fast path, both calls to Int64.to_float are exact and /. produces the end-result according to the current rounding mode. For the slow path, only the call to ldexp in the “the result is a zero or a neighbor of zero” branch, and the call to Z.to_float in the general branch actually depends on the rounding mode. The other operations that would ordinarily depend on the rounding mode are exact in the context of this function.

So the new Q.to_float is supposed to work in all rounding modes and could be tested in all rounding modes, just with a slightly different method than the one I sketched out for nearest-even. For round-downwards, it would look like:

pick a random rational q
apply Q.to_float, yielding f
convert f to a rational and check it is <= q
convert nextafter f infinity to a rational and check it is > q

xavierleroy commented 4 years ago

Thanks, Pascal, for the PR. tests/tofloat.ml already tries to test Q.to_float with several rounding modes, but I remember getting inconsistent results (e.g. "it passes on x86 and fails on ARM"), so something may be wrong with the test. Plus, the test you gave in a comment above looks more comprehensive.