ulfjack / ryu

Converts floating point numbers to decimal strings
Apache License 2.0
1.19k stars 99 forks source link

Clarification on lower bound of `compute_shortest`. #225

Closed PhucXDoan closed 7 months ago

PhucXDoan commented 7 months ago

The purpose of the modified compute_shortest in the paper, if I'm not mistaken, is to find an integer within the interval (a, c) with the fewest amount of significant figures and also closest to b, all properly rounded. Whether or not a and c are included in the interval is indicated by the accept_smaller and accept_larger boolean flags respectively.

So compute_shortest(a=1123, b=1234, c=1301, accept_smaller=false, accept_larger=false) would return (12, 2) indicating 12*10^2 = 1200, which is indeed within the interval (1123, 1301) and close to 1234.

What I'm confused about is something like compute_shortest(a=1000, b=1007, c=1234, accept_smaller=false, accept_larger=false). This would give (10, 2) which maps to 1000, but the lower bound a should be excluded, right?

It's this particular sentence in the proof of the modified compute_shortest that's flying over my head, where it is stated:

A return value of b{i} is therefore legal *if and only if a{i} != b_{i} or all_azero{i} is true*, ...

Why would b_{i} be legal just because all the digits truncated off of a were zeros?

Any clarifications here would be appreciated, thanks.

ulfjack commented 7 months ago

I think it should return (12,2). The value of b is never actually used. Am I missing something?

PhucXDoan commented 7 months ago

The unmodified compute_shortest would return (12, 2), but the modified compute_shortest uses b in the return statement to return (b_{i}, i) (b_{i}+1 if needed to round up).

If I'm not making any mistakes, I'd imagine the table below is what the history of the modified compute_shortest would be with a = 1000, b = 1007, c = 1234, accept_smaller = false, accept_larger = false. Since the upper bound is not included, c_{0} starts of with c - 1.

i a_{i} b_{i} c_{i} all_azero{i} all_bzero{i} digit_{i}
0 1000 1007 1233 true true 0
1 100 100 123 true true 7
2 10 10 12 true false 0

The iteration stops since cutting one more digit off of a_{i} and c_{i} will make them indistinguishable. The second loop described in the paper never gets executed since the lower bound is excluded anyways.

Before the modification, the algorithm will simply return (c_{i}, i) = (12, 2). In the modified compute_shortest, we'd determine if b_{i} needs to be rounded down or not:

is_tie          = digit_{i} == 5 && all_b_zero_{i}
want_round_down = digit_{i} <  5 || (is_tie && break_tie_down)
round_down      = (want_round_down && (a_{i} != b_{i} || all_a_zero_{i})) || b_{i} + 1 > c_{i}

In this case, round_down == false, so the modified compute_shortest returns (b_{i}, i} = (10, 2). This is just the lower bound, which can't be since accept_lower == false, so I must've made a mistake somewhere?

ulfjack commented 7 months ago

Right. Sorry.

Ok, it looks like a bug in the paper. Clearly, it shouldn't return (10,2) in this case but (11,2).

ulfjack commented 7 months ago

I think this is the line in the C implementation: https://github.com/ulfjack/ryu/blob/master/ryu/d2s.c#L260C35-L260C47

It rounds up if ((vr == vm && (!acceptBounds || !vmIsTrailingZeros)) || lastRemovedDigit >= 5) is true. This correctly captures the case that we need to round up if b_{i} = a_{i} and accept_lower is false.