Open waldemarhorwat opened 12 months ago
If the partial values «p0, p1, …, pk-1» can differ in sign, then it's possible to get obscure cases where the mathematical value about to be stored into p0 is on a knife edge that rounds to ±∞ while p1 would have the opposite sign that would cause the rounding to prefer the largest-magnitude finite value. This would cause a problem for the pick-next-addend-with-opposite-sign approach, but would be handled by the single-pass scaling-down-p0 approach.
Here is an implementation using the single-pass scaling approach. This was indeed a bit tedious, but I think it's correct - I also wrote code to convert doubles to/from BigInts (scaled up by 2**1023), in which domain we can do addition/subtraction precisely, and used this to fuzz my implementation. This caught a few nasty edge cases. It's possible I might still have missed a case somewhere.
Note that, following Python, the partials are in ascending order of magnitude, so the "overflow" partial is conceptually last in this implementation, not first.
In the original issue for adding fsum to python, there is attached a version based on a different approach:
In brief, the accumulated sum-so-far is represented in the form
huge_integer * 2**(smallest_possible_exponent)
and the huge_integer is stored in base 2**30, with a signed-digit representation (digits in the range [-2**29, 2**29).
Because this approach doesn't use 2Sum, it doesn't have issues with overflow.
I've extracted it from the diff in the thread. I'm probably not going to bother implementing this in JS as well, but it does prove the possibility.
I also wrote code to convert doubles to/from BigInts (scaled up by 2**1023)
This actually scales up the double values by 21075, which is one factor of two too many — every resulting BigInt is even. The smallest representable nonzero double is 2-1074 so it would be less confusing to scale up the values by 21074 unless for some reason you need every BigInt to be even.
I also don't understand this snippet:
let binary = magnitude.toString(2);
if (binary === 0) {…}
The if condition compares a string to a number using ===, which can never be true.
This actually scales up the double values by 21075, which is one factor of two too many — every resulting BigInt is even. The smallest representable nonzero double is 2-1074 so it would be less confusing to scale up the values by 21074 unless for some reason you need every BigInt to be even.
Sorry, yes, 21075. I'd previously left the last bit as 0 because it let me write 2n ** (BigInt(exponent)) * significand
, which was easier for me to think about than having to correct the off-by-one, but I've tweaked it now in https://github.com/bakkot/proposal-math-sum/commit/5b69773911f94aba4321fd8a3f111ef76b44e023.
I also don't understand this snippet:
Whoops, that's a leftover from an earlier version. You're right that it doesn't do anything now (and would be redundant with the binary.length <= 53
case immediately below it in any case). Removed.
I've simplified the main implementation following this code from the original Python issue.
The implementation I had earlier was conceptually quite close to Shewchuk's algorithm: it added x
into each partial in order, culminating in the "overflow" partial, handling overflows by allowing x
to be biased and doing arithmetic on potentially-biased values. This is very easy to reason about, but kind of annoying to implement.
But it doesn't have to work like that. Instead, when overflow occurs during one of the "x
+ a partial" steps, you can reduce that sum by 2 1024 and add 2 1024 into the "overflow" partial immediately. That preserves all of the relevant properties (partials are non-overlapping and ascending in magnitude, etc), and means you don't need to worry about arithmetic on biased values.
I'm trying to put together disparate sources to convince myself of the correctness of the theorems, specifically to make sure that the invariants assumed by a subsequent step actually match what the previous step claims to produce. Can you document the precise invariant that the partials are required to satisfy in the implementation? The details matter here.
The Python issue assumes that the invariant is that they're nonoverlapping and in increasing magnitude.
In addition to edge cases around overflows, some of what I'm looking at: "Nonoverlapping" does not mean that they're necessarily 53 bits apart; the binary values 11010000, 1000, 100, and 1.101 are nonoverlapping doubles, but some of the theorems in the papers require stronger input criteria which a sequence like that would violate. It's probably ok in this usage, but I need to check.
Quick links:
I've been relying on Theorem 10 of the Robust Arithmetic paper, which is fairly straightforward and which only needs only the assumption that the partials are finite, nonoverlapping, and increasing in magnitude (and implicitly that two-sum
doesn't overflow). Including a single final "overflow" partial whose low bit is 21024 obviously preserves the finite, nonoverlapping, increasing properties, and we can handle overflow explicitly. The algorithm fails once the "overflow" partial loses precision, at which point we throw.
The later theorems are much more complicated and require other properties, but aren't actually necessary for this implementation.
From what I can tell the paper doesn't cover producing a single sum from the expansion once you've added all the values in, but I convinced myself of the correctness of the procedure used in msum
(namely, add the partials in descending order of magnitude, stopping once that loses precision) as follows: Because the partials are descending in order of magnitude, once you've started losing precision you know that the lowest bit of the value you've just added is at least a factor of 253 smaller in magnitude than the high bit of the current value of the sum, and by the nonoverlapping property the sum of all the remaining partials is strictly smaller in magnitude than the lowest bit of the value you've just added, so adding in all the remaining partials can't affect the sum except in the case that it would affect rounding, which is handled explicitly.
Just to double-check: the invariant is that the partials are nonzero, finite, nonoverlapping, and increasing in magnitude? (Theorem 10 allows zero-valued partials in input and output, while crsum assumes that the partials are nonzero and can produce incorrect results if zero-valued partials exist).
Yes, though strictly speaking that's covered by "increasing in magnitude" (except for possibly the first partial, for which it doesn't matter if it's zero). Theorem 10 does not guarantee that you end up with a list of partials with the "nonzero" property, but that's easily achieved by discarding partials which are 0.
To be precise: I've been relying on a slightly modified version of Theorem 10, which looks like:
If you have an expansion e where all the partials are finite, nonoverlapping and increasing in magnitude, and the Two-Sum algorithm doesn't overflow, then applying the Grow-Expansion algorithm given e and b followed by a step which discards any partials which are 0 will produce a new expansion e' where all the partials are finite, nonoverlapping, and increasing in magnitude, and such that the mathematic sum of e' is equal to the mathematical sum of e and b.
Incidentally there's been some research into faster approaches since Shewchuk '96; the fastest I've found is described in this post and given in code here.
Per its claims there, on medium-size and larger lists of doubles (~1k+ elements), you can get exact summation with a 4x slowdown over naive summation by using 67 64-bit values (536 bytes). If you have a larger list (10k+ elements) and are willing to spend more memory, you can get exact summation with less than 2x slowdown over naive summation at the cost of using 33 Kb of memory.
Hi. I notice that my xsum algorithm and software is mentioned here. You may be interested that it's used by the xsum package for Julia (https://juliapackages.com/p/xsum).
You may also be interested in the following recent paper (which I haven't read in full detail, but which looks interesting):
Lange, M., Towards accurate and fast summation, https://web.archive.org/web/20220616031105id_/https://dl.acm.org/doi/pdf/10.1145/3544488
One advantage of superaccumulators, as used in my method, is that one can easily give the correct result when it's a finite double even if naive summation would produce overflow for an intermediate result. One just needs enough bits in the superaccumulator to handle summation of the largest possible double the maximum concievable number of times. In my code, I think I assume a maximum of 2^32 operands, but one can allow for more by just adding a few more bits to the superaccumulator.
Hm, seems like the xsum library has a bug where it incorrectly handles rounding of sums whose value is below -DBL_MAX
. It rounds such values to -Infinity
even if they should round to -DBL_MAX
. It doesn't have the same error for the positive case. You can exercise this by building the library then echo '-1.0 -1.7976931348623157e+308' | ./xsum-test debug
; the result is Final rounded result: -inf (overflowed)
, which is wrong.
I suspect it's some trivial error but I'm not likely going to be able to track it down myself in the near future. It passes my other tests, though I haven't yet set up a fuzzer for it. But engines which want to use it will need to either fix or work around this bug first.
I'll be looking into the issue with xsum, and hope to have a fix within a week or so.
I propose making the sum be the deterministic mathematical sum of the inputs, rounded to the nearest double, breaking ties to even in the typical IEEE double fashion. The special cases are as for regular addition:
This can be implemented as follows, using the exact intermediate sum technique which takes an arbitrary number of finite IEEE double addends a0, a1, …, an-1 and, barring intermediate or final overflows, represents their exact mathematical sum a0 + a1 + … + an-1 as s = p0 + p1 + …+ pk-1, where the p's are IEEE doubles, all have the same sign, are in descending magnitude order, are non-overlapping, meaning that each successive p's exponent is at least 53 less than the previous one, and are nonzero (except possibly p0).
This is the simplest approach. The algorithm is asymptotically linear in the number of addends n and efficient, typically having k no more than 2.
The steps can be folded into a mostly or fully single-pass algorithm. For example:
If one wishes to do a fully single-pass algorithm, one can avoid the scan for addends of the opposite sign from the running total by scaling down the most significant partial p0 and values added into it by 52 powers of 2 if it would overflow, while leaving the other partials p1, …, pk-1 unscaled. This is a bit tedious requiring scaling when working across the first and second partial but can be done efficiently.