~~optimize SDC~~ - Githubissues

sarastro-nl commented 1 year ago

So you once wrote about SDC:

It's not the fastest or cleverest algorithm, but a rarely-invoked fallback doesn't have to be, and there is value in simplicity.

I couldn't agree more, so I propose a very small change which makes the code even simpler but at the same moment saves a lot of calls to the 'expensive' lshift and rshift calls making the algorithm also faster as a bonus.

I do realize that 'simpler' is a bit of a personal judgement, but I think I may have a case because of these arguments:

focus is now on the range [1 .. 2] which is after all the end goal and not [½ .. 1] (which imho is making things needlessly complicated; in the article you mention that this range is 'slightly easier, computationally, than getting to the [1 .. 2] range' but I don't see why, maybe I'm missing something)
lshift and rshift while loops are now identical which makes things simple (well, almost identical except for a small enhancement mentioned below)
no need for a double nested if in the lshift while loop
comparing the first 3 digits (which is also done in other places in the code) is straightforward and the reader can directly see the relationship between the mantissa and the target exp2 and the final lshift
no need to do an exp2-- which to a reader might be somewhat unexpected and feels like an exception ('// We're in the range [½ .. 1] but f64 uses [1 .. 2].')
no need to pass the 'magic' number '53' to the final lshift, which is needed because of the 'wrong' range [½ .. 1] in the first place. If the mantissa is already in [1 .. 2] it's now easy to see from the code that the lshift is just the default 52.

Test results:

I ran the std/json tests with my change and made sure SDC is always used, so I temporarily disabled Eisel-Lemire and the code when the number is exactly representable by a double. The result was that it saved 73 calls to lshift and 32 calls to rshift.

About the change itself:

The change makes sure that once the lshift and rshift loops are done the decimal_point is either 0 or 1 (so the value is the range [0.1 .. 10]). It then looks at the digits to calculate the final lshift. If the number happens to be already in [1 .. 2] then the final lshift is just 52. This calculated shift will be between 56 (for values in the range [0.1 .. 0.125] and 49 (for values in the range [8 .. 10]).

Another very small change is this: the same entry from the powers table is used for the lshift and rshift. This entry is chosen in such a way that if the value is divided by (2 ** entry) you'll end up in [0.1 .. 2]. And since this is within [0.1 .. 10] it's the ideal candidate for the rshift. But the lshift does the opposite so with that same entry you will always end up in the range [0.05 .. 1]. Not good. So if you just add 1 to the entry the range will be [0.1 .. 2] again which is exactly within the target range. This simple addition alone will save needless calls to lshift.

google-cla[bot] commented 1 year ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

nigeltao commented 1 year ago

Thanks for the PR.

For testing, please also run Wuffs' script/manual-test-parse-number-f64.cc program on the more-than-5-million test cases in https://github.com/nigeltao/parse-number-fxx-test-data

I'm slightly curious how many lshift and rshift calls this commit saves on that larger test set, with and without Eisel-Lemire disabled.

Having said all of that, though, as you said, "'simpler' is a bit of a personal judgement". Conversely, the algorithm as currently written exactly matches what's already written up as https://nigeltao.github.io/blog/2020/parse-number-f64-simple.html#simple-decimal-conversion (and that's linked to from https://github.com/google/wuffs/blob/dee65c0e61778bce6b0b0c4d26d68f1d579869dc/internal/cgen/base/floatconv-submodule-code.c#L1308) and as separately implemented by the Go standard library. Unless it's clearly simpler, I'm hesitant to change the code so that it's no longer an exact match.

sarastro-nl commented 1 year ago

I ran the tests that you proposed and the test results were successful.

I counted the number of lshift's and rshift's in both while loops (the others don't make a difference in counting).

with Eisel-Lemire and 'exactly representable by a double':

without this change

lshift: 4,515,800 rshift: 2,797,045

with this change

lshift: 3,681,413 (18.5% less) rshift: 2,710,039 (3.1% less)

without Eisel-Lemire and 'exactly representable by a double':

without this change

lshift: 25,929,870 rshift: 21,591,185

with this change

lshift: 20,517,544 (20.9% less) rshift: 20,981,083 (2.8% less)

nigeltao commented 1 year ago

OK, it's enough of a performance improvement (in terms of fewer lshift/rshift calls) that it's worth merging. Two things remain:

You'll still need to sign the CLA per https://github.com/google/wuffs/pull/87/checks?check_run_id=9461919896 above.
Wuffs is in the "release candidate" stage of the version 0.3 release cycle, so I'll hold off on merging any performance changes (as opposed to bug-fixing changes). This PR will have to wait until version 0.4 opens. I'll ping this discussion after that happens.

sarastro-nl commented 1 year ago

Apparently the email that I used to open this PR was non-existent (...@local) and I need a valid one to be able to sign the CLA. I tried to amend that original commit to an existing one but I was not able to fix that very first commit. So I'm afraid I'll have to open a new PR for this and close this one. If you want we could paste our comments in that new PR to keep track of the history?

google / wuffs

optimize SDC #87

Test results:

About the change itself:

with Eisel-Lemire and 'exactly representable by a double':

without this change

with this change

without Eisel-Lemire and 'exactly representable by a double':

without this change

with this change

google / wuffs

~~optimize SDC~~ #87

Test results:

About the change itself:

with Eisel-Lemire and 'exactly representable by a double':

without this change

with this change

without Eisel-Lemire and 'exactly representable by a double':

without this change

with this change

optimize SDC #87