Closed sarastro-nl closed 1 year ago
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
Thanks for the PR.
For testing, please also run Wuffs' `script/manual-test-parse-number-f64.cc` program on the more-than-5-million test cases in https://github.com/nigeltao/parse-number-fxx-test-data
I'm slightly curious how many `lshift` and `rshift` calls this commit saves on that larger test set, with and without Eisel-Lemire disabled.
Having said all of that, though, as you said, "'simpler' is a bit of a personal judgement". Conversely, the algorithm as currently written exactly matches what's already written up as https://nigeltao.github.io/blog/2020/parse-number-f64-simple.html#simple-decimal-conversion (and that's linked to from https://github.com/google/wuffs/blob/dee65c0e61778bce6b0b0c4d26d68f1d579869dc/internal/cgen/base/floatconv-submodule-code.c#L1308) and as separately implemented by the Go standard library. Unless it's clearly simpler, I'm hesitant to change the code so that it's no longer an exact match.
I ran the tests that you proposed and the test results were successful.
I counted the number of `lshift`s and `rshift`s in both while loops (the others don't make a difference in counting).
Before: lshift: 4,515,800, rshift: 2,797,045
After: lshift: 3,681,413 (18.5% less), rshift: 2,710,039 (3.1% less)
Before: lshift: 25,929,870, rshift: 21,591,185
After: lshift: 20,517,544 (20.9% less), rshift: 20,981,083 (2.8% less)
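The percentages can be re-derived from the raw counts. A minimal sketch in C; `percent_less` is a hypothetical helper name used only for this check, not part of Wuffs:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not part of Wuffs): returns by how many percent
// `after` is smaller than `before`.
static double percent_less(uint64_t before, uint64_t after) {
  assert(before > 0 && after <= before);
  return 100.0 * (double)(before - after) / (double)before;
}
```

For example, `percent_less(4515800, 3681413)` is roughly 18.5 and `percent_less(25929870, 20517544)` is roughly 20.9, which agrees with the reported `lshift` savings.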
OK, it's enough of a performance improvement (in terms of fewer lshift/rshift calls) that it's worth merging. Two things remain:
Apparently the email address that I used to open this PR was non-existent (`...@local`) and I need a valid one to be able to sign the CLA. I tried to amend the original commit to use an existing address, but I was not able to fix that very first commit. So I'm afraid I'll have to open a new PR for this and close this one. If you want, we could paste our comments into that new PR to keep track of the history?
So you once wrote about SDC:
> It's not the fastest or cleverest algorithm, but a rarely-invoked fallback doesn't have to be, and there is value in simplicity.
I couldn't agree more, so I propose a very small change that makes the code even simpler and, as a bonus, also faster, because it saves a lot of 'expensive' `lshift` and `rshift` calls.
I do realize that 'simpler' is a bit of a personal judgement, but I think I may have a case because of these arguments:
- `lshift` and `rshift` while loops are now identical, which makes things simple (well, almost identical except for a small enhancement mentioned below)
- `if` in the `lshift` while loop
- `exp2` and the final `lshift`
- `exp2--` which to a reader might be somewhat unexpected and feels like an exception ('// We're in the range [½ .. 1] but f64 uses [1 .. 2].')
- `lshift`, which is needed because of the 'wrong' range [½ .. 1] in the first place. If the mantissa is already in [1 .. 2] it's now easy to see from the code that the `lshift` is just the default `52`.
Test results:
To make sure SDC is always used, I temporarily disabled Eisel-Lemire and the code path for numbers that are exactly representable by a double, then ran the std/json tests with my change. The result was that it saved 73 calls to `lshift` and 32 calls to `rshift`.
About the change itself:
The change makes sure that once the `lshift` and `rshift` loops are done, the `decimal_point` is either 0 or 1 (so the value is in the range [0.1 .. 10]). It then looks at the digits to calculate the final `lshift`. If the number happens to be already in [1 .. 2] then the final `lshift` is just 52. This calculated shift will be between 56 (for values in the range [0.1 .. 0.125]) and 49 (for values in the range [8 .. 10]).
Another very small change is this: the same entry from the powers table is used for the `lshift` and `rshift`. This entry is chosen in such a way that if the value is divided by (2 ** entry) you'll end up in [0.1 .. 2]. And since this is within [0.1 .. 10] it's the ideal candidate for the `rshift`. But the `lshift` does the opposite, so with that same entry you will always end up in the range [0.05 .. 1]. Not good. So if you just add 1 to the entry, the range will be [0.1 .. 2] again, which is exactly within the target range. This simple addition alone will save needless calls to `lshift`.
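Both range arguments can be checked numerically. The sketch below is only an illustration, not the code from this PR: it works on `double` values instead of the decimal digit array, and it assumes that powers table entry `n` equals floor(n * log2(10)). The names `powers`, `pow10_over_pow2`, `rshift_upper`, `lshift_upper` and `final_lshift` are hypothetical.

```c
#include <assert.h>

// Assumed table: entry n is floor(n * log2(10)), so 2**powers[n] is the
// largest power of two not exceeding 10**n. Illustrative copy, not Wuffs code.
static const int powers[19] = {
    0, 3, 6, 9, 13, 16, 19, 23, 26, 29, 33, 36, 39, 43, 46, 49, 53, 56, 59,
};

// Returns 10**n / 2**k. Every intermediate value is exactly representable
// as a double for the small n and k used here.
static double pow10_over_pow2(int n, int k) {
  double x = 1.0;
  for (int i = 0; i < n; i++) x *= 10.0;
  for (int i = 0; i < k; i++) x /= 2.0;
  return x;
}

// A value in [10**(n-1), 10**n) divided by 2**powers[n] lands in
// [upper / 10, upper); this returns that upper bound.
static double rshift_upper(int n) {
  return pow10_over_pow2(n, powers[n]);
}

// A value in [10**(-n-1), 10**(-n)) multiplied by 2**(powers[n] + extra)
// lands in [upper / 10, upper); this returns that upper bound.
static double lshift_upper(int n, int extra) {
  return 1.0 / pow10_over_pow2(n, powers[n] + extra);
}

// For a value v in [0.1, 10), returns the shift s for which v * 2**s has
// its top bit at position 52, i.e. the scaled value lies in [2**52, 2**53).
static int final_lshift(double v) {
  assert(v >= 0.1 && v < 10.0);
  int shift = 52;
  while (v >= 2.0) { v /= 2.0; shift--; }
  while (v < 1.0) { v *= 2.0; shift++; }
  return shift;
}
```

Under these assumptions, `rshift_upper(n)` stays in [1, 2) for every n from 1 to 18, `lshift_upper(n, 0)` stays in (0.5, 1] and `lshift_upper(n, 1)` stays in (1, 2], which is the effect of adding 1 to the entry. `final_lshift` gives 52 for values in [1, 2), 56 for values just above 0.1 and 49 for values of 8 and above.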