Closed vdukhovni closed 3 years ago
The lowest GHC we support is 7.10. Thanks also for adding all those test cases!
Please don't merge this, it does not take effects into account correctly. I'll rework it to address the issue.
[ Actually, perhaps not a problem, I was thrown off by thinking about how overflow handling might work, but since we're not doing overflow detection, I think this is OK, but just in case, I'll review again... ]
I pushed a new commit that handles cases where the leading `+` or `-` is at a chunk boundary, and what follows is not an unsigned integer. This needs to return `Nothing`, not `0`, but also return a leftover string with the `+` or `-` sign as a short chunk in front of the chunk with the non-integer payload. Since it is not too difficult (but quite wrong) to accidentally return just the original input stream (one of whose effects has already been performed), I enhanced the tests to verify that we're not performing too few or too many effects by the time the entire stream has been processed.
Also added tests to check some more "+"/"-" at chunk boundary cases.
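As a toy illustration of those boundary cases (hypothetical code, not the PR's implementation), a parser modelled on lists of `String` chunks shows both requirements: a sign whose digits live in the next chunk must still parse, and a lone sign followed by a non-digit must fail rather than produce `0`:

```haskell
import Data.Char (isDigit, ord)

-- Toy model of the chunked parser (hypothetical, not the PR's code): read a
-- signed decimal Int from a list of chunks, returning the value and the
-- unconsumed chunks.  Overflow is ignored here to keep the sketch short.
readIntChunks :: [String] -> Maybe (Int, [String])
readIntChunks cs0 = case dropWhile null cs0 of
    (c : cs) -> go c cs
    []       -> Nothing
  where
    go (sgn : rest) cs
      | sgn == '-' = fmap (\(n, l) -> (negate n, l)) (digits rest cs)
      | sgn == '+' = digits rest cs
    go c cs = digits c cs

    -- Accumulate digits, crossing chunk boundaries; fail (Nothing) when no
    -- digit at all follows the optional sign.
    digits = loop False 0
    loop seen acc (d : ds) cs
      | isDigit d = loop True (acc * 10 + (ord d - ord '0')) ds cs
    loop seen acc rest cs
      | null rest, (c : cs') <- dropWhile null cs = loop seen acc c cs'
      | seen      = Just (acc, rest : cs)
      | otherwise = Nothing
```

Here a sign split from its digits, as in `readIntChunks ["+", "12", "3 x"]`, parses to `Just (123, [" x"])`, while `readIntChunks ["+", "x"]` is `Nothing`; the real implementation must additionally hand back the `"+"` as a short leftover chunk in front of the untouched payload.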
I am prototyping (no PR yet) a new implementation that restores (more correct) overflow protection: inputs outside the `minBound` to `maxBound` range return `Nothing`, with the invalid input as the tail of the string. Internally, that input is magically reconstructed from the accumulated partial number (the last value prior to the would-be overflow) and the number of digits consumed, by generating the corresponding decimal string with enough leading zeros and the original explicit sign (if any). This frees us from having to copy the input. The performance is only slightly worse than this PR: for 100M space-separated `Int`s that are 1..100M in random order, the time to read and add them all up changes from ~4.84s to ~4.88s on my (somewhat dated) Intel CPU.
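A hypothetical sketch of that reconstruction (names mine, not the prototype's): given the optional sign, the count of digits consumed, and the partial accumulator, the consumed prefix can be regenerated without retaining the original chunks:

```haskell
import Data.Word (Word)

-- Hypothetical helper: regenerate the consumed input from the partial
-- accumulator, the number of digits consumed, and the original explicit
-- sign (if any), padding with the leading zeros the accumulator can't
-- remember on its own.
rebuildInput :: Maybe Char -> Int -> Word -> String
rebuildInput sign ndigits acc = maybe id (:) sign (zeros ++ ds)
  where
    ds    = show acc
    zeros = replicate (ndigits - length ds) '0'
```

For example, an input that began `-00042` before later overflowing would be rebuilt from the sign `'-'`, 5 digits consumed, and accumulator 42 as `"-00042"`.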
By introducing the "tail trimming" variant of `readInt`, I'm inclined to drop the generic "Chunker" interface in this PR as being too fancy. In practice users will likely want just the verbatim remainder, or the trimmed version. The only thing they won't get that way is the ability to control the safety mechanism, by e.g. choosing a different limit on the number of spaces consumed, or setting no limit at all. [ Edit: I'll look into whether good performance is preserved if the whitespace skip is instead performed before reading each integer, rather than after; this is more friendly for users who intermix reading integers with other consumers of stream data, but if this is done, the whitespace will be dropped unconditionally, even when ultimately returning `Nothing`. ]
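The "skip before" layout can be sketched with the existing lazy `Char8` API (a simplification, assuming only space separators):

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Sketch of trimming whitespace *before* each read rather than after.
-- Note the caveat from above: the leading spaces are consumed
-- unconditionally, even when the read itself then fails.
readNext :: L.ByteString -> Maybe (Int, L.ByteString)
readNext = L.readInt . L.dropWhile (== ' ')
```

So `readNext (L.pack "  42 7")` yields `Just (42, L.pack " 7")`, while `readNext (L.pack "  x")` yields `Nothing` with the spaces already gone.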
So before I open a "competing" PR, I'd like some feedback: what do folks think about this version versus the description above?
I should also mention that while the code could in principle handle other bounded integral types, the approach can only work for decimal (or octal) inputs. With hex the input can use mixed case, and I can no longer reconstruct the input from the partly accumulated digits, which may have originated from separate chunks, making the code more complex and likely much slower.
Internally the accumulator is a `Word` value, and all the digits are combined unsigned into the word; what's variable is the bounds at which I detect overflow. So handling `Int32` or `Word32`, ... would just be a matter of tweaking the bounds, but I am not inclined to address that at this time. Support for the other types was not previously available; we can look into that later.
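To illustrate the point (hypothetical names, not the PR's): the cut-offs for other types are just different constants compared against the same `Word` accumulator:

```haskell
import Data.Int (Int32)
import Data.Word (Word, Word32)

-- Hypothetical per-type limits: the accumulator stays a Word; only the
-- bound it is checked against changes with the target type.
wordLimit, word32Limit, int32PosLimit, int32NegLimit :: Word
wordLimit     = maxBound                           -- for Word itself
word32Limit   = fromIntegral (maxBound :: Word32)  -- for Word32
int32PosLimit = fromIntegral (maxBound :: Int32)   -- positive Int32 inputs
int32NegLimit = int32PosLimit + 1                  -- magnitude of minBound
```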
One of the complications of handling the various (Bounded) integral types is that, unlike C, Haskell has no `uintmax_t` equivalent. The closest we've got is the `Word64` and `Int64` types, and there's some complex logic around how these have been bolted on in 32-bit vs. 64-bit systems. It seems that's starting to change; IIRC a recent PR was introducing `Word64#` primitives unconditionally.
Anyway, more could be done, but this is what I have now... Time for feedback...
CC: @chessai, @hsyl20, @bodigrim, @sjakobi, @cartazio
> Internally the accumulator is a Word value, and all the digits are combined unsigned into the word, what's variable is the bounds at which I detect overflow.

BTW there is `timesWord2`, which may be handy to detect overflows.
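For reference, a minimal sketch of overflow detection via `timesWord2#`, which returns the high and low words of the full double-width product; a non-zero high word signals overflow:

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (Word (W#), timesWord2#)

-- Multiply two Words, failing when the full product does not fit: the high
-- word of the double-width product is non-zero exactly on overflow.
mulMaybe :: Word -> Word -> Maybe Word
mulMaybe (W# x) (W# y) = case timesWord2# x y of
    (# hi, lo #)
      | W# hi == 0 -> Just (W# lo)
      | otherwise  -> Nothing
```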
Yes, I was aware of it, but since I have to do a comparison either way (before the multiplication or after), I went with the simpler "before" option. If the accumulator is below 1/10 of the maximum representable value, I keep going; if above, I fail. And if equal, I check the input digit to see whether it is below or above the residue of the maximum value mod 10. This runs quite efficiently, and for most numbers only one test is needed, which I think is no more expensive than using and testing the result of `timesWord2#`.
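The "before" test described here can be sketched as a standalone step function (a simplified version, not the PR's code), with `q` the largest accumulator that can always absorb another digit (`maxBound` divided by 10) and `r` the residue (`maxBound` mod 10):

```haskell
import Control.Monad (foldM)
import Data.Char (ord)
import Data.Word (Word)

-- One accumulation step with the pre-multiplication overflow check:
-- strictly below q we can always take another digit; strictly above we
-- never can; at exactly q the incoming digit decides against the residue.
step :: Word -> Word -> Maybe Word
step acc d
  | acc < q          = Just (acc * 10 + d)
  | acc == q, d <= r = Just (acc * 10 + d)
  | otherwise        = Nothing
  where
    q = maxBound `quot` 10
    r = maxBound `rem` 10

-- Fold the step over a digit string (sketch; assumes all-digit input).
readWord :: String -> Maybe Word
readWord [] = Nothing
readWord s  = foldM (\a c -> step a (fromIntegral (ord c - ord '0'))) 0 s
```

On a 64-bit system this accepts `"18446744073709551615"` (exactly `maxBound :: Word`) and rejects `"18446744073709551616"`, taking only the cheap `acc < q` branch for most digits.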
My experience with using numeric primops is that it is easy (and somewhat counter-intuitive) to actually get worse performance with them, because GHC seems to do fewer optimisations on code with primops, or because it figures out a better set of primops to apply than I can. Many times when I thought I could write better code directly (avoiding allocations, ...), I found that GHC typically already used unlifted values wherever possible, and had produced better code in most cases. :-)
There are a few cases in which I was able to get better code with primops, but without knowing more about the internals than I do, I struggle to intuit which cases are likely to yield good results.
But while I have your attention, direct feedback on the substance of this PR and the proposed alternative would be very much appreciated. The tangential comments are of course also welcome! :-)
Removes the arbitrary 18 byte limit on decimal input.
Significantly speeds up `readInt` by avoiding `toStrict` and handling chunk boundaries efficiently.
Adds a `readInt'` function that supports transformation of the stream tail to prepare for reading more input, e.g. by dropping leading whitespace while we are still holding the raw chunk, avoiding the cost of burying it deep in a closure, only to later find it again and trim. Reading ~100M space-separated ints now takes ~5.5s vs. over 25s with the original implementation.
This does however mean that on an unbounded stream of digits the new `readInt` is not guaranteed to ever return. The documentation notes that users can elect to split the stream and only run `readInt` on some bounded initial segment.

REVIEW topics:
If someone wants to bikeshed a better name for the `Chunker` type, I am not particularly fixed on the current name, though it seems OK to me. Or we could drop the type alias and use the full expansion in the signature of `readInt'`, but that could make the haddock difficult to fit horizontally on the screen.

Note that `finiteBitSize` requires base 4.7, i.e. at least GHC 7.8. The CI does not test anything older than 7.10, so I assume that 7.6 is no longer supported. Otherwise a work-around is needed for older versions.

The `LambdaCase` and `MultiWayIf` pragmas are also used; these require at least GHC 7.6.

Cc: @bodigrim, @chessai