Closed ksw2000 closed 2 months ago
This is the first time I'm noticing that our ParseUfloat
actually returns the wrong values for very high numbers. Even though it's faster than strconv.ParseFloat
. I still think we should switch to strconv.ParseFloat
as being incorrect like this is very bad. What do you think?
1234e23
ParseUfloat: 123399999999999981930938368.000000
strconv.ParseFloat: 123399999999999999110807552.000000
1.234e+32
ParseUfloat: 123399999999999984608249181634560.000000
strconv.ParseFloat: 123400000000000002622647691116544.000000
123456789123456789.987654321
ParseUfloat: 17502027876.294655
strconv.ParseFloat: 123456789123456784.000000
Actually, I encountered this problem as well. I believe the reason is that we use a uint64
variable to represent a floating-point number. The largest value of uint64
is 1.8×10¹⁹. Therefore, the third example you provided, 123456789123456789.987654321, which has 27 digits, causes the uint64
value to overflow.
This is currently not documented at all. I'm thinking I would rather have the function return the correct values even for big numbers than have it be 20ns faster. Just using strconv.ParseFloat
would also result in less code to maintain. What do you think?
Parse the float string is a complex topic. In strconv.ParseFloat
, they use IEEE754 unbiased rounding.
In our implementation, we directly truncate the digits after the 19th position.
Although our implementation doesn't achieve the same high precision as strconv.ParseFloat
, the error is within an acceptable range for network applications.
I think we can add the comments that our ParseUfloat
is much faster than strconv.ParseFloat
, but the results are less accurate.
goos: windows
goarch: amd64
pkg: github.com/valyala/fasthttp
cpu: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
BenchmarkParseUfloat-8 13448752 78.83 ns/op
BenchmarkParseUfloatStrconv-8 4897358 248.1 ns/op
PASS
ok github.com/valyala/fasthttp 2.738s
1234e23
ParseUfloat: 1.2339999999999998e+26
strconv.ParseFloat: 1.234e+26
1.234e+32
ParseUfloat: 1.2339999999999998e+32
strconv.ParseFloat: 1.234e+32
123456789123456789.987654321
ParseUfloat: 1.2345678912345678e+17
strconv.ParseFloat: 1.2345678912345678e+17
There are other inconsistencies as well. For example this test fails because it returns 0
instead of 1
like strconv.ParseFloat
does.
testParseUfloatSuccess(t, "00000000000000000001", 1)
Or parsing 87000000000.2
results in 87000000000.200012
while strconv.ParseFloat
returns 87000000000.199997
which is a bigger difference for such a low number than I would like.
In the first case 00000000000000000001
, we can add some code to remove the leading zeros. As for the second case 87000000000.2
, the relative error between ParseUfloat
and strconv.ParseFloat
is only 1.75e-16.
The second case is very strange. Our algorithm calculates using float64(870000000002) * math.Pow10(-1)
, but if we change to using division, the result will be the same as strconv.ParseFloat
.
fmt.Printf("%f %f\n", float64(870000000002)*math.Pow10(-1), float64(870000000002)/math.Pow10(1))
87000000000.200012 87000000000.199997
71000000000000.1
results in a 1.562500e-02
relative difference. I'm trying to fuzz test the implementations against each other and I keep finding differences. There is also how they handle numbers with E
or X
. Or strconv.ParseFloat
sometimes returning NaN
instead of an error. There are so many differences that I really don't think it's worth the 20ns (nanoseconds) overhead. Unless someone is calling this function billions of times it's not going to make a difference. And if someone does that it's much better for them to use a more specific version in their own codebase.
The example you provided, using Relative Error = (expected value - actual value) / actual value
, the error is indeed very small, all below 10^-15. However, if our function cannot perfectly handle special cases like NaN, I agree with your suggestion. Perhaps we should consider directly calling strconv.ParseFloat
, or rewriting our function based on strconv.ParseFloat
. What do you think?
Lets just call strconv.ParseFloat
and include some error handling for negative numbers to keep that backwards compatible.
Thanks!
Replaced
offset
handling logic with more efficientmath.Pow10
based calculation.The
math.Pow10
function performs calculations using a lookup table, which is more efficient compared to the original method of continuous division. By switching to simple addition and subtraction operations, the execution time is reduced by approximately 30%.