Closed eed3si9n closed 6 years ago
Did you consider Kahan’s algorithm? It might be accurate enough and also more efficient.
@pnf Thanks for the pointer. It didn't occur to me, but it makes sense.
I implemented Kahan summation in https://github.com/eed3si9n/collection-strawman/tree/wip/double-range2, but it wasn't accurate enough out of the box.
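For reference, here is a minimal sketch of compensated (Kahan) summation applied to generating range points. This is my own illustration, not the code in that branch; `KahanSum` and `steps` are made-up names:

```scala
object KahanSum {
  // Yields start, start + step, start + 2*step, ... (n points), carrying a
  // compensation term `c` that captures low-order bits lost in each addition.
  def steps(start: Double, step: Double, n: Int): List[Double] = {
    var sum = start
    var c   = 0.0 // running compensation for lost low-order bits
    val buf = List.newBuilder[Double]
    var i = 0
    while (i < n) {
      buf += sum
      val y = step - c
      val t = sum + y
      c = (t - sum) - y // algebraically zero; in practice the rounding error
      sum = t
      i += 1
    }
    buf.result()
  }
}
```

Note that compensation only helps across many additions; a single `0.2 + 0.1` still rounds to `0.30000000000000004`, which is exactly the kind of noise visible in the test output below.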
For the test I am using `val r = NumericRange.inclusive(0.0, 1000.0, 0.1)(DoubleAsIfIntegral)`.
```
collection-strawman> junit/testOnly strawman.collection.NumericRangeTest
[info] Compiling 1 Scala source to /Users/eed3si9n/work/collection-strawman/collections/.jvm/target/scala-2.13.0-M2/classes...
[info] Test run started
[info] Test strawman.collection.NumericRangeTest.emptyIterator started
[info] Test strawman.collection.NumericRangeTest.nonEmptyIterator started
[info] Test strawman.collection.NumericRangeTest.doubleRangeToList started
[info] Test strawman.collection.NumericRangeTest.doubleRangeUneven started
[info] Test strawman.collection.NumericRangeTest.doubleRangeContains started
List(0.0, 0.1, 0.2, 0.30000000000000004, ..snip..
37.800000000000004, 37.9, 38.0, 38.1, 38.2, 38.300000000000004, 38.400000000000006,
38.5, 38.6, 38.7, 38.800000000000004, 38.900000000000006, 39.0, 39.1, 39.2,
39.300000000000004, 39.400000000000006, 39.5, 39.6, 39.7, 39.800000000000004,
39.900000000000006, 40.0, 40.1, 40.2, 40.300000000000004, 40.400000000000006, 40.5,
40.6, 40.7, 40.800000000000004, 40.900000000000006, 41.0, 41.1, 41.2, 41.300000000000004,
41.400000000000006, 41.5, 41.6, 41.7, 41.800000000000004, 41.900000000000006, 42.0,
..snip.., 1000.0)
[error] Test strawman.collection.NumericRangeTest.doubleRangeContains failed: java.lang.AssertionError: expected:<-1> but was:<3>, took 0.05 sec
```
As you can see, the lower bits are pretty noisy, and noisy enough for `contains(r(i))` not to pass. This required me to apply aggressive rounding. I chose 12 decimal places, but it probably needs to be calibrated based on the range and the step.
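For illustration, the kind of rounding I mean can be sketched as follows (the 12-decimal cutoff is the one mentioned above; the helper name is made up):

```scala
// Round a Double to 12 decimal places to scrub the noisy low bits.
// Goes through BigDecimal's decimal representation of the Double.
def round12(x: Double): Double =
  BigDecimal(x).setScale(12, BigDecimal.RoundingMode.HALF_UP).toDouble
```

For example, `round12(38.400000000000006)` collapses back to `38.4`. But the cutoff is a tuning knob: too coarse and distinct points merge, too fine and the noise survives.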
But, probably because of the rounding, it actually became slower than doing things in BigDecimal land. Here's looping over 10000 points and calling `contains(r(i))`:
Kahan with rounding: 11156 ms, 11189 ms, 11464 ms
BigDecimal without rounding: 3416 ms, 3011 ms, 3263 ms
Reading from my phone right now, so I'm not completely sure, but I think this PR will have severe consequences on code size in Scala.js, because it will pull in the big-number implementation every time one uses `NumericRange`, whether with floating-point numbers or other `Numeric`s.
@sjrd Perhaps I should refactor and provide a `Double`-specific `Range` class?
@eed3si9n It looks like you are still using `BigDecimal`s in the base `NumericRange` implementation, right?
@julienrf It's used in the `NumericRange` companion, but I tried not to use them in `sealed class NumericRange[T]`, assuming that's where it hurts Scala.js if it needs to generate code for a bunch of types.
Seth suggests we discuss this here instead of over at https://github.com/scala/bug/issues/10759, so (reposting):
Is it possible to drop numeric ranges that pretend to be for `Float` and `Double`? These are just bad news due to numeric imprecision. If you say something like `0.1 to 0.7 by (0.1 + 0.2)`, you would think you'd get three steps, but either you won't, or you will but other code will be randomly inaccurate (because you rounded things enough to make the above work). For what it's worth, Octave's equivalent, `[0.1:x:0.7]`, produces `[0.1, 0.4, 0.7]` if `x` is up to 3e-16 above 0.3, but `[0.1, 0.4]` if it is 4e-16 above 0.3. (And you also get weird cases where, for some values of `x`, you end on the expected endpoint when the endpoint is small but miss it when it's bigger.) This just emphasizes that nobody can really "do it right".
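The three-step expectation can be checked with plain `Double` arithmetic, no `NumericRange` involved; this sketch of mine just walks the values by hand:

```scala
// The step the user wrote as 0.3 is not 0.3:
val step = 0.1 + 0.2 // the classic: == 0.30000000000000004

// Walk from 0.1, keeping points that do not overshoot 0.7:
val points = Iterator.iterate(0.1)(_ + step).takeWhile(_ <= 0.7).toList
// The third point lands just above 0.7 and is dropped:
// points == List(0.1, 0.4)
```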
Normally I don't like breaking changes, but in the case where there is no way to do the right thing except by not actually using the type that is stated, I think the better thing to do is remove the feature.
Note that sanity is possible if you have typed-in numeric literals, but not if you just have `x to y by z`, because you can't know whether the imprecision in `x`, `y`, and `z` is intended or not. If you can't even tell whether your inputs honestly represent the intended values, using `BigDecimal` is not enough to recover non-surprising behavior. And I'm not sure being "better but still flawed" is justification enough to basically use a different data type than the one you're saying you're using.
I lean towards keeping it, and perhaps adding some cautionary language to the Scaladoc.
If we get rid of `DoubleRange` and the like, it won't stop people from inlining their own versions into their code and making all the same mistakes. I think we should provide the best version we can and document any limitations or sharp edges. Some people will still get hurt, but they'd have gotten hurt anyway, maybe worse.
Let's give people clean needles, rather than hope that if they only have dirty needles, they'll stop taking drugs. Because they won't stop.
I agree with @SethTisue. Walking through the pitfalls is unavoidable, and it's better to have a well-known pitfall than a private, handmade one. Every ambitious dreamer can dig their own anyway.
As it stands, `Double` range is so broken that I can't imagine anyone using it in production.
So there is a danger in giving users a real-enough-looking vehicle that works for day-to-day usage, but whose wheels come off when going over 70 mph (112 km/h) on the highway.
Although I enjoyed addressing a handful of issues in this PR, I also don't want to be responsible for an accident with Real consequences (pun intended). I am OK with the strawman withdrawing the `Float` and `Double` ranges. It would be cool if compilation failed with a suggestion to migrate to `BigDecimal`, via a (restligeist) macro.
> As it stands, Double range is so broken that I can't imagine anyone using it in production

even after your rewrite?
@SethTisue - The problem is that it is a minefield, and the best way to avoid the mines is to use a `BigDecimal` range instead. Except then we are saying "this is a `Double`" while everything acts as if it is a `BigDecimal` (e.g. performance); we just don't say so.
So we should just come clean and say: "Look, `Float` and `Double` just aren't suitable for ranges. Here's a `BigDecimal` range that you can use almost as easily."
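The `BigDecimal` alternative is indeed almost as easy to write; the standard library already supports ranges over `BigDecimal`:

```scala
// BigDecimal arithmetic steps exactly, so the three intended points all appear.
val r = BigDecimal("0.1") to BigDecimal("0.7") by BigDecimal("0.3")
// r.toList == List(0.1, 0.4, 0.7) as BigDecimals, with no drift and no
// dependence on how the endpoints happen to round in binary.
```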
> even after your rewrite?
After my change, some might be tempted to use it. But as @Ichoran's examples show, I am only handling sample-code usages with literals, not really solving the fundamental issue that adding two `Double`s, even unassuming values like 0.1, is not accurate.
Related: it might be cool to reconsider BigDecimal literal https://github.com/scala/bug/issues/5988
I am closing this for now.
Is there a plan/issue/PR to get rid of `Double` and `Float` ranges?
I was going to send a PR for this at some point, but I can open an issue first in case anyone else wants to pick it up.
uncle! I'm okay with the removal.
Ref https://github.com/scala/bug/issues/8518 Ref https://github.com/scala/bug/issues/9875 Ref https://github.com/scala/bug/issues/9874 Ref https://github.com/scala/bug/issues/8620 Ref scala/bug#8670
`NumericRange[Double]` has been broken because various parts of the code assume a sane plus operation, which is not possible with `Double` or `Float`. Within a few iterations of adding 0.1 or 0.2, it can quickly accumulate floating-point errors that are surprising.
When encountering `Double` or `Float`, this internally uses `BigDecimal`, which can add numbers without losing precision. This will produce a range iterator that looks like
`List(1.0, 1.2, 1.4, 1.6, 1.8, 2.0)` when converted into a `List`, as opposed to `List(1.0, 1.2, 1.4, 1.5999999999999999, 1.7999999999999998, 1.9999999999999998)`.
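The difference can be reproduced outside of `NumericRange`; both of these small sketches are mine, the second mimicking the internal `BigDecimal` detour described above:

```scala
// Naive: repeatedly add the Double 0.2 -- error accumulates in the low bits.
val naive = Iterator.iterate(1.0)(_ + 0.2).take(6).toList
// naive == List(1.0, 1.2, 1.4, 1.5999999999999999,
//               1.7999999999999998, 1.9999999999999998)

// Via BigDecimal: step exactly, convert to Double only at the end.
val viaBig = (BigDecimal("1.0") to BigDecimal("2.0") by BigDecimal("0.2"))
  .map(_.toDouble).toList
// viaBig == List(1.0, 1.2, 1.4, 1.6, 1.8, 2.0)
```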