scala / collection-strawman

Implementation of the new Scala 2.13 Collections
Apache License 2.0

Fix NumericRange[Double] #489

Closed eed3si9n closed 6 years ago

eed3si9n commented 6 years ago

Ref https://github.com/scala/bug/issues/8518 Ref https://github.com/scala/bug/issues/9875 Ref https://github.com/scala/bug/issues/9874 Ref https://github.com/scala/bug/issues/8620 Ref scala/bug#8670

NumericRange[Double] has been broken because various parts of the code assume a sane plus operation, which Double and Float cannot provide. Within a few iterations of adding 0.1 or 0.2, the range can accumulate surprising floating-point errors.

scala> import collection.immutable.NumericRange
import collection.immutable.NumericRange

scala> val r = NumericRange.inclusive(1.0, 2.0, 0.2)(scala.math.Numeric.DoubleAsIfIntegral)
r: scala.collection.immutable.NumericRange.Inclusive[Double] = NumericRange 1.0 to 2.0 by 0.2

scala> assert(r(3) == r.toList(3), s"${r(3)} != ${r.toList(3)}")
java.lang.AssertionError: assertion failed: 1.6 != 1.5999999999999999

scala> val x = NumericRange.inclusive(0.0, 1.0, 0.1)(scala.math.Numeric.DoubleAsIfIntegral)
x: scala.collection.immutable.NumericRange.Inclusive[Double] = NumericRange 0.0 to 1.0 by 0.1

scala> x.drop(3).take(3).toList
res3: List[Double] = List(0.30000000000000004, 0.4) // take(3) usually means taking three things

When the element type is Double or Float, this change internally uses BigDecimal, which can add the numbers without losing precision. This produces a range that converts to List(1.0, 1.2, 1.4, 1.6, 1.8, 2.0), as opposed to List(1.0, 1.2, 1.4, 1.5999999999999999, 1.7999999999999998, 1.9999999999999998).
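The approach can be sketched as follows (an illustration of the idea, not the PR's actual code; elementAt is a hypothetical helper):

```scala
// Derive each element as start + step * i in BigDecimal, so no Double
// rounding error accumulates across iterations.
def elementAt(start: Double, step: Double, i: Int): Double =
  (BigDecimal(start) + BigDecimal(step) * i).toDouble

val viaDouble = Iterator.iterate(1.0)(_ + 0.2).take(6).toList
val viaBig    = (0 until 6).map(elementAt(1.0, 0.2, _)).toList
// viaDouble: List(1.0, 1.2, 1.4, 1.5999999999999999, 1.7999999999999998, 1.9999999999999998)
// viaBig:    List(1.0, 1.2, 1.4, 1.6, 1.8, 2.0)
```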

pnf commented 6 years ago

Did you consider Kahan’s algorithm? It might be accurate enough and also more efficient.

pnf commented 6 years ago

https://en.wikipedia.org/wiki/Kahan_summation_algorithm
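For reference, the algorithm compensates each addition with the rounding error recovered from the previous one; a minimal sketch, not tied to any particular range implementation:

```scala
// Kahan (compensated) summation: `c` carries the low-order bits lost
// when `sum + y` rounds, and feeds them back in on the next addition.
def kahanSum(xs: Iterable[Double]): Double = {
  var sum = 0.0
  var c   = 0.0 // running compensation for lost low-order bits
  for (x <- xs) {
    val y = x - c     // apply the correction from the last step
    val t = sum + y   // low-order bits of y may be lost here
    c = (t - sum) - y // recover exactly what was lost
    sum = t
  }
  sum
}

// Naive repeated addition of 0.1 drifts to 0.9999999999999999,
// while the compensated sum lands essentially on 1.0.
val naive = List.fill(10)(0.1).foldLeft(0.0)(_ + _)
val kahan = kahanSum(List.fill(10)(0.1))
```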

eed3si9n commented 6 years ago

@pnf Thanks for the pointer. It didn't occur to me, but it makes sense.

eed3si9n commented 6 years ago

I implemented Kahan summation in https://github.com/eed3si9n/collection-strawman/tree/wip/double-range2, but it wasn't accurate enough out of the box.

For the test I am using val r = NumericRange.inclusive(0.0, 1000.0, 0.1)(DoubleAsIfIntegral).

collection-strawman> junit/testOnly strawman.collection.NumericRangeTest
[info] Compiling 1 Scala source to /Users/eed3si9n/work/collection-strawman/collections/.jvm/target/scala-2.13.0-M2/classes...
[info] Test run started
[info] Test strawman.collection.NumericRangeTest.emptyIterator started
[info] Test strawman.collection.NumericRangeTest.nonEmptyIterator started
[info] Test strawman.collection.NumericRangeTest.doubleRangeToList started
[info] Test strawman.collection.NumericRangeTest.doubleRangeUneven started
[info] Test strawman.collection.NumericRangeTest.doubleRangeContains started
List(0.0, 0.1, 0.2, 0.30000000000000004, ..snip.. 
37.800000000000004, 37.9, 38.0, 38.1, 38.2, 38.300000000000004, 38.400000000000006, 
38.5, 38.6, 38.7, 38.800000000000004, 38.900000000000006, 39.0, 39.1, 39.2, 
39.300000000000004, 39.400000000000006, 39.5, 39.6, 39.7, 39.800000000000004, 
39.900000000000006, 40.0, 40.1, 40.2, 40.300000000000004, 40.400000000000006, 40.5, 
40.6, 40.7, 40.800000000000004, 40.900000000000006, 41.0, 41.1, 41.2, 41.300000000000004, 
41.400000000000006, 41.5, 41.6, 41.7, 41.800000000000004, 41.900000000000006, 42.0, 
..snip.., 1000.0)
[error] Test strawman.collection.NumericRangeTest.doubleRangeContains failed: java.lang.AssertionError: expected:<-1> but was:<3>, took 0.05 sec

As you can see, the lower bits are noisy enough that contains(r(i)) does not pass. This required an aggressive round-off. I chose 12 decimal places, but the precision probably needs to be calibrated based on the range and step.
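A round-off in the spirit of the "12 decimals" cleanup could look like this hypothetical helper (roundTo and the choice of HALF_UP are assumptions, not the PR's code):

```scala
import scala.math.BigDecimal.RoundingMode

// Route the noisy Double through BigDecimal and discard the
// low-order digits past the requested decimal place.
def roundTo(x: Double, decimals: Int = 12): Double =
  BigDecimal(x).setScale(decimals, RoundingMode.HALF_UP).toDouble

roundTo(38.300000000000004) // back to 38.3
roundTo(0.1 + 0.2)          // 0.30000000000000004 becomes 0.3
```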

But probably because of the rounding, it actually became slower than doing everything in BigDecimal land. Here's looping over 10000 points and calling contains(r(i)):

Kahan with rounding: 11156 ms, 11189 ms, 11464 ms

BigDecimal without rounding: 3416 ms, 3011 ms, 3263 ms

sjrd commented 6 years ago

Reading from my phone right now, so I'm not completely sure, but I think this PR will have severe consequences on the code size in Scala.js, because it will pull in the big number implementation every time one uses NumericRange, whether with floating point numbers or other Numerics.

eed3si9n commented 6 years ago

@sjrd Perhaps I should refactor and provide a Double-specific Range class?

julienrf commented 6 years ago

@eed3si9n It looks like you are still using BigDecimals in the base NumericRange implementation, right?

eed3si9n commented 6 years ago

@julienrf It's used in the NumericRange companion, but I tried not to use them in the sealed class NumericRange[T], assuming that's where it hurts Scala.js if it needs to generate code for a bunch of types.

Ichoran commented 6 years ago

Seth suggests we discuss this here instead of over at https://github.com/scala/bug/issues/10759, so (reposting):

Is it possible to drop numeric ranges that pretend to be for Float and Double? These are just bad news due to numeric imprecision. If you say something like 0.1 to 0.7 by (0.1 + 0.2) you would think you would have three steps, but you either won't, or you will but other code will be randomly inaccurate (because you rounded things enough to make the above work). For what it's worth, Octave's equivalent, [0.1:x:0.7] produces [0.1, 0.4, 0.7] if x is up to 3e-16 above 0.3, but [0.1, 0.4] if it is 4e-16 above 0.3. (And you also get weird cases where for some values of x you end on the expected endpoint when the endpoint is small but miss it if it's bigger.) This just emphasizes that nobody can really "do it right".

Normally I don't like breaking changes, but in the case where there is no way to do the right thing except by not actually using the type that is stated, I think the better thing to do is remove the feature.

Note that sanity is possible if you have typed in numeric literals but not if you just have x to y by z because you can't know whether the imprecision in x and y and z is intended or not. If you can't even tell if your inputs honestly represent the intended values, using BigDecimal is not enough to recover non-surprising behavior. And I'm not sure being "better but still flawed" is justification enough to basically use a different data type than the one you're saying you're using.
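Ichoran's 0.1 to 0.7 by (0.1 + 0.2) example can be checked without any range machinery at all; the imprecision is in the inputs before any range code runs:

```scala
// The step the user wrote as 0.3 is already 0.30000000000000004 by
// the time any range implementation sees it, so the third point
// overshoots the stated end of 0.7.
val step   = 0.1 + 0.2  // 0.30000000000000004, not 0.3
val second = 0.1 + step // happens to round back to 0.4 here
val third  = second + step
assert(step != 0.3)
assert(third > 0.7)     // 0.7000000000000001: 0.7 is never reached
```

So an inclusive range over these doubles would stop at 0.4, yielding two points rather than the expected three.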

SethTisue commented 6 years ago

I lean towards keeping it, and perhaps adding some cautionary language to the Scaladoc.

If we get rid of DoubleRange and the like, it won't stop people from inlining their own versions into their code and making all the same mistakes. I think we should provide the best version we can and document any limitations or sharp edges. Some people will still get hurt, but they'd have gotten hurt anyway, maybe worse.

Let's give people clean needles, rather than hope that if they only have dirty needles, they'll stop taking drugs. Because they won't stop.

optician commented 6 years ago

I agree with @SethTisue. The pitfall is unavoidable, and it's better to have a well-known pitfall than a private handmade one. Every ambitious dreamer can dig their own anyway.

eed3si9n commented 6 years ago

As it stands, Double range is so broken that I can't imagine anyone using it in production. So, there is a danger in giving users a real-enough-looking vehicle that works for day-to-day use but whose wheels come off at 70 mph (112 km/h) on the highway.

Although I enjoyed addressing a handful of issues in this PR, I also don't want to be responsible for an accident with Real consequences (pun intended). I am ok with the strawman withdrawing the Float and Double ranges. It would be cool if compilation failed with a suggestion to migrate to BigDecimal via a (restligeist) macro.

SethTisue commented 6 years ago

As it stands, Double range is so broken that I can't imagine anyone using it in production

even after your rewrite?

Ichoran commented 6 years ago

@SethTisue - The problem is that it is a minefield, and the best way to avoid the mines is to use a BigDecimal range instead. Except then we are saying "this is a Double", but everything acts as if it is a BigDecimal (e.g. performance); we just don't say so.

So we should just come clean and say, "Look, Float and Double just aren't suitable for ranges. Here's a BigDecimal range that you can use almost as easily."
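With today's standard library, the BigDecimal alternative Ichoran alludes to is already nearly as easy to use; a small sketch:

```scala
// BigDecimal arithmetic is exact for decimal steps, and callers can
// drop back to Double at the edges when they need to.
val r = BigDecimal("1.0") to BigDecimal("2.0") by BigDecimal("0.2")
val asDoubles = r.map(_.toDouble).toList
// List(1.0, 1.2, 1.4, 1.6, 1.8, 2.0) -- no 1.5999999999999999
```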

eed3si9n commented 6 years ago

even after your rewrite?

After my change, some might be tempted to use it. But as @Ichoran's examples show, I am only handling sample-code usages with literals, not really solving the fundamental issue that adding two Doubles, even unassuming values like 0.1, is not accurate.

Related: it might be cool to reconsider BigDecimal literal https://github.com/scala/bug/issues/5988

eed3si9n commented 6 years ago

I am closing this for now.

NthPortal commented 6 years ago

Is there a plan/issue/PR to get rid of Double and Float ranges?

eed3si9n commented 6 years ago

I was going to send a PR for this at some point, but I can open an issue first in case anyone else wants to pick it up.

eed3si9n commented 6 years ago

https://github.com/scala/bug/issues/10781

SethTisue commented 6 years ago

uncle! I'm okay with the removal.