scalanlp / breeze

Breeze is/was a numerical processing library for Scala.
https://www.scalanlp.org
Apache License 2.0
3.45k stars 693 forks source link

[Bug] stats.median dead loop when NaN exists #761

Closed zhangyi-hu closed 3 years ago

zhangyi-hu commented 4 years ago

Minimum reproducing example:

val a = DenseVector(1.0, 2.0, 3.0, Double.NaN) val b = stats.median(a)

bug introduced in commit: e234a81596f45d89f072f41f4ca96cbb0ace0787

relevant code:

    def partition3Ways(x: Array[T], left: Int, right: Int, pivot: Int): (Int, Int) = {
      val pivotVal = x(pivot)
      swap(pivot, left)
      var i = left
      var lt = left
      var gt = right

      while (i <= gt) { // dead loop if x(i) is NaN
        if (x(i) < pivotVal) { swap(lt, i); lt += 1; i += 1 }
        else if (x(i) > pivotVal) { swap(gt, i); gt -= 1 }
        else if (x(i) == pivotVal) { i += 1 }
      }

      (lt, gt)
    }

Please either throw (like Apache Common Maths), or ignore NaN and log a warning.

dlwh commented 4 years ago

(Sorry for the slow response. it appears I stopped receiving github notifications.)

I'll fix this soon.

bluesheeptoken commented 4 years ago

Any thoughts on that ? Throwing an exception ?

Maybe we could also implement a nanmedian ?

dlwh commented 3 years ago

yeah i'm just gonna throw. easiest thing to do.

sorry for being so slow.

dlwh commented 3 years ago

a nanmedian would be nice too, i guess