scalanlp / breeze

Breeze is/was a numerical processing library for Scala.
https://www.scalanlp.org
Apache License 2.0
3.45k stars 693 forks

Filtering matrix with a bit vector doesn't work #506

Open danielkorzekwa opened 8 years ago

danielkorzekwa commented 8 years ago
val v = DenseVector(1d, 2d, 3d)
val filteredV = v(v :> 2d) //works fine

val d = DenseMatrix(1d,2d,3d)
val filteredD = d(d(::,0) :>2d,::) //compilation error

The use case here is to filter a matrix by the values in some column. Currently I first call findAll on a particular column and then index the matrix with the result (see below), but the above would be more compact.

val idx = d(::, 0).findAll(x => x > 2)
d(idx,::)

Ideally I would like to filter matrix by row like this (or something like that):

m.filterByRow(row => ....)

alexland commented 8 years ago

The function below implements the functionality described above, but is it consistent with Breeze idioms and syntax patterns?

I think "filter matrix by row" refers to selecting rows from a matrix based on the value(s) in a given column, so the function below takes three arguments (the DenseMatrix, a column ID, and a filter function) and returns a DenseMatrix whose rows are a subset of the rows passed in. I therefore named the function filterRows rather than filterByRow.

What's more, while Daniel describes using a bit vector to extract rows from a DenseMatrix, the vector returned by calling findAll is not a bit vector but an integer vector of row numbers. (In other array libraries I have used, e.g. NumPy, Julia, and R, the vector returned would indeed be a bit vector.) I assume this doesn't matter, because the primary interest seems to be getting a subset of rows by specifying a column in the original matrix and a filter function to apply to it.
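To make the bit-vector versus index-vector distinction concrete, here is a minimal sketch that uses plain Scala collections as stand-ins rather than Breeze types. The object and method names (MaskVsIndices, mask, indices, maskToIndices, selectRows) are hypothetical and exist only for illustration. A Vector[Boolean] plays the role of the bit vector, and a Vector[Int] plays the role of the row numbers that findAll returns.

```scala
// Illustration only: plain Scala collections, not Breeze types.
object MaskVsIndices {
  // "Bit vector" form: one Boolean flag per row of the column.
  def mask(column: Vector[Int], f: Int => Boolean): Vector[Boolean] =
    column.map(f)

  // "Index vector" form: the positions of the rows that satisfy f.
  def indices(column: Vector[Int], f: Int => Boolean): Vector[Int] =
    column.zipWithIndex.collect { case (x, i) if f(x) => i }

  // A mask can always be turned into the equivalent index vector.
  def maskToIndices(mask: Vector[Boolean]): Vector[Int] =
    mask.zipWithIndex.collect { case (true, i) => i }

  // Pick out the rows named by an index vector.
  def selectRows(rows: Vector[Vector[Int]], idx: Vector[Int]): Vector[Vector[Int]] =
    idx.map(i => rows(i))
}
```

Both forms carry the same information, so a row filter can accept either one and convert internally, which is why the difference should not matter for this use case.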

Lastly, how should it be made generic? Breeze clearly uses spire, but not extensively (usually just cfor); other DenseMatrix and DenseVector methods are @specialized. What is the current Breeze standard for parameterized types?

    // spire to make the function generic:
    import spire.implicits._
    import spire.math._ 
    import spire.algebra._

    def filterRows[A:Numeric](M:breeze.linalg.DenseMatrix[A], colId:Int, f:A => Boolean) = {
        val col = M(::,colId)
        val idx = col.findAll(f)
        M(idx,::).toDenseMatrix
    }

call it like so:

    val m = DenseMatrix((5, 7, 8), (7, 1, 3), (6, 7, 4))

    def f(x: Int): Boolean = x < 6

    filterRows(m, 1, f)

Happy to help any way I can.

mpetnuch commented 8 years ago

Hi! I actually just issued a pull request which addresses this functionality (among other things). It allows for consistent slicing operations across Vector, DenseMatrix, and SliceMatrix. If accepted, you will be able to use a BitVector as the rowSlice, the colSlice, or both.