haskell-numerics / hmatrix

Linear algebra and numerical computation
381 stars 104 forks source link

Don't drop empty rows in mkCSR and mkSparse #321

Closed HuwCampbell closed 4 years ago

HuwCampbell commented 4 years ago

The previous implementation would forget the actual row numbers and assume that they were sequential.

We now keep the actual row numbers, and space the column positions with the distance between the rows.

This was a pretty quick hack together. I think this whole function could still actually be improved, as it's pretty unreadable and can still give incorrect answers if the same coordinates are declared twice.

Fixes #220 Fixes #320

HuwCampbell commented 4 years ago

Here's the examples as they evaluate now from those 2 issues. Adding something to the test suite would be best, but it needs a fair amount of love first.

*Numeric.LinearAlgebra Numeric.LinearAlgebra.Devel> mkSparse [((0,999),1.0),((1,1999),2.0)]
SparseR {gmCSR = CSR {csrVals = [1.0,2.0], csrCols = [1000,2000], csrRows = [1,2,3], csrNRows = 2, csrNCols = 2000}, nRows = 2, nCols = 2000}
*Numeric.LinearAlgebra Numeric.LinearAlgebra.Devel> mkCSR  [((0,999),1.0),((1,998),4.0) ]
CSR {csrVals = [1.0,4.0], csrCols = [1000,999], csrRows = [1,2,3], csrNRows = 2, csrNCols = 1000}
*Numeric.LinearAlgebra Numeric.LinearAlgebra.Devel> mkCSR  [((1,999),1.0),((2,998),4.0) ]
CSR {csrVals = [1.0,4.0], csrCols = [1000,999], csrRows = [1,1,2,3], csrNRows = 3, csrNCols = 1000}
*Numeric.LinearAlgebra Numeric.LinearAlgebra.Devel> mkCSR  [((0,0), 1.0), ((1,1), 1.0), ((3,0), 1.0), ((3,1), 1.0), ((6,0), 1.0), ((8,1), 1.0)]
CSR {csrVals = [1.0,1.0,1.0,1.0,1.0,1.0], csrCols = [1,2,1,2,1,2], csrRows = [1,2,3,3,5,5,5,6,6,7], csrNRows = 9, csrNCols = 2}
HuwCampbell commented 4 years ago

So a few years ago I needed to do something like this; but reading from a CSV in order to grab a lot of data and use svdlib to... well, do an SVD.

I don't have the code anymore, but I think it was something like this https://github.com/haskell-numerics/hmatrix/compare/master...HuwCampbell:topic/mutable-mkCSR?expand=1

The idea here is that we should be able to build up a CSR with data coming from pipes, conduit, or streaming, over an IO Monad stack using mutable vectors. As a bonus, I believe that it's slightly faster and uses less memory overall.

HuwCampbell commented 4 years ago

Closing in favour of #322