duboism opened this issue 10 years ago
Yes, the package currently requires all matrices to be 2D 64-bit arrays. We should think about how to seamlessly allow other types of input data, especially sparse matrices!
SVD for sparse matrices is included in newer versions of scipy, as you say, but we do not have access to those versions ;-/
What you could do, for instance, is implement a new loss function, something like `SquaredSumError`, that represents the function f(x, y) = ||x - y||_2^2.
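Here is a minimal sketch of what such a loss could look like. It assumes a small interface with `f`, `grad` and `L` (the Lipschitz constant of the gradient) methods. The class is hypothetical and is not the `SquaredSumError` that ended up in the library. With `y` held fixed, f(x) = ||x - y||_2^2 has gradient 2(x - y), which is Lipschitz continuous with constant 2, so no SVD is needed. This is also what a least-squares loss ||Xβ - y||_2^2 reduces to when `X` is the identity.

```python
import numpy as np


class SquaredSumError:
    """Hypothetical sketch of f(x) = ||x - y||_2^2 with the data y held fixed."""

    def __init__(self, y):
        self.y = np.asarray(y, dtype=np.float64)

    def f(self, x):
        # Squared Euclidean norm of the residual x - y.
        d = (np.asarray(x, dtype=np.float64) - self.y).ravel()
        return float(np.dot(d, d))

    def grad(self, x):
        # Gradient with respect to x: 2 * (x - y).
        return 2.0 * (np.asarray(x, dtype=np.float64) - self.y)

    def L(self):
        # The gradient 2 * (x - y) is Lipschitz with constant 2,
        # independently of the data, so no SVD is required.
        return 2.0


y = np.array([1.0, 2.0, 3.0])
loss = SquaredSumError(y)
x = np.zeros(3)
print(loss.f(x))     # 14.0
print(loss.grad(x))  # [-2. -4. -6.]
```

Because `L` is a constant, this loss never touches a design matrix. That is why it sidesteps the sparse-SVD problem for the identity case.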
@tomlof we created the `SquaredSumError` loss function, so right now it's no longer an issue for us.
However, we should probably think about using sparse matrices where possible. As we mentioned in other issues, we could simply target a newer scipy version (one with sparse SVD).
OK, great! Feel free to add that to the library. You can put it in `functions.losses` or `functions.penalties`, whichever makes the most sense.
Actually, some penalties could be used as loss functions, and vice versa, so the distinction between losses and penalties is not clear-cut. Perhaps we should think about a better way to split them up into different modules?
Fouad and I recently wanted to use the `LinearRegression` loss function with a sparse `X` matrix (the identity). This is clearly a corner case, but it really helps in our case. Unfortunately, the method `L` uses `np.linalg.svd`, which doesn't work with sparse matrices. Searching a bit, it appears that scipy 0.9.0 (the version shipped with Ubuntu 12.04) does not handle this case well; apparently it is better supported in newer versions. Édouard has a small code snippet to perform SVD with old numpy versions.
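For `L`, only the largest singular value is actually needed. For a least-squares loss ||Xβ - y||_2^2, the gradient 2Xᵀ(Xβ - y) is Lipschitz with constant 2·σ_max(X)², times whatever normalisation the loss applies. One version-independent option is to estimate σ_max by power iteration on XᵀX. This relies only on matrix-vector products, which scipy sparse matrices support. This is a swapped-in technique and not necessarily what Édouard's snippet does. `max_singular_value_power` is a hypothetical name.

```python
import numpy as np
import scipy.sparse as sp


def max_singular_value_power(X, max_iter=1000, tol=1e-10, seed=0):
    """Estimate the largest singular value of X by power iteration on X^T X.

    Only the products X.dot(v) and X.T.dot(u) are used, so X may be a dense
    ndarray or a scipy.sparse matrix; X is never densified.
    """
    rng = np.random.RandomState(seed)
    v = rng.rand(X.shape[1])
    v /= np.linalg.norm(v)
    sigma = 0.0
    for _ in range(max_iter):
        w = X.T.dot(X.dot(v))        # w = X^T X v
        norm_w = np.linalg.norm(w)
        if norm_w == 0.0:            # v lies in the null space of X
            return 0.0
        sigma_new = np.sqrt(norm_w)  # ||X^T X v|| -> sigma_max^2 for unit v
        v = w / norm_w
        if abs(sigma_new - sigma) <= tol * sigma_new:
            return float(sigma_new)
        sigma = sigma_new
    return float(sigma)


X = sp.identity(5, format="csr")
sigma = max_singular_value_power(X)
print(sigma)           # close to 1.0
print(2.0 * sigma**2)  # Lipschitz constant of the gradient of ||X b - y||^2
```

Power iteration converges slowly when the two largest singular values are close. A library version would probably want `max_iter` and `tol` exposed, or the exact routine used whenever it is available.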
Maybe we could introduce a function in `utils.math` that would intelligently wrap the right call to numpy/scipy (depending on the version), to Édouard's trick, or maybe to your `FastSVD` function.
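A rough sketch of such a wrapper might look like the code below. The name `max_singular_value` and its placement in `utils.math` are assumptions. Dense input goes to `np.linalg.svd`. Sparse input goes to `scipy.sparse.linalg.svds` when that function can be imported. Otherwise the matrix is densified as a last resort. Rather than comparing scipy version strings, it feature-tests the import, so it does not hard-code which release introduced `svds`.

```python
import numpy as np
import scipy.sparse as sp


def max_singular_value(X):
    """Hypothetical utils.math helper: largest singular value of X.

    Dense input goes to np.linalg.svd. Sparse input goes to
    scipy.sparse.linalg.svds when that function can be imported and the
    matrix is large enough; otherwise it is densified as a fallback.
    """
    if sp.issparse(X):
        X = X.astype(np.float64)
        try:
            from scipy.sparse.linalg import svds
        except ImportError:  # older scipy without sparse SVD
            svds = None
        # ARPACK-based svds needs k < min(X.shape); here k = 1.
        if svds is not None and min(X.shape) > 1:
            s = svds(X, k=1)[1]  # svds returns (u, s, vt)
            return float(np.max(s))
        X = X.toarray()  # fallback: may be costly for large matrices
    X = np.asarray(X, dtype=np.float64)
    # np.linalg.svd returns singular values in descending order.
    return float(np.linalg.svd(X, compute_uv=False)[0])
```

The same dispatch point could route to `FastSVD` or to Édouard's snippet instead of densifying. Their signatures are not shown here, so they are left out of the sketch.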