Closed rcannood closed 3 years ago
Thank for this. I intended to return zero if a vector has no variance as in mat1
and mat3
but mat1
and mat2
is an unexpected behavior that I need to fix.
> mat1 <- Matrix::Matrix(1:4, nrow = 1, sparse = TRUE)
> mat2 <- Matrix::Matrix(rep(0.1111, 4), nrow = 1, sparse = TRUE)
> mat3 <- Matrix::Matrix(rep(1, 4), nrow = 1, sparse = TRUE)
> proxyC::simil(mat1, mat2, method = "correlation")
1 x 1 sparse Matrix of class "dgTMatrix"
[1,] Inf
> proxyC::simil(mat1, mat3, method = "correlation")
1 x 1 sparse Matrix of class "dgTMatrix"
[1,] .
Returning NA
is more difficult because it makes the matrix dense. I wanted users to deal with vectors without variance using rowSds()
or colSds()
manually, but happy to change if there is a better way.
> proxyC::rowSds(mat1)
[1] 1.290994
> proxyC::rowSds(mat2)
[1] 0
Returning zeros instead of NAs makes sense in this context.
Would it make sense to throw a one-time warning if one of the standard deviations is zero?
I committed a proposed solution for this issue. Feel free to reject it and provide a different solution altogether.
Sorry, I don't have a solution for this issue yet. When I compute the correlation on vectors with sd == 0, the output correlation is 0 or +inf instead of NA or NaN.
Example:
Needless to say, this can lead to some very strange behaviour in downstream results ;)
I created a branch, issue_20, which breaks a test to address this issue.