l-ramirez-lopez / resemble

resemble is an R package which implements functions dedicated to non-linear modelling of complex spectroscopy data

correlation dissimilarity in `search_neighbors()` throws an error #38

Closed: l-ramirez-lopez closed this issue 1 year ago

l-ramirez-lopez commented 1 year ago
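library(prospectr)
library(resemble)

# Soil NIR spectra shipped with prospectr
data(NIRsoil)

# Use the mean spectrum of NIRsoil$spc as the single target observation
search_neighbors(
  Xr = NIRsoil$spc,
  Xu = t(colMeans(NIRsoil$spc)),
  diss_method = "cor", k = 10
)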

Error:

Error in neighbors_diss[1:k, , drop = FALSE] : 
  incorrect number of dimensions
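For context on the message itself: R raises "incorrect number of dimensions" when an object is indexed with more subscripts than it has dimensions. The base-R illustration below is unrelated to resemble's internals; it only suggests that `neighbors_diss` reached that line as a plain vector rather than a matrix. The thread does not show exactly how that happened.

```r
# A plain numeric vector has no dim attribute
neighbors_diss <- c(0.2, 0.5, 0.1)

# Matrix-style indexing on a vector fails with the same message
res <- tryCatch(
  neighbors_diss[1:2, , drop = FALSE],
  error = function(e) conditionMessage(e)
)
print(res)

# The same subset works once the object is a one-column matrix
neighbors_diss <- matrix(neighbors_diss, ncol = 1)
neighbors_diss[1:2, , drop = FALSE]
```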
philipp-baumann commented 1 year ago
Browse[1]> str(dsm)
List of 2
 $ dissimilarity: num [1:825, 1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:825] "Xr_1" "Xr_2" "Xr_3" "Xr_4" ...
  .. ..$ : chr "Xu_1"
 $ documentation: chr(0) 

Browse[1]> ls()
 [1] "center"               "diss_method"          "documentation"       
 [4] "dsm"                  "input_dots"           "k"                   
 [7] "k_diss"               "k_range"              "kk"                  
[10] "pc_selection"         "return_dissimilarity" "return_projection"   
[13] "scale"                "spike"                "ws"                  
[16] "Xr"                   "Xu"                   "Yr"
philipp-baumann commented 1 year ago
Browse[1]> str(colMeans(X))
 Named num [1:700] 0.357 0.356 0.356 0.356 0.355 ...
 - attr(*, "names")= chr [1:700] "1100" "1102" "1104" "1106" ...

Browse[1]> str(colMeans(Xu))
 Named num [1:700] 0.357 0.356 0.356 0.356 0.355 ...
 - attr(*, "names")= chr [1:700] "1100" "1102" "1104" "1106" ...

@l-ramirez-lopez With your code above, the unknown sample is identical to the column means of Xr, and the resulting dissimilarity matrix contains only NaN values.
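One plausible explanation, offered as an assumption rather than confirmed from the package source: if the correlation dissimilarity centres the data with the column means of Xr (the debugger session above does list a `center` variable), a target spectrum equal to those means becomes a vector of zeros. Its standard deviation is then zero and the Pearson correlation reduces to 0/0. The sketch below uses simulated data in place of NIRsoil and a hypothetical helper, `pearson_r()`, written out term by term so the NaN is visible; base `cor()` would instead return NA with a warning.

```r
set.seed(42)

# Simulated stand-in for a spectral matrix: 50 observations x 20 variables
Xr <- matrix(runif(50 * 20), nrow = 50)

# Target observation equal to the column means of Xr, as in the issue
Xu <- t(colMeans(Xr))

# Assumed preprocessing: centre both matrices with the column means of Xr
Xr_centred <- sweep(Xr, 2, colMeans(Xr))
Xu_centred <- sweep(Xu, 2, colMeans(Xr))

# Hypothetical helper: Pearson correlation written out term by term
pearson_r <- function(x, y) {
  xc <- x - mean(x)
  yc <- y - mean(y)
  sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
}

all(Xu_centred == 0)                         # TRUE
sd(Xu_centred[1, ])                          # 0
pearson_r(Xu_centred[1, ], Xr_centred[1, ])  # NaN
```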

philipp-baumann commented 1 year ago

OK, so this is something in the Armadillo implementation of fast_diss().

philipp-baumann commented 1 year ago
Browse[1]> Xu <- matrix(c(rep(0, 699), 1), nrow = 1)

Browse[1]> str(fast_diss(Xu, Xr, "cor"))
 num [1:825, 1] 0.483 0.546 0.525 0.551 0.444 ...

Browse[1]> Xu <- matrix(rep(0, 700), nrow = 1)

Browse[1]> str(fast_diss(Xu, Xr, "cor"))
 num [1:825, 1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
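The two calls above isolate the cause: a target row with some variation gives finite dissimilarities, while a constant row (all zeros) gives NaN for every reference observation. The same behaviour can be reproduced in plain R with a vectorised correlation dissimilarity. The function `cor_diss_sketch()` and the transformation (1 - r) / 2 are assumptions for illustration, not the Armadillo code in fast_diss().

```r
# Hypothetical sketch: correlation dissimilarity between every row of Xr
# and a single target row xu, using d = (1 - r) / 2 (assumed form)
cor_diss_sketch <- function(Xr, xu) {
  # Centre each observation (row) across its variables
  Xr_c <- Xr - rowMeans(Xr)
  xu_c <- xu - mean(xu)
  # Pearson r of each row of Xr against xu; 0 / 0 gives NaN
  r <- drop(Xr_c %*% xu_c) / sqrt(rowSums(Xr_c^2) * sum(xu_c^2))
  (1 - r) / 2
}

set.seed(1)
Xr <- matrix(runif(825 * 700), nrow = 825)

# Non-constant target: last variable is 1, the rest are 0
d_ok <- cor_diss_sketch(Xr, c(rep(0, 699), 1))

# Constant target: every variable is 0, so its standard deviation is 0
d_nan <- cor_diss_sketch(Xr, rep(0, 700))

summary(d_ok)
all(is.nan(d_nan))  # TRUE
```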
philipp-baumann commented 1 year ago

The question is how you want to treat NaN output. Should it throw an error in the first place?

l-ramirez-lopez commented 1 year ago

@philipp-baumann You are right, the error is actually in the fast_diss() function used by cor_diss().

l-ramirez-lopez commented 1 year ago

A sanity check was added: correlation coefficients cannot be computed for observations with a standard deviation of 0.
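A check of this kind can be sketched in base R. The function name `check_zero_sd()`, its message and where it would be called are hypothetical; the thread does not show the check that was actually added to resemble.

```r
# Hypothetical sanity check (not resemble's actual code): stop before
# computing correlation dissimilarities if any observation (row) of X
# has a standard deviation of exactly 0
check_zero_sd <- function(X, name = "X") {
  row_sd <- apply(X, 1, sd)
  zero_sd <- which(row_sd == 0)
  if (length(zero_sd) > 0) {
    stop(
      "Correlation cannot be computed for observations in ", name,
      " with a standard deviation of 0 (rows: ",
      paste(zero_sd, collapse = ", "), ")",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Usage: the all-zero target from the thread is rejected
Xu <- matrix(rep(0, 700), nrow = 1)
msg <- tryCatch(check_zero_sd(Xu, "Xu"), error = function(e) conditionMessage(e))
print(msg)
```

The exact comparison `row_sd == 0` only catches strictly constant rows; rows that are constant up to floating-point noise would pass, so a small tolerance may be preferable in practice.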