l-ramirez-lopez / resemble

resemble is an R package which implements functions dedicated to non-linear modelling of complex spectroscopy data

f_diss and fDiss #11

Closed AlexandreWadoux closed 4 years ago

AlexandreWadoux commented 4 years ago

After the recent change to the f_diss function, I obtain different results when running my code:

# compute Mahalanobis distance between scores centre and the scores of the spectra
wmahald <- f_diss(Xr =  pcspectraA$scores, 
                 Xu =  pcspectraACentre, 
                 diss_method = 'mahalanobis', 
                 center = FALSE, scale = FALSE)

and the plot:

# plot the index of the spectra against the Mahalanobis distance
plot(wmahald,
     pch = 16,
     col = rgb(red = 0, green = 0.4, blue = 0.8, alpha = 0.5),
     ylab = 'Mahalanobis distance')

# add a horizontal line to better visualize the spectra with Mahalanobis dissimilarity scores larger than 1 (arbitrary threshold)
abline(h = 1, col = 'red')

[image: unnamed-chunk-158-1]

Now it gives me much larger distances:

[image]

l-ramirez-lopez commented 4 years ago

The scale of the results is now different from the one in the previous version. This comes from the fix of a known bug in the scaling of the final results (as reported in the NEWS file).

The distance ratios (between samples) were calculated correctly, but the final scaling of the results was not. The distance between Xi and Xj was scaled by taking the square root of the mean of the squared differences and dividing it by the number of variables, i.e. sqrt(mean((Xi-Xj)^2))/ncol(Xi). The correct calculation takes the mean of the squared differences, divides it by the number of variables and then computes the square root, i.e. sqrt(mean((Xi-Xj)^2)/ncol(Xi)). This bug had no effect on the computation of the nearest neighbors.
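As a rough illustration of why the neighbor search is unaffected, the sketch below applies the two formulas exactly as written above to simulated data. It uses base R only and does not call resemble. The names X, xu, d_old and d_new are made up for this example, and the simulated variables stand in for already transformed scores. Under these assumptions the two scalings differ only by the constant factor sqrt(p), so the ordering of the samples is the same.

```r
# Toy data (hypothetical): 30 samples, p variables, one reference observation
set.seed(42)
p  <- 20
X  <- matrix(rnorm(30 * p), ncol = p)
xu <- rnorm(p)

# Mean of the squared differences between each row of X and xu
sq_diff <- sweep(X, 2, xu)^2
msd <- rowMeans(sq_diff)

# The two scalings as written in the comment above
d_old <- sqrt(msd) / p   # previous (buggy) scaling
d_new <- sqrt(msd / p)   # corrected scaling

# The ordering of the samples (and hence the nearest neighbors) is identical
same_order <- identical(order(d_old), order(d_new))

# The two results differ by a constant factor equal to sqrt(p)
ratio <- d_new / d_old
```

Here same_order should be TRUE and every element of ratio should equal sqrt(p), which is why only the magnitude of the plotted distances changes between versions.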

The following code might help to understand how the scaling is now done:

library(resemble)  # provides f_diss
library(prospectr) # provides the NIRsoil data
data(NIRsoil)

Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]

# Mahalanobis distance computed on the first 20 spectral variables
n_variables <- 20

# resemble
md <- f_diss(
  Xr[, 1:n_variables], 
  Xr[1, 1:n_variables, drop = FALSE], 
  "mahalanobis", 
  center = FALSE
  )

# base R: stats::mahalanobis returns squared distances
md_r <- mahalanobis(
  Xr[, 1:n_variables], 
  center = Xr[1, 1:n_variables, drop = FALSE], 
  cov = cov(Xr[, 1:n_variables])
  )

md_r <- sqrt((md_r)/n_variables) # scaling using the number of variables

plot(md, md_r)