khliland / pls

The pls R package

Projecting observations on correlation loading plot #6

Closed Drosof closed 5 years ago

Drosof commented 5 years ago

Dear package author, I mainly use the pls package to explore how sensory attributes relate to chemical compounds (X and Y data matrices). I am also interested in how individual observations (rows) relate to these variables in correlation loading plots. I compute the correlation loadings to be plotted using the following code:

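library(pls)
data("oliveoil")
# Fit a PLS regression of the sensory attributes on the chemical measurements
oil <- plsr(sensory ~ chemical, scale = TRUE, data = oliveoil)
# Keep the scores for the first two components
scores <- oil$scores
sc1 <- scores[,1]
sc2 <- scores[,2]
scores <- as.data.frame(cbind(sc1, sc2))
# Correlation loadings: correlation of each X and Y variable with the scores
cl_c <- as.data.frame(cor(oliveoil$chemical, scores))
cl_s <- as.data.frame(cor(oliveoil$sensory, scores))
plot_cl <- rbind(cl_c, cl_s)
plot_cl <- setNames(plot_cl, c("comp1", "comp2"))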

However, I have not managed to project the observations onto the correlation plot, as shown in this thread: https://stackoverflow.com/questions/52906389/correlation-loading-plot-from-plsr-with-observations-using-ggplot2. The example plot in that thread comes from commercial software that I don't have access to. I tried to reproduce a similar plot using the pls/plsdepot packages and ggplot. I guess that would require transforming both the X and Y scores so that individual observations can be projected onto the correlation loading scale (-1 to 1). I haven't yet found a solution. Any suggestions would be much appreciated. Regards, Pierrick

bhmevik commented 5 years ago

You can always scale the scores to a suitable size and add them to the plot. For instance, working with a matrix version of your scores data frame, you could scale the scores to have maximum distance 1 from the origin, or to have maximum absolute value 1, like this:

S <- as.matrix(scores)
maxdist <- max(apply(S, 1, function(x) sqrt(sum(x^2))))
maxabs <- max(abs(S))
Sdist <- S/maxdist
Sabs <- S/maxabs

Now the points in Sdist will lie within a maximum radius of 1 from the origin, while the points in Sabs will lie within ±1 along both components. Other scalings are possible, of course.
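To make the overlay concrete, here is a minimal base-graphics sketch that draws the correlation loadings from the question and adds the scores scaled to a maximum radius of 1. The object names CL and theta, the colours, and the plotting choices are illustrative assumptions, not anything prescribed by the pls package.

```r
library(pls)
data("oliveoil")

# Same model as in the question; keep the first two components as a matrix.
oil <- plsr(sensory ~ chemical, scale = TRUE, data = oliveoil)
S <- oil$scores[, 1:2, drop = FALSE]

# Correlation loadings for the X (chemical) and Y (sensory) variables.
CL <- rbind(cor(unclass(oliveoil$chemical), S),
            cor(unclass(oliveoil$sensory), S))

# Scale the scores so the most distant observation lies on the unit circle.
Sdist <- S / max(sqrt(rowSums(S^2)))

# Unit circle, variables as text labels, observations as filled points.
theta <- seq(0, 2 * pi, length.out = 200)
plot(cos(theta), sin(theta), type = "l", asp = 1,
     xlab = "Comp 1", ylab = "Comp 2")
abline(h = 0, v = 0, lty = 3)
text(CL[, 1], CL[, 2], labels = rownames(CL), col = "blue", cex = 0.8)
points(Sdist[, 1], Sdist[, 2], pch = 16)
```

Swapping Sdist for Sabs (S / max(abs(S))) gives the other scaling. Either way, a single common factor is applied to both components, so the relative positions of the observations are unchanged.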

As for which scaling one should choose, and how to interpret scores plotted in a correlation loading plot, I don't know. I've searched around the net a bit, but haven't found anything definite. The closest I've found is a part of the SAS manual. There, they have added scores to the plot and describe what one should look for in them, but they don't say anything about any scaling of the scores.
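One further scaling has a direct algebraic link to the correlation loadings: centre each score column and scale it to unit length. A correlation is the inner product of two centred, unit-length vectors, so the correlation loadings are then exactly the inner products between the unit-length variables and the unit-length score columns. The sketch below only checks that identity. The helper name unit_cols is hypothetical, and whether this matches the scaling used by the commercial software in the linked thread is not known.

```r
library(pls)
data("oliveoil")

oil <- plsr(sensory ~ chemical, scale = TRUE, data = oliveoil)
S <- oil$scores[, 1:2, drop = FALSE]
X <- unclass(oliveoil$chemical)

# Hypothetical helper: centre each column, then scale it to unit length.
unit_cols <- function(M) {
  M <- sweep(M, 2, colMeans(M))
  sweep(M, 2, sqrt(colSums(M^2)), "/")
}

Sunit <- unit_cols(S)  # observations; each component now has length 1
Xunit <- unit_cols(X)  # variables on the same footing

# Inner products of the unit-length columns reproduce the correlation loadings.
CLx <- cor(X, S)
identity_holds <- isTRUE(all.equal(unname(crossprod(Xunit, Sunit)), unname(CLx)))
identity_holds
```

Because every column of Sunit has length 1, each coordinate is bounded by 1 in absolute value, so the observations fit on the same axes as the correlation loadings. Unlike Sdist and Sabs, this scaling uses a separate factor per component, which changes the aspect ratio of the score cloud.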

Drosof commented 5 years ago

Sorry for the late reply. I am still investigating the issue but haven't found a good solution yet. I tried your scaling methods, but plotting the results on the correlation loading plot does not really make sense with regard to the raw oliveoil data. I will keep searching.