khliland / pls

The pls R package
36 stars 3 forks source link

summary() % variance explained fails for PLS with multiple Y columns and 1 component #38

Closed monicathieu closed 3 months ago

monicathieu commented 4 months ago

Hi! Thank you for managing this PLS package. (My research supervisor likes to run his PLS regressions in Matlab, so, you guys are helping me stay in R where I feel more comfortable!)

I've encountered an error with calling summary() on PLS objects with a multi-column response matrix and 1 component. The error I found can be reproduced with:

library(pls)
oliveoil1 <- plsr(sensory ~ chemical, ncomp=1, data = oliveoil)
summary(oliveoil1)

The output I get is:

Data:   X dimension: 16 5 
    Y dimension: 16 6
Fit method: kernelpls
Number of components considered: 1
TRAINING: % variance explained
Error in dimnames(tbl) <- list(c("X", yvarnames), paste(1:object$ncomp,  : 
  length of 'dimnames' [1] not equal to array extent

I chased down the error in the summary method a bit, and I think I've pinpointed it:

https://github.com/khliland/pls/blob/d548b826a22ac28709afc26ad1c3c69b38e98d0a/R/summaries.R#L101-L102

In the specific case that the response term has multiple columns, and ncomp = 1, this appears to fail because yve gets coerced to a numeric vector, not a matrix, when drop() is called on the R2 result. Then,

https://github.com/khliland/pls/blob/d548b826a22ac28709afc26ad1c3c69b38e98d0a/R/summaries.R#L103

When rbind binds two vectors together, the vectors automatically get put along the row dimensions, so tbl is putting yve in the second row and repeating cumsum(xve) along the first row. Which is very much unexpected! So then the dimnames set in the next command don't fit.

(This error case actually doesn't trigger with a single response term and ncomp = 1 because rbind() behaves as expected when binding 2 length-1 vectors together. So the error only triggers when the response term has multiple columns!)

I believe this can be fixed by ensuring that yve is a matrix with responses in the rows and comps in the columns before rbind-ing. This particular syntax is super hacky, but I believe this works:

tbl <- rbind(cumsum(xve), t(t(yve)))

When yve is already a matrix, transposing it twice returns it to its original state. When yve is a length-1 vector, it becomes a 1x1 matrix which also rbinds as expected. But finally (weirdly?), when yve is a length-n vector, t(yve) turns it into a 1xn matrix, and then t(t(yve)) turns it into the expected nx1 matrix.

That fix looks so hacky, though! So I would totally understand if you wanted to implement the fix a different way.

Thank you for your attention!

khliland commented 4 months ago

Dear Monica,

Thank you for your interest in this package and your thorough analysis of the error. Regarding your solution, I think you would get the same behavior from as.matrix(yve). However, the odd thing is that I am not able to reproduce the error message on my computer. May I ask which versions of R and the pls package you are using. With R 4.4.1 and pls 2.8.3 I do not get an error with your code in a freshly started R.

Regards, Kristian

monicathieu commented 4 months ago

Thanks for your quick reply! That is so strange, though, I think I have the same versions...

pls 2.8.3 R 4.4.1

I just restarted my R session and I can still reproduce the error on my machine. I am running R on Linux (on our compute cluster)... it would be INSANE if that were creating a difference?

khliland commented 4 months ago

An update: When I reinstalled the package, I also received the error. I will try to figure it out early next week and submit an update. Thanks again @monicathieu for spotting it!

khliland commented 3 months ago

The bug fix is now included in version 2.8-4 on CRAN and here on GitHub.