We’re happy to accept any improvements! Make sure you fully test the functions against results from an independent source: for example, a textbook with worked solutions, or R, Octave, or NumPy run on the same data.
Using the process from this document
https://personal.utdallas.edu/~herve/Abdi-PLS-pretty.pdf
I was able to use Excel to manually get the same results as the following R code:
library(mixOmics)
sample_data <- as.data.frame(matrix(c(7, 4, 10, 16, 13, 7, 3, 5, 7, 3, 13, 14, 12, 11, 10, 7, 7, 5, 3, 3), nrow = 5, ncol = 4))
colnames(sample_data) <- c("input1", "input2", "input3", "input4")
sample_data
pred <- as.data.frame(matrix(c(14, 10, 8, 2, 6, 7, 7, 5, 4, 2, 8, 6, 5, 7, 4), nrow = 5, ncol = 3))
colnames(pred) <- c("output1", "output2", "output3")
pred
pls.model <- pls(sample_data, pred, ncomp=3)
pls.model$loadings
So that seems like a workable algorithm.
pls.model$loadings$X = W
pls.model$loadings$Y = C
pls.model$variates$X = each column proportional to T
pls.model$variates$Y = U
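For reference, these matrices come from the decomposition in the linked Abdi paper (a sketch in its notation; E and F start as the standardized X and Y and are deflated after each component, and the columns of W, C, T, U collect the per-component vectors below):

\[
X = T P^{\top} + E, \qquad \hat{Y} = T B C^{\top}
\]
\[
w = \frac{E^{\top} u}{\lVert E^{\top} u \rVert}, \quad
t = \frac{E w}{\lVert E w \rVert}, \quad
c = \frac{F^{\top} t}{\lVert F^{\top} t \rVert}, \quad
u = F c, \quad
b = t^{\top} u, \quad
p = E^{\top} t
\]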
Thank you. This is very helpful. Do you mind sharing the Excel doc too?
I deleted the Excel spreadsheet. PHP-ish pseudocode follows. I have no idea how one actually uses these matrices in practice, so I would not be comfortable coding this class up myself with meaningful tests without a lot more research on my part. And I have not tested any of this in PHP.
// Prepare data; can use a similar process as in the PCA class
$E = standardize columns of X
$F = standardize columns of Y
// Ensure same number of rows
if ($E->getM() !== $F->getM()) {
    throw exception;
}
// Loop until E is fully deflated. In floating point E never becomes exactly zero,
// so a real implementation should stop after a fixed number of components or
// when the residual norm drops below a tolerance.
while (!$E->isEqual(MatrixFactory::zero($E->getM(), $E->getN()))) {
    // Sum of squares of each residual matrix, recorded once per component
    $SSX[] = ($E->frobeniusNorm()) ** 2;
    $SSY[] = ($F->frobeniusNorm()) ** 2;
    // Random starting vector (the scalarDivide(21) is just an arbitrary scaling)
    $u_new = MatrixFactory::random($F->getM(), 1)->scalarDivide(21);
    // Iterate this process until the vectors converge
    do {
        $u = $u_new;
        // $w = Eᵀ * u / ‖Eᵀ * u‖ (unit vector in the direction of Eᵀ * u)
        $temp = $E->transpose()->multiply($u);
        $w = $temp->scalarDivide($temp->frobeniusNorm());
        // $t = E * w / ‖E * w‖
        $temp = $E->multiply($w);
        $t = $temp->scalarDivide($temp->frobeniusNorm());
        // $c = Fᵀ * t / ‖Fᵀ * t‖
        $temp = $F->transpose()->multiply($t);
        $c = $temp->scalarDivide($temp->frobeniusNorm());
        $u_new = $F->multiply($c);
    } while (!$u_new->isEqual($u)); // in practice, compare within a tolerance
    // Loading vector and scalar regression weight for this component
    $p = $E->transpose()->multiply($t);
    $b = $t->transpose()->multiply($u)->get(0, 0);
    $B[] = $b; // $B is a vector of the $b values; eventually we should(?) convert $B to a diagonal matrix
    $T[] = $t; // move $t to a new column of $T
    // and do the same with w, u, p, and c
    // Deflate: remove this component's contribution from the residuals
    $E = $E->subtract($t->multiply($p->transpose()));
    $F = $F->subtract($t->multiply($c->transpose())->scalarMultiply($b));
}
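On "how one actually uses these matrices in practice": the linked Abdi paper folds them into a single regression matrix, sketched here in LaTeX, where (Pᵀ)⁺ is the Moore–Penrose pseudoinverse and B is the diagonal matrix built from the $b values above:

\[
\hat{Y} = T B C^{\top} = X B_{\mathrm{PLS}}, \qquad
B_{\mathrm{PLS}} = (P^{\top})^{+} B C^{\top}
\]

New observations would then be standardized with the training columns' means and standard deviations, multiplied by B_PLS, and the result un-standardized with Y's statistics.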
From the R documentation
“Many ... formulations can be found in literature. It has been shown, for instance, that only one of X and Y needs to be deflated; alternatively, one can directly deflate the crossproduct matrix S (as is done in SIMPLS, for example). Moreover, there are many equivalent ways of scaling. In the example above, the scores t have been normalised, but one can also choose to introduce normalisation at another point in the algorithm. Unfortunately, this can make it difficult to directly compare the scores and loadings of different PLSR implementations.”
Here’s another R PLS snippet that gives different scores, but I might be doing something wrong with the setup.
library(pls)
sample_data <- as.data.frame(scale(matrix(c(7, 4, 10, 16, 13, 7, 3, 5, 7, 3, 13, 14, 12, 11, 10, 7, 7, 5, 3, 3, 14, 10, 8, 2, 6, 7, 7, 5, 4, 2, 8, 6, 5, 7, 4), nrow = 5, ncol = 7), center = TRUE, scale = TRUE))
colnames(sample_data) <- c("x1","x2","x3","x4", "y1", "y2", "y3")
pls.model <- plsr(cbind(y1, y2, y3) ~ x1 + x2 + x3 + x4, data=sample_data, ncomp=3, method="oscorespls")
pls.model$scores
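Given the scaling caveat quoted above, the difference in scores may just be column scaling and sign rather than a setup error. A quick check, reusing pls.model from the snippet above (nothing here is specific to either package):

# Different PLSR implementations normalise the scores at different points in the
# algorithm, so rescale each score column to unit length before comparing.
# Columns that then agree up to sign represent the same latent component.
scores <- unclass(pls.model$scores)
scores_unit <- apply(scores, 2, function(col) col / sqrt(sum(col^2)))
scores_unit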
Some code sources are below. PLS2 looks like the most straightforward and is what is described in the Wikipedia article; however, most examples online appear to use the mvr function. It looks like several PLSR algorithms use SVD, which has not been integrated into MathPHP yet. PLS1/PLS2 do not rely on SVD.
https://rdrr.io/cran/chemometrics/src/R/pls2_nipals.R
https://rdrr.io/cran/pls/src/R/mvr.R
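For context on why SVD comes up (my summary, not from either link): SIMPLS-type algorithms take each pair of weight vectors directly from the dominant singular triplet of the cross-product matrix, while the NIPALS inner loop in the pseudocode above reaches the same direction by power iteration:

\[
S = E^{\top} F = \sum_i \sigma_i u_i v_i^{\top}, \qquad w = u_1, \quad c = v_1
\]

So implementing SIMPLS requires an SVD routine, while PLS1/PLS2 via NIPALS only needs the matrix products already available in MathPHP.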
As an update, #385 is a pull request for PLS in MathPHP. I would appreciate any feedback if you are able to review it, to check whether it gives you your expected results and has the features you would expect.
This can be closed
PLS is available in the latest release v2.2.0
I've looked at this library and I think I'm correct in saying that Partial Least Squares (PLS) regression is missing. It is similar to the principal component regression that is implemented.
I need this feature and I was hoping that this project would accept a PR. I'm not too confident in the math, so if someone knows they can do this quickly, then I would be super happy. If not, I'll make sure to struggle my way forward =)
See https://en.wikipedia.org/wiki/Partial_least_squares_regression
Question to the maintainer: Will you accept a PR with this?