khliland / pls

The pls R package

Why is PRESS0 > TSS? #21

Closed FPI-MT closed 4 years ago

FPI-MT commented 4 years ago

I am learning to use pls and analyzing the object returned by mvrCv / crossval. While learning, I am also cross-checking the results with calculations in Excel.

The PRESS0 value appears to be: PRESS0 = sample variance * n * [n / (n-1)]

However, I would expect PRESS0 to be: PRESS0 = Total Sum of Squares = sample variance * (n-1)

Why is PRESS0 being increased to be larger than the total sum of squares?

bhmevik commented 4 years ago

The leave-one-out cross-validated PRESS0 is defined as \sum_{i=1}^n (y_i - 1/(n-1) * \sum_{j≠i} y_j)^2, and the sample variance, Var(y), as 1/(n-1) * \sum_{i=1}^n (y_i - 1/n * \sum_{j=1}^n y_j)^2. If you do the math, you will find that LOO-CV PRESS0 = n^2/(n-1) * Var(y). (The trick is to rewrite \sum_{j≠i} y_j as \sum_{j=1}^n y_j - y_i in the PRESS0 formula.)
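
For readers who want the intermediate step: writing ybar for the full-sample mean, the sum over j≠i equals n * ybar - y_i, so y_i - (n * ybar - y_i)/(n-1) = n/(n-1) * (y_i - ybar). Squaring and summing gives PRESS0 = n^2/(n-1)^2 * TSS = n^2/(n-1) * Var(y), which is larger than TSS by the factor (n/(n-1))^2. The identity can also be checked numerically. The following is a minimal sketch in plain Python (standard library only), not a call into pls; the helper name `loo_press0` and the data values are made up for illustration.

```python
from statistics import variance


def loo_press0(y):
    """Leave-one-out PRESS0: predict each y_i by the mean of the other n-1 values."""
    n = len(y)
    total = sum(y)
    # (total - yi) / (n - 1) is the mean of the responses with observation i left out
    return sum((yi - (total - yi) / (n - 1)) ** 2 for yi in y)


y = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical response values
n = len(y)

var_y = variance(y)                    # sample variance (divisor n - 1)
tss = var_y * (n - 1)                  # total sum of squares about the full mean
press0 = loo_press0(y)                 # direct leave-one-out computation
closed_form = n ** 2 / (n - 1) * var_y # the formula stated above

print("TSS         :", tss)
print("PRESS0 (LOO):", press0)
print("n^2/(n-1)*Var:", closed_form)
print("PRESS0 / TSS:", press0 / tss, "vs (n/(n-1))^2 =", (n / (n - 1)) ** 2)
```

The direct leave-one-out sum and the closed form should agree up to floating-point rounding. PRESS0 comes out above TSS because each left-out observation is compared with a mean that does not include it.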