plantphys / spectratrait

A tutorial R package for illustrating how to fit, evaluate, and report spectra-trait PLSR models. The package provides functions that extend the base functionality of the R pls package, identify an optimal number of PLSR components, and standardize model validation, plus vignette examples that use datasets sourced from EcoSIS (ecosis.org).
GNU General Public License v3.0

R2 reporting bug #81

Closed (serbinsh closed this 3 years ago)

serbinsh commented 3 years ago

In our example scripts, particularly where we create the side-by-side cal/val plots:

[Figure: Nmass_mg_g_Cal_Val_Scatterplots]

I found a bug: the R2 shown there doesn't match the R2 in the final validation plot:

[Figure: Nmass_mg_g_PLSR_Validation_Scatterplot]

That's because, for the cal/val plots, we compute R2 using

val.R2 <- round(pls::R2(plsr.out,newdata=val.plsr.data)[[1]][nComps],2)

but that doesn't account for the fact that the intercept-only model is included as the first element, so instead of giving me, say, the result for nComps = 5, it gives me the value for ncomp = 4. We need to do

val.R2 <- round(pls::R2(plsr.out,newdata=val.plsr.data)[[1]][nComps+1],2)

which is what we already do for the validation plot. So we need to change our scripts to address this and re-run the examples :/
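To make the off-by-one concrete, here is a base-R sketch that mimics, as we understand it, the layout of `pls::R2(...)$val`: an estimate × response × model array whose first model slice is the intercept-only model, labelled "(Intercept)", followed by "1 comps", "2 comps", and so on. The array, the `Nmass_mg_g` response label and the R2 numbers are made-up placeholders, not output from a real model.

```r
# Hypothetical stand-in for pls::R2(plsr.out, newdata = val.plsr.data)$val.
# The R2 values are made-up placeholders; only the layout matters here.
max.comps <- 6
r2.val <- array(
  c(-0.02, 0.41, 0.63, 0.72, 0.78, 0.81, 0.80),
  dim = c(1, 1, max.comps + 1),
  dimnames = list("test", "Nmass_mg_g",
                  c("(Intercept)", paste(1:max.comps, "comps")))
)

nComps <- 5

# Linear index nComps lands on the "4 comps" slice because slot 1 is the intercept
wrong <- r2.val[nComps]

# Shifting by one skips the intercept slot and lands on "5 comps"
right <- r2.val[nComps + 1]

# Indexing by the model label avoids counting positions at all
by.name <- r2.val[1, 1, paste(nComps, "comps")]
```

Selecting by label rather than by position keeps the code correct whether or not the intercept slice is present.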

@julien, is this also what you saw, or was that a different issue?

serbinsh commented 3 years ago

Crap, we also need to do this for the calibration plot stats

cal.R2 <- round(pls::R2(plsr.out)[[1]][nComps],2)

this needs to be

cal.R2 <- round(pls::R2(plsr.out)[[1]][nComps+1],2)

serbinsh commented 3 years ago

Actually, the most appropriate way is to use

pls::R2(plsr.out, newdata = val.plsr.data, intercept = FALSE)

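i.e. with intercept = FALSE, so the intercept-only model is dropped, the first element is the 1-component model, and [nComps] lines up with nComps components.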
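A sketch of the intercept = FALSE approach applied to both the calibration and the validation stats. It assumes the pls package is installed and uses its bundled `gasoline` NIR dataset as a stand-in for the spectra-trait data; `cal.data`, `val.data` and the 50/10 split are illustrative choices, not taken from the package's example scripts.

```r
# Assumes pls is installed; gasoline (NIR spectra + octane) ships with it
data(gasoline, package = "pls")
cal.data <- gasoline[1:50, ]
val.data <- gasoline[51:60, ]

nComps <- 5
plsr.out <- pls::plsr(octane ~ NIR, ncomp = 10, data = cal.data)

# intercept = FALSE drops the intercept-only slice, and ncomp = nComps
# restricts the result to that single model size, so no positional index is needed.
# Without newdata (and no cross-validation), R2 uses the training (calibration) fit.
cal.R2 <- round(as.numeric(
  pls::R2(plsr.out, ncomp = nComps, intercept = FALSE)$val), 2)

# With newdata, R2 is computed on the held-out validation set
val.R2 <- round(as.numeric(
  pls::R2(plsr.out, newdata = val.data, ncomp = nComps, intercept = FALSE)$val), 2)
```

Passing `ncomp` together with `intercept = FALSE` returns a single value, so the same call pattern works for the cal and val panels without any `+1` bookkeeping.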

JulienLamour commented 3 years ago

I'll check whether this is the issue I had. In the end I used the lm function to compute the R2 and RMSE, and planned to look into the difference later. I also got negative R2 values from pls::R2, which makes no sense to me, so I wonder whether it is an issue with pls::R2 or whether we did something wrong. See for example: https://github.com/TESTgroup-BNL/Physiological_traits_PLSR_models/blob/master/PLSR_models/Validation_PuertoRico2017%202021-04-20.pdf . It looks like the problem appeared when the model performance was not very good (for example, an lm R2 below 0.6) and the model had many components. We do have an intercept in our models, so I'm not sure we need nComps+1? I'll check later today or tomorrow.
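One possible explanation for the negative values, assuming pls::R2 follows its documented definition R2 = 1 - SSE/SST: on a validation set that quantity drops below zero whenever the predictions fit the observations worse than the observations' own mean does, whereas the R2 from lm(observed ~ predicted) is a squared correlation and can never be negative. The base-R sketch below uses made-up toy numbers and a hypothetical helper `r2_pls_style`; it re-implements the formula and is not pls's own code.

```r
# Hypothetical helper: R2 as 1 - SSE/SST (the definition pls documents for R2)
r2_pls_style <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# Made-up toy data: predictions track the observations closely but are biased
obs  <- c(1, 2, 3, 4, 5)
pred <- c(5.5, 6.5, 9.3, 10.8, 13.1)

r2.formula <- r2_pls_style(obs, pred)             # about -18.2: worse than predicting mean(obs)
r2.lm      <- summary(lm(obs ~ pred))$r.squared   # about 0.98: squared correlation of obs and pred
rmse       <- sqrt(mean((obs - pred)^2))          # about 6.2
```

The two numbers answer different questions: lm rescales the predictions before scoring them, so it hides bias, while 1 - SSE/SST penalizes it directly. That would be consistent with negative values showing up mainly for weaker, many-component models.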

serbinsh commented 3 years ago

Developing a small PR to address this issue.