plantphys / spectratrait

A tutorial R package for illustrating how to fit, evaluate, and report spectra-trait PLSR models. The package provides functions that extend the base functionality of the R pls package, identify an optimal number of PLSR components, and standardize model validation, along with vignette examples that use datasets sourced from EcoSIS (ecosis.org).
GNU General Public License v3.0

Bug: dplyr data split approach assumptions #26

Closed: serbinsh closed this issue 4 years ago

serbinsh commented 4 years ago

FYI @JulienLamour: while playing with another dataset from EcoSIS, I realized that your dplyr method for splitting the data assumes the dataset contains a "Sample_ID" column. That column comes from the original example and is not universal.

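    } else if (approach == "dplyr") {
      # Stratified sample: keep prop * n() rows within each group
      cal.plsr.data <- plsr_data %>% 
        group_by_at(vars(all_of(group_variables))) %>% 
        slice(sample(1:n(), prop*n())) %>% 
        data.frame()
      # Problem: assumes plsr_data has a Sample_ID column
      val.plsr.data <- plsr_data[!plsr_data$Sample_ID %in% cal.plsr.data$Sample_ID,]
    }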

We will need to revise how we select the non-calibration (validation) rows so that it doesn't depend on knowing which variables are in the dataset. I'm not yet sure how to do that with this approach, because it doesn't create a new variable to track which rows belong to the calibration set, unlike the original version as revised by @neo0351. We will need to fix this to make it more generic.
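A minimal sketch of that idea, assuming base R only: the hypothetical helper `split_by_group()` (not part of spectratrait) samples row positions within each group and takes the validation set as every row that was not sampled, so no ID column such as Sample_ID is needed.

```r
# Hypothetical helper (not part of spectratrait): stratified cal/val split
# that tracks sampled rows by position, so no ID column is required.
split_by_group <- function(plsr_data, group_variables, prop = 0.8, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  row_idx <- seq_len(nrow(plsr_data))
  # One vector of row positions per combination of grouping values
  groups <- split(row_idx, plsr_data[group_variables], drop = TRUE)
  # Within each group, sample floor(prop * n) positions; indexing through
  # sample.int() avoids sample()'s surprise when a group has a single row
  cal_idx <- unlist(lapply(groups, function(idx) {
    idx[sample.int(length(idx), floor(prop * length(idx)))]
  }), use.names = FALSE)
  # Logical mask marking calibration rows; validation is its complement
  in_cal <- row_idx %in% cal_idx
  list(
    cal.plsr.data = plsr_data[in_cal, , drop = FALSE],
    val.plsr.data = plsr_data[!in_cal, , drop = FALSE]
  )
}

df <- data.frame(species = rep(c("a", "b"), each = 4), y = 1:8)
s <- split_by_group(df, "species", prop = 0.5, seed = 1)
```

With `prop = 0.5` and two groups of four rows, this selects two rows per group for calibration. Marking calibration rows with a logical mask, rather than negative indexing, also avoids `plsr_data[-integer(0), ]` returning zero rows when nothing is sampled.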

JulienLamour commented 4 years ago

This should do the trick:

    val.plsr.data <- plsr_data[!row.names(plsr_data) %in% row.names(cal.plsr.data),]
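The row-name approach relies on `cal.plsr.data` carrying the original row names of `plsr_data`. The sketch below uses a toy dataset with no Sample_ID column and base subsetting rather than the package's dplyr pipeline, so it shows the complement only under that assumption; with the dplyr pipeline it is worth confirming that the row names survive `group_by_at()`, `slice()` and `data.frame()` in the installed dplyr version.

```r
# Toy data with no ID column; values are illustrative only
plsr_data <- data.frame(
  site  = c("A", "A", "A", "B", "B", "B"),
  trait = c(2.1, 3.4, 1.8, 4.0, 2.7, 3.3)
)
set.seed(42)
# Base `[` keeps the original row names ("1".."6") of the sampled rows
cal.plsr.data <- plsr_data[sample(nrow(plsr_data), 4), ]
# Validation set: every row whose name is absent from the calibration set
val.plsr.data <- plsr_data[!row.names(plsr_data) %in% row.names(cal.plsr.data), ]
print(row.names(cal.plsr.data))
print(row.names(val.plsr.data))
```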

serbinsh commented 4 years ago

@JulienLamour yup, looks like that worked for this other test dataset. I also just checked the original example, and it seems to create the same data split as before. So I think that solved it, thanks!
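Besides reproducing the earlier split on the original example, a dataset-agnostic check can confirm that the calibration and validation rows together reproduce the full data exactly once. The helpers `row_keys()` and `check_split()` below are hypothetical (not part of spectratrait) and compare whole rows, so they need no ID column; because rows are compared as a multiset, exact duplicate rows are treated as interchangeable.

```r
# Hypothetical helper: one string key per row, built from all columns;
# unname() stops a column called e.g. "sep" from clashing with paste()'s args
row_keys <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Hypothetical helper: TRUE if cal and val together contain every row of
# `full` exactly once (as a multiset); errors if the column names differ
check_split <- function(full, cal, val) {
  stopifnot(identical(names(cal), names(full)), identical(names(val), names(full)))
  nrow(cal) + nrow(val) == nrow(full) &&
    identical(sort(c(row_keys(cal), row_keys(val))), sort(row_keys(full)))
}

full <- data.frame(trait = c(1.5, 2.5, 3.5, 4.5), site = c("A", "A", "B", "B"))
check_split(full, full[c(1, 3), ], full[c(2, 4), ])  # TRUE: a valid partition
check_split(full, full[c(1, 3), ], full[c(1, 2), ])  # FALSE: row 1 twice, row 4 missing
```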