Bug: need to generalize assumptions in base R data split

serbinsh commented 4 years ago

@neo0351

FYI -

> apply(plsr_data[, split_var], MARGIN = 1, FUN = function(x) paste(x, collapse = " "))
Error in apply(plsr_data[, split_var], MARGIN = 1, FUN = function(x) paste(x,  : 
  dim(X) must have a positive length

I found a bug in your split function when trying out a new dataset https://ecosis.org/package/leaf-reflectance-plant-functional-gradient-ifgg-kit ID: 3cf6b27e-d80e-4bc7-b214-c95506e46daa

Not yet sure what the issue is, but it looks like, as coded, it assumes there will be two grouping variables. We need this to be flexible enough to handle 1+ grouping variables.
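One likely cause, sketched on a toy data frame (the values and the `make_id` helper are made up for illustration and are not part of the package): with a single grouping variable, `[ , split_var]` indexing drops the result to a plain vector, which has no `dim`, so `apply()` fails with the error above. Adding `drop = FALSE` keeps a data frame for any number of columns.

```r
# Toy stand-in for plsr_data; values are invented for illustration only.
toy <- data.frame(Growth_Form = c("tree", "shrub", "tree"),
                  Plant_Species = c("A", "B", "C"),
                  stringsAsFactors = FALSE)

one_var <- "Growth_Form"

# Single-column [ , ] indexing drops to a vector, so dim() is NULL
# and apply() has nothing to iterate over.
dropped_has_no_dim <- is.null(dim(toy[, one_var]))

# Hypothetical helper: drop = FALSE keeps the data frame,
# so apply() works with one or more grouping columns.
make_id <- function(df, vars) {
  apply(df[, vars, drop = FALSE], MARGIN = 1,
        FUN = function(x) paste(x, collapse = " "))
}

id_one <- make_id(toy, one_var)
id_two <- make_id(toy, c("Growth_Form", "Plant_Species"))
```

Whether this is the only place the function assumes two variables is not clear from the error alone, so treat it as a starting point.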

And also FYI - if I try with two grouping vars I get this:

> split_data <- create_data_split(approach=method, split_seed=2356812, prop=0.8, 
+                                 group_variables=c("Growth_Form","Plant_Species"))
NA   Cal: 79.9729364005413%
 Not enough observations

Again, these functions need to be general enough to allow for flexibility. We will need to fix this to handle different numbers of grouping variables, and groups with different numbers of observations.
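As a rough sketch of what a more general split could look like (this is a hypothetical `split_by_group` helper, not the existing `create_data_split()`; the `min_obs` parameter and the choice to send very small groups to calibration are assumptions, not decisions made in this thread):

```r
# Hypothetical helper: split each group separately and fall back to
# calibration when a group is too small to split.
split_by_group <- function(df, group_variables, prop = 0.8,
                           seed = 123456789, min_obs = 2) {
  set.seed(seed)
  # One ID per row, built from any number of grouping columns.
  # df[group_variables] stays a data frame even for a single column.
  id <- do.call(paste, c(unname(as.list(df[group_variables])), sep = " "))
  cal_val <- rep(NA_character_, nrow(df))
  for (g in unique(id)) {
    rows <- which(id == g)
    if (length(rows) < min_obs) {
      cal_val[rows] <- "Cal"  # assumption: tiny groups go to calibration
      next
    }
    n_cal <- max(1, floor(prop * length(rows)))
    # Index into rows instead of calling sample(rows, ...), which would
    # sample from 1:rows if rows ever had length one.
    cal_rows <- rows[sample.int(length(rows), n_cal)]
    cal_val[rows] <- "Val"
    cal_val[cal_rows] <- "Cal"
  }
  df$CalVal <- cal_val
  df
}

# Toy data: one group of 10 observations and one group of 1.
toy <- data.frame(Growth_Form = c(rep("tree", 10), "shrub"),
                  Reflectance = seq_len(11))
out <- split_by_group(toy, "Growth_Form")
table(out$Growth_Form, out$CalVal)
```

Passing the data frame in as an argument, rather than reading `plsr_data` from the calling environment, also makes the function easier to test against new datasets.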

serbinsh commented 4 years ago

Pretty sure this is the area causing problems

create_data_split <- function(approach=NULL, split_seed=123456789, prop=0.8,
                              group_variables=NULL) {
  set.seed(split_seed)
  if(!is.null(approach)) {
    if (approach=="base") {
      plsr_data$CalVal <- NA
      split_var <- group_variables
      plsr_data$ID <- apply(plsr_data[, split_var], MARGIN = 1, FUN = function(x) paste(x, collapse = " "))