Closed. serbinsh closed this issue 4 years ago.
Pretty sure this is the area causing problems
create_data_split <- function(approach=NULL, split_seed=123456789, prop=0.8,
                              group_variables=NULL) {
  set.seed(split_seed)
  if(!is.null(approach)) {
    if (approach=="base") {
      plsr_data$CalVal <- NA
      split_var <- group_variables
      plsr_data$ID <- apply(plsr_data[, split_var], MARGIN = 1, FUN = function(x) paste(x, collapse = " "))
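One possible reason the `apply()` line fails with a single grouping variable: in base R, `df[, cols]` on a data.frame collapses to a plain vector when `cols` names only one column, and `apply()` needs an object with dimensions. Below is a minimal sketch of a workaround, assuming `plsr_data` is a base data.frame. The toy data and the `make_group_id` helper are hypothetical and not part of the package.

```r
# Hypothetical helper: build one ID string per row from 1+ grouping columns.
make_group_id <- function(data, group_variables) {
  # drop = FALSE keeps a data.frame even when only one column is selected
  groups <- data[, group_variables, drop = FALSE]
  # paste the columns together row-wise; paste() turns NA into the string "NA"
  do.call(paste, c(unname(as.list(groups)), sep = " "))
}

# Toy data, for illustration only
toy <- data.frame(Species = c("a", "a", "b"), Site = c("x", "y", "x"))
id_one <- make_group_id(toy, "Species")             # one grouping variable
id_two <- make_group_id(toy, c("Species", "Site"))  # two grouping variables
```

Pasting whole columns with `do.call(paste, ...)` avoids `apply()` entirely, so the same code path serves one or many grouping variables.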
Can you email me the data set you are using? I agree, this needs to work on any data set.
I provided the EcoSIS link to the dataset. Do you actually need me to email it to you?
Oh, sorry, I missed that you linked another data set.
@serbinsh Ok, I see the problem. It's the grep looking for NAs, which will cause issues. I've solved this once upon a time and don't remember the fix right now. The real question is how we want to handle, say, a species column that has NAs in it. Drop them? Leave them?
@serbinsh What do we want to do with NAs in the grouping variables?
We don't want to drop NAs. I think we may just have to shunt them into another "factor" level, that is, pool all NAs together. What do you think? The risk is that dropping NAs could mean we drop a lot of good data for training.
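A minimal sketch of the "pool all NAs together" idea, using base R's `addNA()` to make NA an explicit factor level. The toy data is hypothetical, and this is only one way to do it, not necessarily the fix that went into the package.

```r
# Toy data with missing species labels (hypothetical, for illustration only)
toy <- data.frame(Species = c("a", NA, "b", NA), Site = c("x", "x", "y", "y"))

# addNA() turns NA into an explicit factor level, so NA rows form their own group
species_grp <- addNA(factor(toy$Species))

# split() on the augmented factor keeps the NA rows instead of dropping them
by_species <- split(seq_len(nrow(toy)), species_grp)
```

With a plain factor, `split()` would silently discard rows 2 and 4. With the extra level they end up in a third group and stay available for training.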
Good point. I'll keep them in. We should check Julien's method and see what it does.
I'll get you the fix after lunch. I think I need to update all the things before submitting the fix.
Fixed by @neo0351
@neo0351 FYI, I found a bug in your split function when trying out a new dataset: https://ecosis.org/package/leaf-reflectance-plant-functional-gradient-ifgg-kit (ID: 3cf6b27e-d80e-4bc7-b214-c95506e46daa)
Not yet sure what the issue is, but it looks like, as coded, it assumes there will be two grouping variables. We need this to be flexible enough to handle 1+ grouping variables.
And also FYI - if I try with two grouping vars I get this
Again, these functions need to be general to allow for flexibility. We will need to fix this to allow for different numbers of grouping variables with different numbers of observations per group.
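As a rough illustration of a split that tolerates groups of different sizes, here is a sketch that assumes one ID string per row has already been built from however many grouping variables are in use. The `split_within_groups` function and its arguments are hypothetical and are not the package's `create_data_split()`.

```r
# Hypothetical sketch: assign "Cal"/"Val" within each group, whatever its size.
split_within_groups <- function(group_id, prop = 0.8, seed = 123456789) {
  set.seed(seed)
  cal_val <- rep("Val", length(group_id))
  # row indices per group; groups may have very different numbers of rows
  for (rows in split(seq_along(group_id), group_id)) {
    n_cal <- floor(prop * length(rows))
    # sample positions rather than values, so one-row groups are handled safely
    cal_rows <- rows[sample.int(length(rows), n_cal)]
    cal_val[cal_rows] <- "Cal"
  }
  cal_val
}

# Toy IDs: three groups with 10, 5 and 1 observations (illustration only)
group_id <- c(rep("a x", 10), rep("b NA", 5), "c y")
cv <- split_within_groups(group_id, prop = 0.8)
```

In this sketch a group with a single observation goes entirely to validation, because `floor(0.8 * 1)` is 0. Whether that is the desired behaviour is a design decision for the package.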