rvalavi / blockCV

The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation to provide a robust error estimation in spatially structured environments. See
https://doi.org/10.1111/2041-210X.13107
GNU General Public License v3.0

NA value appearing in dataframe #45

Closed cwberry77 closed 8 months ago

cwberry77 commented 8 months ago

Hi there,

I'll preface this issue by saying that I'm relatively new to R, so I'm fairly sure I'm making an amateur mistake.

I'm trying to use cv_spatial to cross-validate my rf model. When the fold indices are applied to my data frame, I end up with fewer entries than in the initial trainSet or testSet. It seems a number of NA values are appearing in this subset of my original data frame. I'm not sure where these are coming from, as my data frame contains no NA values (these have previously been omitted). This gives me the following persistent warning:

Warning messages:
1: In test_table$preds[testSet] <- predict(rf_model, newdata = test_df_subset, :
number of items to replace is not a multiple of replacement length
2: In test_table$preds[testSet] <- predict(rf_model, newdata = test_df_subset, :
number of items to replace is not a multiple of replacement length

I would greatly appreciate any guidance you could offer on this issue. As I said I'm fairly sure I'm doing something fundamentally wrong.

Thanks,

Chris

rvalavi commented 8 months ago

Hi @cwberry77, no problem.

I can't say much without seeing your cross-validation code!

Check whether length(testSet) and nrow(test_df_subset) are equal. If they are not, you are probably making a mistake when filtering the original data frame down to the test data frame.
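The check suggested above can be sketched in base R. This is a minimal illustration, not the code from the issue: the data frame `df`, the index vector `testSet`, and the replacement values are all made up, and `na.omit()` after subsetting is only an assumed cause of the mismatch.

```r
# Toy data with one missing predictor value (hypothetical, not the issue's data).
df <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                 y = c(2, 4, 6, 8, 10, 12))

# Stand-in for the test indices of one fold.
testSet <- c(2, 3, 4)

# Dropping NAs *after* subsetting silently shortens the test data frame.
test_df_subset <- na.omit(df[testSet, ])

# The suggested check: these two numbers should match.
same_length <- length(testSet) == nrow(test_df_subset)

# Assigning a shorter vector into the indexed slots raises the warning
# reported in the issue; capture its message instead of printing it.
preds <- rep(NA_real_, nrow(df))
msg <- tryCatch(
  preds[testSet] <- c(0.1, 0.2),  # 2 values for 3 slots
  warning = function(w) conditionMessage(w)
)
```

If `same_length` is `FALSE`, the subsetting step dropped rows that the fold indices still point to, and the assignment into `preds[testSet]` will warn about the replacement length.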

cwberry77 commented 8 months ago

Hi @rvalavi,

Thanks very much for the prompt reply!

The lengths of testSet and test_df_subset differ once the NA values that pop up in the subset are removed. I've added my code below.

blockCV_20240129.txt

cwberry77 commented 8 months ago

Attached here is also a subset of my data if that helps

Thanks again @rvalavi !

[Uploading BlockCV_Subset.zip…]()

rvalavi commented 8 months ago

It is not obvious how two of your objects are created: 1) sf_cropped is used for cv_spatial, but then sf from line 17 is used for test_table; 2) it is also not obvious where train_df_NA comes from.

I think the main issue is that you remove NAs during cross-validation, on lines 71 and 72. This should be done before creating the folds with cv_spatial, because the fold indices refer to the rows of the data passed to it.
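A rough sketch of the suggested ordering follows, using base R only. Several things here are assumptions for illustration: the toy data, `lm()` as a stand-in for the random forest model, and a hand-built `folds_list` that mimics a list of train/test index pairs in place of a real cv_spatial result. It shows the ordering only, not the fix that was eventually worked out for this issue.

```r
# Toy data with one missing predictor value (hypothetical).
df <- data.frame(x = c(1, 2, NA, 4, 5, 6, 7, 8),
                 y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1))

# 1) Remove incomplete rows first, so fold indices match the cleaned rows.
df_clean <- df[complete.cases(df), ]
rownames(df_clean) <- NULL

# 2) Create folds on the cleaned data. This hand-built list is a stand-in
#    for spatial folds: each element holds train indices, then test indices.
fold_id <- rep(1:2, length.out = nrow(df_clean))
folds_list <- lapply(1:2, function(k) {
  list(which(fold_id != k), which(fold_id == k))
})

# 3) Cross-validate without dropping any rows inside the loop.
preds <- rep(NA_real_, nrow(df_clean))
for (k in seq_along(folds_list)) {
  trainSet <- folds_list[[k]][[1]]
  testSet  <- folds_list[[k]][[2]]
  model <- lm(y ~ x, data = df_clean[trainSet, ])  # lm stands in for rf
  preds[testSet] <- predict(model, newdata = df_clean[testSet, ])
}
```

Because no rows are removed after the folds are built, `length(testSet)` equals the number of predictions in every iteration and the assignment into `preds[testSet]` does not warn.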

cwberry77 commented 8 months ago

Hi @rvalavi,

I have added updated code for clarity.

I have tried removing NAs from the rasters before running cv_spatial, but I still seem to be encountering the same issue.

blockCV_20240130.txt

rvalavi commented 8 months ago

I'm closing this as the problem was solved over email and the issue was not about the blockCV package.