rvalavi / blockCV

The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation to provide a robust error estimation in spatially structured environments. See
https://doi.org/10.1111/2041-210X.13107
GNU General Public License v3.0

ERROR : Replacement has X rows, data has Y #19

Closed: Moncef-Boukhecheba closed this issue 3 years ago

Moncef-Boukhecheba commented 3 years ago

Hello, thanks for the great package,

I am new to R and species distribution modeling, and I'm running into a problem with the package:

When I load the species data (with the environmental variables) and try to run the cross-validation, I get this error: Error in `[[<-.data.frame`(`*tmp*`, i, value = c(4, 2, 4, 5, 3, 4, 2, : replacement has 1059 rows, data has 1070.

This is the code I use for the cross-validation:

folds <- spatialBlock(speciesData = species_data_final, 
                       species = "Present", 
                       theRange = 500, 
                       k = 5,
                       showBlocks = TRUE, 
                       verbose = TRUE,
                       selection = "random", 
                       iteration = 100, 
                       biomod2Format = TRUE)

And this is a preview of the data:

Present | bio_1 | bio_10 | bio_11 | bio_12 | bio_13 | bio_14 | bio_15 | bio_16 | bio_17 | bio_18 | bio_19 | bio_2 | bio_3 | bio_4 | bio_5 | bio_6 | bio_7 | bio_8 | bio_9
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
1 | 6.695098 | 20.37255 | -8.242157 | 991.5882 | 95.17647 | 61.05882 | 12.43980 | 277.9412 | 205.0588 | 271.2353 | 217.8235 | 9.810784 | 23.30634 | 1158.821 | 27.25294 | -14.84118 | 42.09412 | 19.31961 | -7.000980
1 | 6.638988 | 20.33214 | -8.322619 | 999.5714 | 96.92857 | 61.21429 | 12.78637 | 279.0000 | 204.7857 | 271.7857 | 218.2143 | 9.891071 | 23.39894 | 1161.128 | 27.26429 | -15.00714 | 42.27143 | 19.30357 | -7.127381
1 | 6.733333 | 20.43958 | -8.232292 | 992.7500 | 95.18750 | 60.93750 | 12.61262 | 278.2500 | 204.3750 | 272.1250 | 217.5000 | 9.878125 | 23.39039 | 1161.155 | 27.36875 | -14.86250 | 42.23125 | 19.37917 | -6.981250
1 | 6.670833 | 20.34889 | -8.267778 | 992.7333 | 95.20000 | 61.13333 | 12.34116 | 277.8667 | 205.8000 | 271.4000 | 218.0000 | 9.809444 | 23.30723 | 1158.956 | 27.22000 | -14.86667 | 42.08667 | 19.29778 | -7.033333
1 | 6.666667 | 20.35000 | -8.283333 | 1003.0000 | 99.00000 | 62.00000 | 13.06301 | 283.0000 | 205.0000 | 274.0000 | 220.0000 | 9.916667 | 23.49921 | 1160.110 | 27.30000 | -14.90000 | 42.20000 | 19.31667 | -7.050000
1 | 6.758333 | 20.46667 | -8.183333 | 987.0000 | 95.00000 | 61.00000 | 12.46353 | 277.0000 | 204.0000 | 269.0000 | 218.0000 | 9.883333 | 23.42022 | 1160.053 | 27.40000 | -14.80000 | 42.20000 | 19.40000 | -6.916667

The code and data needed to replicate this issue can be found at https://github.com/Moncef-Boukhecheba/Blockcv-error-example.

I hope you can help me find what I am doing wrong.

Thank you.

rvalavi commented 3 years ago

@Moncef-Boukhecheba the blocks you specified are too small! Note that theRange is given in meters. Try something like theRange = 5000.
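
For context on the message itself, here is a minimal base R sketch of how this kind of error arises in general. It does not use blockCV. The data frame `records` and the vector `fold_ids` are hypothetical stand-ins, and the idea that some points ended up without a fold ID (1059 values for 1070 rows) is only one plausible reading of the numbers in the error, not something confirmed in this thread.

```r
# Toy data: five records standing in for the species table (hypothetical).
records <- data.frame(id = 1:5)

# Hypothetical fold IDs for only three of the five records.
fold_ids <- c(1, 2, 3)

# Assigning a column of the wrong length fails; capture the message
# instead of stopping the script.
msg <- tryCatch(
  {
    records[["fold"]] <- fold_ids
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Comparing lengths up front shows the mismatch without triggering the error.
n_missing <- nrow(records) - length(fold_ids)
print(n_missing)
```

In an English locale the captured message has the same shape as the one reported above, which suggests checking how many records actually received a fold value before looking elsewhere.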

AMBarbosa commented 2 years ago

Hi, I also get this same error in some runs (with some random seeds) of 'spatialBlock', even with a sensible block size that does work with other random seeds. E.g., with the same code I posted in this other issue, but changing the random seed near the end:

set.seed(1)
blocks <- spatialBlock(speciesData = as(occdata, "Spatial"), theRange = 200000, k = 5)
# "Error in `[[<-.data.frame`(`*tmp*`, i, value = c(5, 1, 5, 1, 2, 3, 2,  : replacement has 85 rows, data has 87"
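
Since the failure above depends on the random seed, one possible stopgap is to try several seeds and keep the first run that succeeds. The sketch below is a workaround under stated assumptions, not a fix for the underlying cause: `try_seeds` and `toy_folds` are hypothetical names, and `toy_folds` is a stand-in that merely imitates a call that fails on some attempts.

```r
# Hypothetical helper: call make_folds() under successive seeds and return
# the first result that does not raise an error.
try_seeds <- function(make_folds, seeds) {
  for (s in seeds) {
    set.seed(s)
    result <- tryCatch(make_folds(), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(seed = s, folds = result))
    }
  }
  stop("no seed in the supplied set produced folds")
}

# Stand-in for the real call; it fails on its first two calls in this toy setup.
# With blockCV, the wrapped function would instead contain the call shown in
# the comment above, e.g.
# function() spatialBlock(speciesData = as(occdata, "Spatial"),
#                         theRange = 200000, k = 5)
attempt <- 0
toy_folds <- function() {
  attempt <<- attempt + 1
  if (attempt < 3) stop("replacement has 85 rows, data has 87")
  sample(1:5, 10, replace = TRUE)
}

out <- try_seeds(toy_folds, seeds = 1:10)
print(out$seed)
```

Recording the seed that worked keeps the result reproducible, but this only sidesteps the seed-dependent error and says nothing about why it occurs.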