Closed Baldl closed 1 year ago
Hi Lisa, thanks for the report and your interest in using blockCV. I need to check this in detail. I'll update you with the results soon.
Thank you very much!
Hi @Baldl
Thanks again for the report. That was indeed a bug, which I have now fixed. It was very hard to find, but the fix was easy.
Please update to blockCV v3.0.3 and check again.
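A note on the version strings, as a minimal sketch: the report's sessionInfo() lists blockCV_3.0-2, while the fix is in v3.0.3. R treats "-" and "." as equivalent separators in version strings, so 3.0-2 is the same version as 3.0.2. The variable names `installed` and `fixed_in` are made up for illustration.

```r
# "3.0-2" is the version shown in the report; "3.0.3" is the version named in the reply.
installed <- package_version("3.0-2")
fixed_in  <- package_version("3.0.3")

# Both comparisons use R's own version ordering, not string comparison.
same_as_dotted <- installed == package_version("3.0.2")
needs_update   <- installed < fixed_in

print(same_as_dotted)
print(needs_update)

# In a live session the installed version can be read with
# utils::packageVersion("blockCV"), which requires the package to be installed.
```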
library(ggplot2)
# plot the points by class together with the blocks and their fold numbers
ggplot() +
  geom_sf(data = data, aes(col = as.factor(occ), alpha = 1 / (occ + 1))) +
  geom_sf(data = blocks$blocks, fill = NA) +
  geom_sf_text(data = blocks$blocks, aes(label = folds))
# number of distinct classes present in each fold
dplyr::n_distinct(data[data$folds == 1, ]$occ)
dplyr::n_distinct(data[data$folds == 2, ]$occ)
dplyr::n_distinct(data[data$folds == 3, ]$occ)
dplyr::n_distinct(data[data$folds == 4, ]$occ)
dplyr::n_distinct(data[data$folds == 5, ]$occ)
dplyr::n_distinct(data[data$folds == 6, ]$occ)
dplyr::n_distinct(data[data$folds == 7, ]$occ)
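The per-fold checks above could also be collapsed into a single cross-tabulation. The following is only a sketch in base R: the `toy` data frame and its values are invented for illustration and stand in for the real `data` object with its `occ` and `folds` columns.

```r
# Toy stand-in for `data`: one class label and one fold id per record
# (hypothetical values, not taken from the report).
toy <- data.frame(
  occ   = c(0, 0, 1, 1, 2, 2, 2, 2),
  folds = c(1, 2, 1, 1, 1, 2, 3, 3)
)

# Rows are folds, columns are classes; a zero count means the class
# is absent from that fold.
counts <- table(fold = toy$folds, class = toy$occ)
print(counts)

# Number of distinct classes found in each fold
classes_per_fold <- rowSums(counts > 0)

# Number of distinct folds in which each class occurs
folds_per_class <- colSums(counts > 0)

print(classes_per_fold)
print(folds_per_class)
```

With the real data, `table(fold = data$folds, class = data$occ)` would give the same kind of overview in one call, and any zero cell points to a fold that is missing a class.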
Please let me know if there are any other issues.
I'm closing this issue.
Hi,
first of all, thank you for this amazing package and the really nice update!
I'm encountering some odd behavior with the function cv_spatial when the column passed in the "column" argument contains more than one class and one or more of the classes is more clustered in space than the other class(es). For example, the column consists of three values: 0, 1 and 2.
The function returns the training and test table, stating that each fold contains at least one data record. However, when I look at the folds_ids afterwards, this is not true: fewer folds than reported have been created for the smaller class(es).
Here is a reproducible example:
library(sf)       # sf_1.0-9
library(blockCV)  # blockCV_3.0-2
set.seed(123)
presence <- sf::st_as_sf(
  data.frame(occ = 1, x = runif(100, -75.4, -74), y = runif(100, 39.6, 41)),
  coords = c("x", "y"), crs = "EPSG:4326"
)
absence <- sf::st_as_sf(
  data.frame(occ = 0, x = runif(100, -75.4, -74), y = runif(100, 39.6, 41)),
  coords = c("x", "y"), crs = "EPSG:4326"
)
background <- sf::st_as_sf(
  data.frame(occ = 2, x = runif(10000, -80.4, -74), y = runif(10000, 39.6, 41)),
  coords = c("x", "y"), crs = "EPSG:4326"
)
data <- rbind(presence, absence, background)
rm(presence, absence, background)
blocks <- blockCV::cv_spatial(x = data, column = "occ", k = 7L, size = 70000)
data$folds <- blocks$folds_ids
# number of distinct folds in which each class occurs
dplyr::n_distinct(data[data$occ == 0, ]$folds)
dplyr::n_distinct(data[data$occ == 1, ]$folds)
dplyr::n_distinct(data[data$occ == 2, ]$folds)
sessionInfo()
SessionInfo:
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] blockCV_3.0-2 sf_1.0-9
loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 rstudioapi_0.14 magrittr_2.0.3 units_0.8-1 munsell_0.5.0 tidyselect_1.2.0
[7] colorspace_2.1-0 R6_2.5.1 rlang_1.0.6 fansi_1.0.4 s2_1.1.2 dplyr_1.1.0
[13] wk_0.7.1 tools_4.2.2 grid_4.2.2 gtable_0.3.1 KernSmooth_2.23-20 utf8_1.2.3
[19] cli_3.6.0 e1071_1.7-13 DBI_1.1.3 withr_2.5.0 class_7.3-20 tibble_3.1.8
[25] lifecycle_1.0.3 farver_2.1.1 ggplot2_3.4.1 vctrs_0.5.2 glue_1.6.2 proxy_0.4-27
[31] compiler_4.2.2 pillar_1.8.1 scales_1.2.1 generics_0.1.3 classInt_0.4-9 pkgconfig_2.0.3
If I use the function on just one of the datasets (e.g. class 0, 1 or 2), it works fine and all data records are assigned to a fold. The problem only occurs when I pass all of them to the function together.
I'm not sure whether I'm just using the function incorrectly or whether it is an actual issue, so some feedback from you would be much appreciated.
I hope my problem is understandable to you, please let me know if you need some clarification.
Best, Lisa