roberthyde / stabiliser

Bootstrap failure with small datasets #4

Open martin-green-1000 opened 2 weeks ago

martin-green-1000 commented 2 weeks ago

Bootstrapping causes models to fail if a bootstrap sample contains only 0s or only 1s

Proposed solution: break out of the loop and redraw the bootstrap sample

roberthyde commented 2 weeks ago

Testing code for convenience:

```r
library(rsample)
library(tidyverse)

n_rows <- 10
simulated_data <- data.frame(
  y = sample(c("YES", "NO"), size = n_rows, replace = TRUE, prob = c(0.1, 0.9)),
  var1 = rnorm(n_rows)
)

simulated_data

boot_data <- bootstraps(simulated_data, times = 10)

# Inspect each bootstrap sample as a data frame
map(boot_data$splits, as.data.frame)

# Return the split unchanged if its sample contains both outcome classes;
# otherwise return one freshly drawn split (drawn once, not re-checked).
rebootstrap_if_needed <- function(split) {
  number_rows <- as.data.frame(split) %>% count(y) %>% nrow()

  if (number_rows < 2) {
    print("We need a new bootstrap")
    split <- bootstraps(simulated_data, times = 1)$splits[[1]]
  }
  split
}

map(boot_data$splits, rebootstrap_if_needed)
```
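
A possible extension of the "repeat boot sample" idea, as a minimal sketch rather than the package's implementation: keep redrawing until the sample has both outcome classes, with a cap on attempts. The helper name `resample_until_two_classes`, its `outcome` and `max_tries` arguments, and the fixed `toy_data` are all hypothetical and only for illustration. It assumes the source data itself has both classes; if it does not, no amount of resampling helps, so the helper stops early.

```r
library(rsample)

# Hypothetical helper (not part of rsample or stabiliser): redraw a bootstrap
# split until its analysis set contains at least two distinct outcome values.
resample_until_two_classes <- function(data, outcome, max_tries = 100) {
  # A single-class source can never yield a two-class resample, so fail early.
  if (length(unique(data[[outcome]])) < 2) {
    stop("`data` contains only one outcome class; resampling cannot fix this.")
  }
  for (i in seq_len(max_tries)) {
    split <- bootstraps(data, times = 1)$splits[[1]]
    if (length(unique(analysis(split)[[outcome]])) >= 2) {
      return(split)
    }
  }
  stop("No two-class bootstrap sample found in ", max_tries, " tries.")
}

set.seed(1)
toy_data <- data.frame(
  y = c("YES", rep("NO", 9)),  # fixed 1:9 split so both classes are present
  var1 = rnorm(10)
)

boot_data <- bootstraps(toy_data, times = 10)

# Keep splits that already have both classes; replace the rest.
checked_splits <- lapply(boot_data$splits, function(split) {
  if (length(unique(analysis(split)$y)) < 2) {
    resample_until_two_classes(toy_data, "y")
  } else {
    split
  }
})

# Check that every retained split now contains both classes.
all(vapply(
  checked_splits,
  function(s) length(unique(analysis(s)$y)) == 2,
  logical(1)
))
```

The `max_tries` cap is there so a very unbalanced dataset cannot cause an endless loop. Note that discarding single-class samples means the retained samples are no longer plain bootstrap draws, which may or may not matter for the stability estimates.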