topepo / caret

caret (Classification And Regression Training) is an R package that contains miscellaneous functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

Flaky test on dummyVars #1350

Open MichaelChirico opened 11 months ago


This test is flaky:

https://github.com/topepo/caret/blob/5f4bd2069bf486ae92240979f9d65b5c138ca8d4/pkg/caret/tests/testthat/test_Dummies.R#L122-L139C3

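It fails whenever at least one value in 1:15 is absent from the draw sample.int(15, size = 100, replace = TRUE, prob = rep(1 / 15, 15)). In that case the corresponding factor level never exists in the data, so dummyVars() produces no dummy column for it and the expected-names check fails.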

That happens about 1.5% of the time, as a quick simulation shows:

mean(replicate(1e6, all(1:15 %in% sample.int(15, size = 100, replace = TRUE, prob = rep(1 / 15, 15)))))
# [1] 0.984922
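The rate can also be computed exactly with inclusion-exclusion over the set of values that are missed by all 100 draws. The short R sketch below evaluates that sum; the variable names are illustrative only. It gives a full-coverage probability of roughly 0.985 (a failure rate of about 1.5%), in line with the simulation above.

```r
# Probability that 100 uniform draws from 1:15 cover every value,
# via inclusion-exclusion over which values are missed:
#   P(all covered) = sum_{k=0}^{15} (-1)^k * choose(15, k) * (1 - k/15)^100
n_values <- 15
n_draws  <- 100
k <- 0:n_values

p_all  <- sum((-1)^k * choose(n_values, k) * (1 - k / n_values)^n_draws)
p_fail <- 1 - p_all  # chance the test's sample misses at least one level

print(p_all)
print(p_fail)
```

The k = 1 term, 15 * (14/15)^100, already accounts for almost all of the failure probability; the higher-order terms only shave off a few hundredths of a percent.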

Observe:

# draw until at least one value in 1:15 is missing (happens in ~1.5% of draws)
repeat {
  entry <- sample.int(15, size = 100, replace = TRUE, prob = rep(1 / 15, 15))
  if (!all(1:15 %in% entry)) break
}

# now finish the test with this entry: 15 identical factor columns X1..X15
data <- data.frame(matrix(rep(as.factor(entry), 15), ncol = 15), stringsAsFactors = TRUE)
essai_dummyVars <- caret::dummyVars(stats::as.formula(paste0("~ ", colnames(data), collapse = "+")), data)

# expected dummy column names: X<column>.<level> for every column and every level 1:15
exp_names_lvls <- apply(expand.grid(paste0("X", 1:15), paste0(".", 1:15)), 1, paste, collapse = "")
res_names_lvls <- colnames(predict(essai_dummyVars, data))
all(exp_names_lvls %in% res_names_lvls)
# [1] FALSE
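One possible way to make the test deterministic, sketched here as an assumption rather than the fix adopted upstream: build entry so that every value in 1:15 is guaranteed to appear, and give the factor columns explicit levels = 1:15. make_entry() is a hypothetical helper, and it is assumed, not verified here, that dummyVars() keeps a dummy column for a level that is declared on the factor even when no row uses it.

```r
# Hypothetical helper: guarantee that every value in 1:n_levels occurs at
# least once, then fill the remaining slots with uniform random draws.
make_entry <- function(n_levels = 15, size = 100) {
  x <- c(seq_len(n_levels),
         sample.int(n_levels, size = size - n_levels, replace = TRUE))
  # shuffle so the guaranteed values are not all at the front
  x[sample.int(length(x))]
}

entry <- make_entry()

# Declare the levels explicitly so all 15 stay on the factor regardless of
# which values happen to be drawn (assumed to be honoured by dummyVars()).
f <- factor(entry, levels = 1:15)

# 15 identical factor columns X1..X15, built directly from the factor so
# the explicit levels are not re-derived (and re-sorted) from strings.
cols <- rep(list(f), 15)
names(cols) <- paste0("X", 1:15)
data <- as.data.frame(cols)
```

Calling set.seed() before the draw would also stop the flakiness, but only by pinning one particular sample; guaranteeing coverage keeps the test meaningful for any seed.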