topepo / caret

caret (Classification And Regression Training): an R package that contains miscellaneous functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

test "Good names for dummies with reocurring patterns" fails sometimes #1345

Open · MichaelChirico opened this issue 1 year ago

Our CI occasionally sees the caret test suite fail. I think either the test needs to set a seed, or the code needs to be robust to random input (at a glance, I'd guess the former).
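As an alternative to pinning a seed, the input could be built so that it is robust to any random draw. The sketch below assumes (not confirmed in this thread) that the flakiness comes from `sample.int()` occasionally not drawing all 15 values, so some expected level is never created. `make_dummy_test_data()` is a hypothetical helper, not part of caret; the `dummyVars()` call and the name check would stay as they are in the test.

```r
# Hypothetical helper (not part of caret): builds the 100 x 15 test frame so
# that every value 1..k is guaranteed to occur, independent of the RNG seed.
make_dummy_test_data <- function(n = 100, k = 15) {
  stopifnot(n >= k)
  # Include each of 1..k once, pad with random draws, then shuffle the order.
  x <- sample(c(seq_len(k), sample.int(k, size = n - k, replace = TRUE)))
  # Same layout as the original test: k identical factor columns X1..Xk.
  data.frame(
    matrix(rep(as.character(x), k), ncol = k),
    stringsAsFactors = TRUE
  )
}

data <- make_dummy_test_data()
# Every column now carries all 15 levels, so names such as "X3.7" always exist.
stopifnot(all(vapply(data, nlevels, integer(1)) == 15))
```

The design choice here is to keep the data random (so the test still exercises varied level orderings) while removing the one property, a missing level, that can make the assertion fail.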

https://github.com/topepo/caret/blob/5f4bd2069bf486ae92240979f9d65b5c138ca8d4/pkg/caret/tests/testthat/test_Dummies.R#L122-L139

```r
library(testthat)

replicate(1e4, {
  test_that("Good names for dummies with reocurring patterns", {
    data = data.frame(
      matrix(
        rep(
          as.factor(sample.int(15, size = 100, replace = TRUE, prob = rep(1 / 15, 15))),
          15
        ),
        ncol = 15
      ),
      stringsAsFactors = TRUE
    )
    essai_dummyVars = caret::dummyVars(stats::as.formula(paste0("~ ", colnames(data), collapse = "+")), data)

    exp_names_lvls <- apply(expand.grid(paste0("X", 1:15), paste0(".", 1:15)), 1, paste, collapse = "")
    res_names_lvls <- colnames(predict(essai_dummyVars, data))
    expect_true(all(exp_names_lvls %in% res_names_lvls))
  })
})
```
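A plausible cause, not yet confirmed in this thread: `sample.int(15, size = 100, replace = TRUE)` is not guaranteed to draw all 15 values, and `data.frame(..., stringsAsFactors = TRUE)` only creates levels for values that actually occur. If, say, 7 is never drawn, names like `X1.7` are missing and `expect_true()` fails. This base-R sketch (no caret needed) estimates how often a draw misses at least one value:

```r
set.seed(1)  # only to make this simulation itself reproducible
n_reps <- 1e4
# TRUE when a draw of 100 values from 1..15 misses at least one value
missing_level <- replicate(
  n_reps,
  length(unique(sample.int(15, size = 100, replace = TRUE, prob = rep(1 / 15, 15)))) < 15
)
mean(missing_level)  # fraction of draws that would make the test fail
```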

The failure is rare (on the order of 1% of runs), but wrapping the test in 1e4 replications as above triggers it consistently.
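Assuming the failure is caused by a value from 1..15 never being drawn in the 100 samples, the per-run failure probability follows from inclusion-exclusion over the 15 values, where the first term dominates:

```math
P(\text{some value missing}) = \sum_{j=1}^{14} (-1)^{j+1} \binom{15}{j} \left(1 - \frac{j}{15}\right)^{100} \approx 15 \left(\frac{14}{15}\right)^{100} \approx 0.015
```

That is roughly 1.5% per run, so 1e4 replications would be expected to hit on the order of 150 failures, consistent with the replicated version failing every time.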