Closed prockenschaub closed 4 years ago
Same here. I would also like to point out that switching to logistic_reg() above (e.g. by predicting factor(am)) or using step_modeimpute() does not help.
Also tried:
svm_mod <-
  svm_rbf(mode = "regression", cost = tune(), rbf_sigma = tune()) %>%
  set_engine("kernlab")
instead of lasso_mod above, getting the same warning.
> sessioninfo::session_info()
- Session info ----------------------------------------------------------------------
setting value
version R version 3.6.1 (2019-07-05)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_Israel.1252
ctype English_Israel.1252
tz Asia/Jerusalem
date 2020-05-01
- Packages --------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.1)
backports 1.1.6 2020-04-05 [1] CRAN (R 3.6.3)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 3.6.0)
bayesplot 1.7.1 2019-12-01 [1] CRAN (R 3.6.1)
BBmisc 1.11 2017-03-10 [1] CRAN (R 3.6.3)
boot 1.3-22 2019-04-02 [2] CRAN (R 3.6.1)
broom * 0.5.2 2019-04-07 [1] CRAN (R 3.6.1)
callr 3.4.3 2020-03-28 [1] CRAN (R 3.6.3)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.1)
checkmate 2.0.0 2020-02-06 [1] CRAN (R 3.6.3)
class 7.3-15 2019-01-01 [2] CRAN (R 3.6.1)
cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.3)
codetools 0.2-16 2018-12-24 [2] CRAN (R 3.6.1)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.1)
colourpicker 1.0 2017-09-27 [1] CRAN (R 3.6.1)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.1)
crosstalk 1.0.0 2016-12-21 [1] CRAN (R 3.6.1)
data.table 1.12.8 2019-12-09 [1] CRAN (R 3.6.3)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.1)
dials * 0.0.6 2020-04-03 [1] CRAN (R 3.6.1)
DiceDesign 1.8-1 2019-07-31 [1] CRAN (R 3.6.1)
digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.3)
doParallel 1.0.15 2019-08-02 [1] CRAN (R 3.6.3)
dplyr * 0.8.5 2020-03-07 [1] CRAN (R 3.6.3)
DT 0.7 2019-06-11 [1] CRAN (R 3.6.1)
dygraphs 1.1.1.6 2018-07-11 [1] CRAN (R 3.6.1)
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
embed * 0.0.6 2020-03-17 [1] CRAN (R 3.6.3)
emo 0.0.0.9000 2019-11-04 [1] Github (hadley/emo@02a5206)
evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.1)
fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.3)
farver 2.0.3 2020-01-16 [1] CRAN (R 3.6.3)
fastmatch 1.1-0 2017-01-28 [1] CRAN (R 3.6.0)
float 0.2-3 2019-05-31 [1] CRAN (R 3.6.0)
FNN 1.1.3 2019-02-15 [1] CRAN (R 3.6.3)
forcats * 0.4.0 2019-02-17 [1] CRAN (R 3.6.1)
foreach 1.5.0 2020-03-30 [1] CRAN (R 3.6.3)
furrr 0.1.0 2018-05-16 [1] CRAN (R 3.6.1)
future 1.14.0 2019-07-02 [1] CRAN (R 3.6.1)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.1)
ggmosaic * 0.2.0 2018-09-12 [1] CRAN (R 3.6.1)
ggplot2 * 3.3.0 2020-03-05 [1] CRAN (R 3.6.3)
ggrepel 0.8.1 2019-05-07 [1] CRAN (R 3.6.1)
ggridges 0.5.1 2018-09-27 [1] CRAN (R 3.6.1)
glmnet 3.0-1 2019-11-15 [1] CRAN (R 3.6.1)
globals 0.12.4 2018-10-11 [1] CRAN (R 3.6.0)
glue * 1.4.0 2020-04-03 [1] CRAN (R 3.6.3)
gower 0.2.1 2019-05-14 [1] CRAN (R 3.6.0)
GPfit 1.0-8 2019-02-08 [1] CRAN (R 3.6.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 3.6.1)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.1)
gtools 3.8.1 2018-06-26 [1] CRAN (R 3.6.0)
hardhat 0.1.1 2020-01-08 [1] CRAN (R 3.6.2)
haven 2.1.1 2019-07-04 [1] CRAN (R 3.6.1)
hms 0.5.2 2019-10-30 [1] CRAN (R 3.6.1)
htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.1)
htmlwidgets 1.3 2018-09-30 [1] CRAN (R 3.6.1)
httpuv 1.5.1 2019-04-05 [1] CRAN (R 3.6.1)
httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.1)
igraph 1.2.4.1 2019-04-22 [1] CRAN (R 3.6.1)
infer * 0.5.0 2019-09-27 [1] CRAN (R 3.6.1)
inline 0.3.15 2018-05-18 [1] CRAN (R 3.6.1)
ipred 0.9-9 2019-04-28 [1] CRAN (R 3.6.1)
iterators 1.0.12 2019-07-26 [1] CRAN (R 3.6.1)
janeaustenr 0.1.5 2017-06-10 [1] CRAN (R 3.6.1)
jsonlite 1.6 2018-12-07 [1] CRAN (R 3.6.1)
keras 2.2.4.1.9001 2019-09-10 [1] Github (rstudio/keras@95ea0b5)
kernlab 0.9-27 2018-08-10 [1] CRAN (R 3.6.0)
knitr 1.23 2019-05-18 [1] CRAN (R 3.6.1)
labeling 0.3 2014-08-23 [1] CRAN (R 3.6.0)
later 1.0.0 2019-10-04 [1] CRAN (R 3.6.1)
lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.1)
lava 1.6.7 2020-03-05 [1] CRAN (R 3.6.3)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.1)
lgr 0.3.4 2020-03-20 [1] CRAN (R 3.6.3)
lhs 1.0.1 2019-02-03 [1] CRAN (R 3.6.2)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.3)
listenv 0.7.0 2018-01-21 [1] CRAN (R 3.6.1)
lme4 1.1-21 2019-03-05 [1] CRAN (R 3.6.1)
loo 2.1.0 2019-03-13 [1] CRAN (R 3.6.1)
lubridate 1.7.8 2020-04-06 [1] CRAN (R 3.6.3)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.1)
markdown 1.0 2019-06-07 [1] CRAN (R 3.6.1)
MASS 7.3-51.4 2019-03-31 [2] CRAN (R 3.6.1)
Matrix 1.2-17 2019-03-22 [2] CRAN (R 3.6.1)
matrixStats 0.55.0 2019-09-07 [1] CRAN (R 3.6.1)
mgcv 1.8-28 2019-03-21 [2] CRAN (R 3.6.1)
mime 0.7 2019-06-11 [1] CRAN (R 3.6.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 3.6.1)
minqa 1.2.4 2014-10-09 [1] CRAN (R 3.6.1)
mlapi 0.1.0 2017-12-17 [1] CRAN (R 3.6.3)
mlr 2.17.1 2020-03-24 [1] CRAN (R 3.6.3)
modelr 0.1.5 2019-08-08 [1] CRAN (R 3.6.1)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.1)
naniar * 0.4.2 2019-02-15 [1] CRAN (R 3.6.2)
nlme 3.1-140 2019-05-12 [2] CRAN (R 3.6.1)
nloptr 1.2.1 2018-10-03 [1] CRAN (R 3.6.1)
nnet 7.3-12 2016-02-02 [2] CRAN (R 3.6.1)
packrat 0.5.0 2018-11-14 [1] CRAN (R 3.6.1)
parallelMap 1.5.0 2020-03-26 [1] CRAN (R 3.6.3)
ParamHelpers 1.14 2020-03-24 [1] CRAN (R 3.6.3)
parsnip * 0.0.4 2019-11-02 [1] CRAN (R 3.6.1)
pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.3)
pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.3)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.1)
plotly 4.9.0 2019-04-10 [1] CRAN (R 3.6.1)
plyr 1.8.4 2016-06-08 [1] CRAN (R 3.6.1)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.3)
pROC 1.15.3 2019-07-21 [1] CRAN (R 3.6.1)
processx 3.4.2 2020-02-09 [1] CRAN (R 3.6.3)
prodlim 2019.11.13 2019-11-17 [1] CRAN (R 3.6.3)
productplots 0.1.1 2016-07-02 [1] CRAN (R 3.6.1)
promises 1.0.1 2018-04-13 [1] CRAN (R 3.6.1)
ps 1.3.2 2020-02-13 [1] CRAN (R 3.6.3)
purrr * 0.3.3 2019-10-18 [1] CRAN (R 3.6.1)
R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.1)
RANN 2.6.1 2019-01-08 [1] CRAN (R 3.6.3)
Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 3.6.3)
readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.1)
readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.1)
recipes * 0.1.10 2020-03-18 [1] CRAN (R 3.6.3)
reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.6.1)
reticulate 1.13.0-9000 2019-09-10 [1] Github (rstudio/reticulate@f17091b)
RhpcBLASctl 0.20-17 2020-01-17 [1] CRAN (R 3.6.2)
rlang 0.4.5 2020-03-01 [1] CRAN (R 3.6.3)
rmarkdown 1.14 2019-07-12 [1] CRAN (R 3.6.1)
ROSE 0.0-3 2014-07-15 [1] CRAN (R 3.6.3)
rpart 4.1-15 2019-04-12 [2] CRAN (R 3.6.1)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.1)
rsample * 0.0.5 2019-07-12 [1] CRAN (R 3.6.1)
rsconnect 0.8.15 2019-07-22 [1] CRAN (R 3.6.1)
rsparse 0.4.0 2020-04-01 [1] CRAN (R 3.6.3)
rstan 2.19.2 2019-07-09 [1] CRAN (R 3.6.1)
rstanarm 2.19.2 2019-10-03 [1] CRAN (R 3.6.1)
rstantools 2.0.0 2019-09-15 [1] CRAN (R 3.6.1)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 3.6.3)
rvest 0.3.4 2019-05-15 [1] CRAN (R 3.6.1)
scales * 1.1.0 2019-11-18 [1] CRAN (R 3.6.3)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.1)
shape 1.4.4 2018-02-07 [1] CRAN (R 3.6.0)
shiny 1.3.2 2019-04-22 [1] CRAN (R 3.6.1)
shinyjs 1.0 2018-01-08 [1] CRAN (R 3.6.1)
shinystan 2.5.0 2018-05-01 [1] CRAN (R 3.6.1)
shinythemes 1.1.2 2018-11-06 [1] CRAN (R 3.6.1)
SnowballC 0.6.0 2019-01-15 [1] CRAN (R 3.6.0)
StanHeaders 2.19.0 2019-09-07 [1] CRAN (R 3.6.1)
stopwords 1.0 2019-07-24 [1] CRAN (R 3.6.1)
stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.2)
stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.1)
survival 2.44-1.1 2019-04-01 [2] CRAN (R 3.6.1)
tensorflow 1.14.0.9000 2019-09-10 [1] Github (rstudio/tensorflow@5185c97)
testthat 2.3.2 2020-03-02 [1] CRAN (R 3.6.3)
text2vec 0.6 2020-02-18 [1] CRAN (R 3.6.3)
textfeatures 0.3.3 2019-09-03 [1] CRAN (R 3.6.3)
textrecipes * 0.2.0 2020-04-14 [1] CRAN (R 3.6.3)
tfruns 1.4 2018-08-25 [1] CRAN (R 3.6.1)
themis * 0.1.0 2020-01-13 [1] CRAN (R 3.6.3)
threejs 0.3.1 2017-08-13 [1] CRAN (R 3.6.1)
tibble * 3.0.0 2020-03-30 [1] CRAN (R 3.6.3)
tidymodels * 0.0.3 2019-10-04 [1] CRAN (R 3.6.1)
tidyposterior 0.0.2 2018-11-15 [1] CRAN (R 3.6.1)
tidypredict 0.4.3 2019-09-03 [1] CRAN (R 3.6.1)
tidyr * 1.0.2 2020-01-24 [1] CRAN (R 3.6.3)
tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.3)
tidytext * 0.2.2 2019-07-29 [1] CRAN (R 3.6.1)
tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.6.1)
timeDate 3043.102 2018-02-21 [1] CRAN (R 3.6.0)
tokenizers 0.2.1 2018-03-29 [1] CRAN (R 3.6.1)
tune * 0.0.1 2020-02-11 [1] CRAN (R 3.6.1)
unbalanced 2.0 2015-06-26 [1] CRAN (R 3.6.3)
utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.1)
uwot 0.1.8 2020-03-16 [1] CRAN (R 3.6.3)
vctrs 0.2.4 2020-03-10 [1] CRAN (R 3.6.3)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.6.1)
visdat 0.5.3 2019-02-15 [1] CRAN (R 3.6.2)
whisker 0.4 2019-08-28 [1] CRAN (R 3.6.1)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.1)
workflows 0.1.0 2019-12-30 [1] CRAN (R 3.6.2)
xaringan 0.13 2019-10-30 [1] CRAN (R 3.6.1)
xfun 0.8 2019-06-25 [1] CRAN (R 3.6.1)
xml2 1.2.2 2019-08-09 [1] CRAN (R 3.6.1)
xtable 1.8-4 2019-04-21 [1] CRAN (R 3.6.2)
xts 0.11-2 2018-11-05 [1] CRAN (R 3.6.1)
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
yardstick * 0.0.4 2019-08-26 [1] CRAN (R 3.6.1)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.1)
zoo 1.8-6 2019-05-28 [1] CRAN (R 3.6.1)
[1] C:/Users/gsimc/Documents/R/win-library/3.6
[2] C:/Program Files/R/R-3.6.1/library
This does seem to be either a workflows or hardhat issue:
library(tidymodels)
#> ── Attaching packages ───────────────────────────────────────────────────────────── tidymodels 0.1.0 ──
#> ✓ broom 0.5.4 ✓ recipes 0.1.12
#> ✓ dials 0.0.6 ✓ rsample 0.0.6
#> ✓ dplyr 0.8.5 ✓ tibble 3.0.1
#> ✓ ggplot2 3.3.0 ✓ tune 0.1.0
#> ✓ infer 0.5.1 ✓ workflows 0.1.0
#> ✓ parsnip 0.1.0 ✓ yardstick 0.0.5
#> ✓ purrr 0.3.4
#> Warning: package 'parsnip' was built under R version 3.6.2
#> Warning: package 'rsample' was built under R version 3.6.2
#> Warning: package 'tibble' was built under R version 3.6.2
#> ── Conflicts ──────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
#> x ggplot2::margin() masks dials::margin()
#> x recipes::step() masks stats::step()
set.seed(1234)
# Keep 22 sampled vs values and set the remaining 10 to NA
mtcars_tb <- mtcars %>%
  as_tibble() %>%
  mutate(vs = factor(c(sample(vs, 22), rep(NA_integer_, 10))))

set.seed(1234)
cv_fold_mtc <- vfold_cv(mtcars_tb, v = 2)

# as an example
split <- cv_fold_mtc$splits[[1]]

lasso_mod <-
  linear_reg(penalty = .01, mixture = 1) %>%
  set_engine("glmnet")

rec <- recipe(mpg ~ disp + vs, data = analysis(split)) %>%
  step_unknown(all_nominal()) %>%
  step_dummy(all_nominal())

# Manual route: prep the recipe, fit the model, predict on baked data
rec_fit <- rec %>% prep()
model_fit <- lasso_mod %>% fit(mpg ~ ., data = juice(rec_fit))
model_pred <- predict(model_fit, bake(rec_fit, assessment(split)))

# Workflow route: same model and recipe; predict() emits the warning below
wflow <-
  workflow() %>%
  add_model(lasso_mod) %>%
  add_recipe(rec)

wflow_fit <- wflow %>% fit(data = analysis(split))
wflow_pred <- predict(wflow_fit, assessment(split))
#> Warning: Novel levels found in column 'vs': NA. The levels have been removed,
#> and values have been coerced to 'NA'.
Created on 2020-05-01 by the reprex package (v0.3.0)
I think it is more likely that this is a hardhat issue than a workflows issue. Likely I'm not accounting for NA as being ok somewhere in scream().
Minimal reprex
library(hardhat)
library(vctrs)
df <- data.frame(x = factor(c("x", NA)))
ptype <- vec_ptype(df)
scream(df, ptype = ptype)
#> Warning: Novel levels found in column 'x': NA. The levels have been removed, and
#> values have been coerced to 'NA'.
#> x
#> 1 x
#> 2 <NA>
Created on 2020-05-01 by the reprex package (v0.3.0)
The main problem is that check_novel_levels.factor() is using unique(x) to get the levels, when it should be using levels(x). unique(x) will pull in NA as a level. Making this change will also require updating check_novel_levels.character(), which currently uses the same code path as factors. The merged code path is the reason I tried to use unique() in the first place.
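The difference between the two calls can be seen in base R alone. The snippet below is only a sketch of the idea, not hardhat's actual implementation: the names known_levels, values_seen, levels_seen and the setdiff()-based novelty check are assumptions made for illustration.

```r
# A factor with one observed level and one missing value,
# as in the minimal reprex above.
x <- factor(c("x", NA))

# Stand-in (hypothetical) for the levels recorded in the prototype.
known_levels <- levels(x)

# unique() works on the values, so the missing value is included.
values_seen <- as.character(unique(x))

# levels() works on the level set, which has no NA entry here.
levels_seen <- levels(x)

# Hypothetical novelty check: anything seen that is not a known level.
novel_via_unique <- setdiff(values_seen, known_levels)
novel_via_levels <- setdiff(levels_seen, known_levels)

print(novel_via_unique)
print(novel_via_levels)
```

Under these assumptions, the unique()-based comparison reports the missing value as if it were a novel level, while the levels()-based comparison reports nothing, which matches the behaviour described above.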
The problem
I have data that has missing values in a factor variable. I am dealing with these by using recipes::step_unknown(., all_nominal()). When I am running tune_grid in this setting, it results in warnings about novel levels during the prediction.
I tracked down the warning to the scream function in the hardhat package, and it seems that everything works fine despite the warning.
Does this belong here (because it should provide scream with a different parameter) or to hardhat's issues (because it is an error in how scream works)?
Reproducible example (with tune)