I think mold() is doing the right thing. The ptypes argument in question must correspond to the ptype of the original data supplied to mold(). It is doing that for both cases.
```r
suppressPackageStartupMessages(library(tidymodels))
data("lending_club", package = "modeldata")
rec_unsup <- recipe(Class ~ ., lending_club) %>% step_normalize(all_numeric())
prep_unsup <- rec_unsup %>% prep()

#### mold.data.frame() method
unsupervised_baked_df <- prep_unsup %>% bake(new_data = NULL)
# original data is numeric because of step_normalize() being applied already
class(unsupervised_baked_df$open_il_6m)
#> [1] "numeric"
processed.data.frame <- hardhat::mold(x = unsupervised_baked_df[, -23], y = unsupervised_baked_df[, 23])
# ptype is consistent with that:
class(processed.data.frame$blueprint$ptypes$predictors$open_il_6m)
#> [1] "numeric"

#### mold.recipe() method
# original data is integer
class(lending_club$open_il_6m)
#> [1] "integer"
unprepared.recipe <- hardhat::mold(rec_unsup, lending_club)
# ptype is consistent with that:
class(unprepared.recipe$blueprint$ptypes$predictors$open_il_6m)
#> [1] "integer"
```

Created on 2021-10-25 by the reprex package (v2.0.1)
I think the problem in your original issue is that you are mixing the XY method of tabnet_pretrain() with the recipe method of tabnet_fit().
You should either use the recipe interfaces for both, or the XY interfaces for both, but not a mix of the two.
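To see why mixing the interfaces trips a type check, here is a minimal base-R sketch. It is not tabnet's or hardhat's actual code: it assumes a ptype behaves like a zero-row slice of whatever data was supplied (so it keeps column names and classes), and it uses a hypothetical helper, mismatched_columns(), standing in for whatever validation compares the two ptypes. The recipe interface sees the raw integer column, while the XY interface sees the already-normalized double column.

```r
# Minimal sketch, assuming a ptype is a zero-row slice of the supplied data.
# Hypothetical stand-in for an integer predictor such as open_il_6m
raw <- data.frame(open_il_6m = c(0L, 1L, 3L, 5L))

# Normalize by hand, as step_normalize() does: centre, then scale.
# Integer minus a double mean yields a double column.
baked <- raw
baked$open_il_6m <- (raw$open_il_6m - mean(raw$open_il_6m)) / sd(raw$open_il_6m)

ptype_recipe <- raw[0, , drop = FALSE]  # recipe interface: raw data
ptype_xy <- baked[0, , drop = FALSE]    # XY interface: normalized data

# Hypothetical helper: names of shared columns whose classes differ
mismatched_columns <- function(a, b) {
  cols <- intersect(names(a), names(b))
  same <- vapply(cols, function(col) identical(class(a[[col]]), class(b[[col]])), logical(1))
  cols[!same]
}

class(ptype_recipe$open_il_6m)
#> [1] "integer"
class(ptype_xy$open_il_6m)
#> [1] "numeric"
mismatched_columns(ptype_recipe, ptype_xy)
#> [1] "open_il_6m"
```

When both steps go through the same interface, the two ptypes come from the same kind of data and a check like mismatched_columns() returns character(0).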
Hello @DavisVaughan. You are perfectly right. I had forgotten the obvious fact that normalization turns integers into numeric (double) values. Sorry about that.
And thanks for the hint on my own issue!
The problem
I'm having trouble with the mold.recipe() output $blueprint$ptypes$predictors, which turns the numeric double values of a tibble into the integer class. Note that the mold.data.frame() output $blueprint$ptypes$predictors for the same dataset correctly records them as double.
Impact
Using those ptypes for dataset validation between model pretraining and model training makes the validation fail, as in https://github.com/mlverse/tabnet/issues/66.
Reproducible example
Created on 2021-10-22 by the reprex package (v2.0.1)