tidymodels / hardhat

Construct Modeling Packages
https://hardhat.tidymodels.org

Avoid passing factors/characters to `model.matrix()` altogether when `indicators = "none"` #234

Closed DavisVaughan closed 1 year ago

DavisVaughan commented 1 year ago

Closes #213

This is complicated and a bit gross, but to avoid having `model.matrix()` process our character columns at all, we have to convert them to some other type, such as a constant integer column. This prevents `contrasts<-` from running at all. In theory it shouldn't run anyway, because the formula looks like `~ + col - col`, so `col` isn't actually used in the end, but that is a little muddy.

In an ideal world we'd process the formula in a way that strips out `col` entirely and pass that on to `model.matrix()`. Then we could simply remove the factorish columns from the data that we pass to `model.matrix()`. That is incredibly difficult to do robustly given the number of things you can do in a formula, so instead we are stuck with tacking on `- col` and converting to a constant integer column to get as close to a no-op as we possibly can.
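As a rough illustration of the idea above, here is a minimal base R sketch. It is not hardhat's actual implementation: the data frame `df`, the formula `f`, and the other variable names are hypothetical and chosen only for this example. It shows that `col` cancels out of the terms of `~ x + col - col`, and that swapping the character column for a constant integer column leaves `model.matrix()` with nothing factor-like to process.

```r
# Toy data (hypothetical): `col` is a character column with one unique value.
df <- data.frame(x = c(1, 2, 3), col = c("a", "a", "a"))

# `col` is added and then subtracted, so it cancels out of the term labels.
f <- ~ x + col - col
labels <- attr(terms(f), "term.labels")

# `col` is still part of the model frame, so model.matrix() may still turn it
# into a factor and try to assign contrasts to it. Wrapped in try() because
# that step can fail for a factor with fewer than 2 levels.
result_chr <- try(model.matrix(f, df), silent = TRUE)

# Workaround sketch: replace the factorish column with a constant integer
# column so that there is no factor for contrasts to be assigned to.
df_int <- df
df_int$col <- rep(1L, nrow(df_int))
mm <- model.matrix(f, df_int)

colnames(mm)
```

The constant integer column is only a placeholder: because of the trailing `- col`, it never contributes a column to the resulting matrix.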


The other long-term solution would be deciding that `indicators = "none"` is a bad idea for the formula interface, and encouraging people to use the variables interface instead if they want that.
