mlr-org / mlr3pipelines

Dataflow Programming for Machine Learning in R
https://mlr3pipelines.mlr-org.com/
GNU Lesser General Public License v3.0

Should PipeOpEncode also work on character? #555

Closed. pfistfl closed this issue 3 years ago

mb706 commented 3 years ago

I don't think we should do that, for the same reason that we don't encode discrete numeric columns: semantically, character columns are not meant to be encoded. If one encounters data where encoding makes sense, then the column has the wrong type and should be converted first (just as we would do with discrete numerics in this case). The canonical way of converting is

po("colapply", applicator = as.factor, affect_columns = selector_type("character"))

(possibly followed by fixfactors or other PipeOps).
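For illustration, here is a minimal sketch of how the conversion could be chained with fixfactors and encode in one graph. The toy data, the column names (colour, size, y) and the task id are made up for this example; the choice of one-hot encoding is also just an assumption, not something prescribed in this thread.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical toy data: one character feature, one numeric feature.
dat = data.frame(
  colour = c("red", "green", "blue", "green", "red", "blue"),
  size = c(1.2, 3.4, 2.2, 0.7, 5.1, 4.3),
  y = factor(c("a", "b", "a", "b", "a", "b")),
  stringsAsFactors = FALSE
)
task = as_task_classif(dat, target = "y", id = "toy")

# Convert character columns to factor, align factor levels between
# training and prediction, then encode the resulting factors.
graph = po("colapply", applicator = as.factor,
    affect_columns = selector_type("character")) %>>%
  po("fixfactors") %>>%
  po("encode", method = "one-hot")

out = graph$train(task)[[1]]

# Inspect the feature types after the pipeline has run.
print(out$feature_types)
```

The fixfactors step is included because as.factor derives levels from whatever data it sees, so the levels at prediction time may differ from those at training time unless they are aligned.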

We may want to add a test that checks whether character -> factor -> encode works; factors in R can be a difficult topic.
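Such a test might look roughly like the sketch below. It is a hypothetical testthat case, not the test that exists in the package; the helper name chr_encode_graph, the toy data, and the choice to predict on rows that lack one level are all assumptions made for illustration.

```r
library(mlr3)
library(mlr3pipelines)
library(testthat)

# Hypothetical helper: builds the character -> factor -> encode graph.
chr_encode_graph = function() {
  po("colapply", applicator = as.factor,
      affect_columns = selector_type("character")) %>>%
    po("fixfactors") %>>%
    po("encode")
}

test_that("character -> factor -> encode works", {
  dat = data.frame(
    x = c("a", "b", "c", "a", "b", "c"),
    y = c(1, 2, 3, 4, 5, 6),
    stringsAsFactors = FALSE
  )
  task = as_task_regr(dat, target = "y", id = "chr")
  graph = chr_encode_graph()

  # After training, no character or factor features should remain.
  trained = graph$train(task)[[1]]
  expect_false(any(trained$feature_types$type %in% c("character", "factor")))

  # Predict on rows where the value "c" never occurs; the encoded
  # columns should still match those produced during training.
  task_pred = task$clone(deep = TRUE)$filter(c(1L, 2L, 4L, 5L))
  predicted = graph$predict(task_pred)[[1]]
  expect_setequal(predicted$feature_names, trained$feature_names)
})
```

Predicting on a subset with a missing level is the case most likely to expose level mismatches, which is where factors tend to cause trouble.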

pfistfl commented 3 years ago

I can agree. I guess this is then documented here. I find it a little cumbersome to always have to include this line, since treating character columns as factors is what people want to do in most cases, but this is a simple enough solution.