rstudio / mleap

R Interface to MLeap
http://spark.rstudio.com/guides/mleap/
Apache License 2.0

Any support for ft_dplyr_transformer/ft_sql_transformer? #40

Open kputschko opened 4 years ago

kputschko commented 4 years ago

Hello,

I'm exploring Spark Pipelines and MLeap for the first time. I'm trying to export an MLeap bundle by following the documentation on the RStudio website. My pipelines use dplyr/SQL transformations before modeling. Am I correct in assuming that these stages are not supported in a pipeline that I plan to export to an MLeap bundle?

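library(sparklyr)
library(dplyr)
library(mleap)

# `sc` is an existing Spark connection (e.g. created with spark_connect())
mtcars_tbl <- sdf_copy_to(sc, mtcars, overwrite = TRUE)

# dplyr step that ft_dplyr_transformer() converts into a SQLTransformer stage
new_mtcars <- mtcars_tbl %>% select(hp, wt, qsec, mpg)

pipeline <-
  ml_pipeline(sc) %>%
  ft_dplyr_transformer(new_mtcars) %>%
  ft_binarizer("hp", "big_hp", threshold = 100) %>%
  ft_vector_assembler(c("big_hp", "wt", "qsec"), "features") %>%
  ml_gbt_regressor(label_col = "mpg")

# Fitting and transforming both work as expected
pipeline_model <- ml_fit(pipeline, mtcars_tbl)

transformed_tbl <- ml_transform(pipeline_model, mtcars_tbl)

# Exporting the fitted pipeline as an MLeap bundle is the step that fails
model_path <- file.path(tempdir(), "mtcars_model.zip")
ml_write_bundle(pipeline_model, mtcars_tbl, model_path, overwrite = TRUE)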

The call to ml_write_bundle() gives the following error:

Error: java.util.NoSuchElementException: key not found: org.apache.spark.ml.feature.SQLTransformer

I'm using Spark 2.3, sparklyr 1.1, and mleap 0.12.