topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

consider adding Model-based recursive partitioning from partykit #135

Closed spedygiorgio closed 9 years ago

spedygiorgio commented 9 years ago

It would be nice to add MOB (model-based recursive partitioning) models from the partykit package. In some ways they resemble the Cubist approach.

topepo commented 9 years ago

I don't think we can do that formally, since mob's formula is unlike those used by train and other functions: it uses | to separate the regressors from the partitioning variables.
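To make the difference concrete, here is a small base-R sketch (not from the original comment) that takes a mob-style formula apart. The shortened variable list is only for illustration, and the object names are made up for this example.

```r
# A two-part formula in the style used by mob(); nothing is evaluated here,
# so the variables do not need to exist.
f <- medv ~ lstat + rm | zn + indus

# The right-hand side of the formula is a single call to `|`
rhs <- f[[3]]
split_operator <- as.character(rhs[[1]])   # "|"

# Left of `|`: regressors for the model fitted within each node
regressors <- all.vars(rhs[[2]])           # "lstat" "rm"

# Right of `|`: variables used to partition the data
partition_vars <- all.vars(rhs[[3]])       # "zn" "indus"

print(split_operator)
print(regressors)
print(partition_vars)
```

An ordinary `y ~ x1 + x2` formula carries no such split, so a generic interface has no way of knowing which columns belong on which side.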

However, you can create your own custom method for your particular data. Using the regression example in ?mob:


modelInfo <- list(label = "Model-based Recursive Partitioning",
                  library = "party",
                  loop = NULL,
                  type = "Regression",
                  parameters = data.frame(parameter = "alpha",
                                          class = "numeric",
                                          label = "Significance Level"),
                  grid = function(x, y, len = NULL) data.frame(alpha = c(0.001, 0.01, 0.05, 0.1)),
                  fit = function(x, y, wts, param, lev, last, classProbs, ...) {
                    dat <- if(is.data.frame(x)) x else as.data.frame(x)
                    dat$.outcome <- y
                    ## Intercept the control function if given by the ...
                    theDots <- list(...)
                    if(any(names(theDots) == "control")) {
                      theDots$control$alpha <- param$alpha
                      ctl <- theDots$control
                      theDots$control <- NULL
                    } else ctl <- mob_control(alpha = param$alpha)
                    ## Note that if you used the formula method when calling
                    ## `train`, the column names won't be the same in dat as they
                    ## are in the original data. For example, you would have 
                    ## "chasyes" instead of "chas" because of the dummy variables
                    ## and "rad" gets expanded into several columns because it is
                    ## treated as ordinal. In the example below, I used the 
                    ## non-formula method to get around this.
                    modelArgs <- c(
                      list(
                        formula = as.formula(".outcome ~ lstat + rm | zn + indus + chas + nox + age + dis + rad + tax + crim + b + ptratio"),
                        data = dat,
                        control = ctl),
                      theDots)
                    out <- do.call(getFromNamespace("mob", "party"), modelArgs)
                    out
                  },
                  predict = function(modelFit, newdata, submodels = NULL) {
                    if(!is.data.frame(newdata)) newdata <- as.data.frame(newdata)
                    predict(modelFit, newdata)
                  },
                  prob = NULL,
                  predictors = NULL,
                  tags = "Linear Regression",
                  varImp = NULL,
                  sort = function(x) x)

library(caret)
library(party)  ## provides mob() and mob_control()

data("BostonHousing", package = "mlbench")
## and transform variables appropriately (for a linear regression)
BostonHousing$lstat <- log(BostonHousing$lstat)
BostonHousing$rm <- BostonHousing$rm^2
## as well as partitioning variables (for fluctuation testing)
BostonHousing$chas <- factor(BostonHousing$chas, levels = 0:1, 
                             labels = c("no", "yes"))
BostonHousing$rad <- factor(BostonHousing$rad, ordered = TRUE)

model_fit <- train(x = BostonHousing[, -14], 
                   y = BostonHousing$medv,
                   method = modelInfo)

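## Extra arguments such as `control` are passed through the ... of
## `train` to the `fit` function above, which then sets alpha on them
model_fit2 <- train(x = BostonHousing[, -14], 
                    y = BostonHousing$medv,
                    method = modelInfo,
                    control = mob_control(bonferroni = FALSE))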
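The column-name caveat in the comments of the fit function can be reproduced with base R alone. This is a minimal sketch, assuming the formula interface of train expands factors in roughly the way model.matrix() does. The toy data frame is made up and only mimics the chas and rad columns.

```r
# Made-up stand-in for two BostonHousing columns
toy <- data.frame(
  chas = factor(c("no", "yes", "no", "yes"), levels = c("no", "yes")),
  rad  = factor(c(1, 2, 3, 1), ordered = TRUE)
)

# Expand the factors into numeric columns using R's default contrasts
expanded <- model.matrix(~ chas + rad, data = toy)

# The original names "chas" and "rad" no longer appear as columns:
# chas becomes "chasyes", and the ordered factor rad becomes
# polynomial contrast columns "rad.L" and "rad.Q"
print(colnames(expanded))
```

A hard-coded formula that refers to chas or rad would no longer match columns named like these, which is why the example passes x and y directly.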

spedygiorgio commented 9 years ago

Thank you very much