zachmayer / caretEnsemble

caret models all the way down :turtle:
Other
226 stars 75 forks source link

Adding support for custom models #197

Closed eric-czech closed 8 years ago

eric-czech commented 8 years ago

Hey @zachmayer , I made changes to implement custom models for my own purposes and thought I'd send them your way to see if you're interested in collaborating on this. This PR is missing docs and new tests for the changes, but I'd be happy to modify and add the tests I had been using for myself assuming the extent of the code changes line up with what you were expecting (I saw several issues logged about it so I imagine it's crossed your mind before).

The only concession I had to make was to enforce that any custom models be given a "method" attribute so that the string associated with that attribute could be used internally. Technically that doesn't matter if the custom model is already named in the tuneList (that name is used instead), but it does matter if the caretModelSpec for that custom model is NOT named. Here's an example that will show what I mean:

library(plyr); library(dplyr)
library(caret); library(caretEnsemble)

# Generate some data to test with
d <- twoClassSim(n=100)
X <- d %>% select(-Class); y <- d$Class

# Create a couple custom models
customRF <- getModelInfo('rf', regex=F)[[1]]
customRF$method <- 'custom.rf'

customGLM <- getModelInfo('rf', regex=F)[[1]]
customGLM$method <- 'custom.glm'

cl <- caretList(
  X, y,
  tuneList=list(
    # The name used internally for this model will come from the "method" attribute above
    caretModelSpec(method=customRF, tuneLength=3),
    # This model on the other hand will be referred to as "myglm" not "custom.glm"
    myglm=caretModelSpec(method=customGLM, tuneLength=1),
    glmnet=caretModelSpec(method='glmnet', tuneLength=5),
    rpart=caretModelSpec(method='rpart', tuneLength=15)
  ),
  trControl=trainControl(method='cv', number=10, classProbs=T)
)

cs <- caretEnsemble(cl)
print(cs)

# A glm ensemble of 4 base models: custom.rf, myglm, glmnet, rpart
# ...

If that all sounds good to you, then I'll gladly add some test cases though I could use help on where to document it.

zachmayer commented 8 years ago

Looks good to me, thanks for the contribution!

zachmayer commented 8 years ago

Please add a couple of tests, and then run devtools::document() to build the new documentation.

zachmayer commented 8 years ago

Once you've added a test for the new functionality, this is good to merge.

Thanks for the contribution!

eric-czech commented 8 years ago

Cool, sounds good! I'll try to get some tests in there in the next couple days.

As far as documentation goes, do you think it's worth mentioning in the vignette? That's more of what I was getting at there.

zachmayer commented 8 years ago

Oh I see. That's not super important— update the vignette if you'd like, otherwise I will at some point in the future.

eric-czech commented 8 years ago

Nice, ok I won't change that then.

lintr-bot commented 8 years ago

tests/testthat/test-ensemble.R:210:3: style: Words within variable and function names should be separated by '_' rather than '.'.

​  X.class.df <- as.data.frame(X.class)
  ^~~~~~~~~~

tests/testthat/test-ensemble.R:213:34: style: Words within variable and function names should be separated by '_' rather than '.'.

​  expect_warning(cl <- caretList(X.class.df, Y.class, tuneList=tune.list, trControl=train.control))
                                 ^~~~~~~~~~

tests/testthat/test-ensemble.R:223:54: style: Words within variable and function names should be separated by '_' rather than '.'.

​  expect_silent(pred.classb <- predict(cs, newdata = X.class.df, type="prob"))
                                                     ^~~~~~~~~~

tests/testthat/test-ensemble.R:224:54: style: Words within variable and function names should be separated by '_' rather than '.'.

​  expect_silent(pred.classc <- predict(cs, newdata = X.class.df[2,], type="prob"))
                                                     ^~~~~~~~~~

tests/testthat/test-ensemble.R:224:67: style: Commas should always have a space after.

​  expect_silent(pred.classc <- predict(cs, newdata = X.class.df[2,], type="prob"))
                                                                  ^
coveralls commented 8 years ago

Coverage Status

Changes Unknown when pulling f61f0c2e106c217b5e14879897de6a4a2ca939e8 on eric-czech:custom_model into \ on zachmayer:master**.