zachmayer / caretEnsemble

caret models all the way down :turtle:

Improve extractBestPreds #148

Open zachmayer opened 9 years ago

zachmayer commented 9 years ago

It currently makes a LOT of copies of the input dataset (one per tuning parameter, per model, plus another for the final sort), which can suck up huge amounts of RAM:

    for (i in 1:length(modelLibrary)) {
        out <- modelLibrary[[i]]
        tune <- tunes[[i]]
        for (name in names(tune)) {
            indxLogic <- out[, name] == tune[, name]
            indxLogic[is.na(indxLogic)] <- FALSE
            out <- out[indxLogic, ]
        }
        out <- out[order(out$Resample, out$rowIndex), ]
        newModels[[i]] <- out
    }

I think a simple fix would be to use a data.table internally, which would save us from copies at every subset.
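
A rough sketch of what that could look like, not a tested patch. The function name `extract_best_preds_dt` and its signature are made up for illustration; it assumes each element of `modelLibrary` is a data.frame of predictions with `Resample` and `rowIndex` columns, and each element of `tunes` is a one-row data.frame of best tuning parameters, as in the loop above. Instead of subsetting once per tuning parameter, it builds a single logical mask, subsets once, and then sorts in place with `setorderv()`:

```r
library(data.table)

# Hypothetical helper, not part of caretEnsemble.
extract_best_preds_dt <- function(modelLibrary, tunes) {
  newModels <- vector("list", length(modelLibrary))
  for (i in seq_along(modelLibrary)) {
    # as.data.table() makes one copy up front; later steps avoid further full copies.
    out <- as.data.table(modelLibrary[[i]])
    tune <- tunes[[i]]

    # Accumulate one mask over all tuning parameters instead of re-subsetting.
    keep <- rep(TRUE, nrow(out))
    for (name in names(tune)) {
      hit <- out[[name]] == tune[[name]]
      # Treat NA comparisons as non-matches, like the original loop does.
      keep <- keep & !is.na(hit) & hit
    }

    out <- out[keep]                             # single subset
    setorderv(out, c("Resample", "rowIndex"))    # sorts by reference, no copy
    newModels[[i]] <- out
  }
  newModels
}
```

Two caveats: the result is a list of data.tables rather than data.frames, so downstream code would need checking, and `setorderv()` sorts character columns in C-locale order, which can differ from `order()` on a data.frame if `Resample` is a character column with mixed case.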