feature suggestion: highlight classification models that allow class probabilities

unitroot commented 6 years ago

Firstly, thank you for the superb documentation here: https://topepo.github.io/caret/train-models-by-tag.html

I really appreciate the effort put into creating such a ready-to-use manual instead of a simple vignette. There is one feature I am missing, although I don't know who else is running into this.

I often evaluate numerous models for time series predictions. I usually work with classification models, where I use the class probabilities instead of the predicted classes as a performance measure. As you know, not all models support this, and finding out by trial and error is cumbersome with 100+ models.

I suggest adding a qualifier such as "This model supports class probabilities." to the model description page (link above), just as you indicate the availability of `varImp` ("A model-specific variable importance metric is available.").
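The workflow described above, scoring models on their class probabilities rather than their hard predictions, can be sketched in caret roughly as follows. This is a minimal sketch on simulated data: the predictors `lag1`/`lag2`, the `Up`/`Down` outcome, the window sizes, and the choice of `glm` scored by log loss (`mnLogLoss`) are illustrative assumptions, not part of the original comment.

```r
library(caret)

set.seed(1)
n <- 200
# Illustrative (hypothetical) lagged predictors and a binary outcome.
# Class levels must be valid R names when classProbs = TRUE.
x <- data.frame(lag1 = rnorm(n), lag2 = rnorm(n))
y <- factor(ifelse(x$lag1 + rnorm(n) > 0, "Up", "Down"))

ctrl <- trainControl(
  method = "timeslice",        # train on 100 consecutive rows, evaluate the next 10
  initialWindow = 100,
  horizon = 10,
  fixedWindow = TRUE,          # roll the window forward instead of growing it
  classProbs = TRUE,           # keep predicted class probabilities
  summaryFunction = mnLogLoss  # score resamples on the probabilities (log loss)
)

fit <- train(x = x, y = y, method = "glm", metric = "logLoss", trControl = ctrl)
fit$results

# Per-row class probabilities rather than hard class labels
head(predict(fit, newdata = x, type = "prob"))
```

Swapping `mnLogLoss` for `twoClassSummary` together with `metric = "ROC"` would score on ROC AUC instead; either way, `train()` needs a model that implements class probabilities.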

unitroot commented 6 years ago

Should anyone be interested, here is a script that yields a data frame flagging which models support class probabilities:

```r
# Candidate caret models ("binda" and "ada" are two separate models)
models <- c("adaboost", "AdaBoost.M1", "amdai", "vglmAdjCat", "AdaBag", "bagFDAGCV",
  "bagFDA", "binda", "ada", "LogitBoost", "J48", "C5.0", "rpartScore", "chaid",
  "vglmContRatio", "C5.0Cost", "rpartCost", "vglmCumulative", "deepboost",
  "dda", "dwdPoly", "dwdRadial", "RFlda", "fda", "FRBCS.CHI", "GFS.GCCL",
  "FH.GBML", "SLAVE", "FRBCS.W", "gpls", "protoclass", "hda", "hdda", "hdrda",
  "svmLinearWeights2", "lvq", "lssvmLinear", "lssvmPoly", "lssvmRadial",
  "lda", "lda2", "stepLDA", "dwdLinear", "svmLinearWeights", "loclda",
  "LMT", "Mlda", "mda", "manb", "mlpKerasDropoutCost", "mlpKerasDecayCost",
  "nb", "naive_bayes", "nbDiscrete", "awnb", "pam", "ORFlog", "ORFpls",
  "ORFridge", "ORFsvm", "ownn", "polr", "PRIM", "pda", "pda2", "PenalizedLDA",
  "plr", "multinom", "qda", "stepQDA", "rFerns", "rda", "rlda", "regLogistic",
  "Linda", "rmda", "QdaCov", "rrlda", "RSimca", "rocc", "rotationForest",
  "rotationForestCp", "JRip", "PART", "nbSearch", "sda", "CSimca",
  "C5.0Rules", "C5.0Tree", "OneR", "sdwd", "sparseLDA", "smda", "slda", "snn",
  "svmRadialWeights", "tan", "tanSearch", "awtan", "vbmpRadial", "wsrf",
  "treebag", "logicBag", "bagEarth", "bagEarthGCV", "bag", "bartMachine",
  "bayesglm", "gamboost", "glmboost", "BstLm", "bstSm", "bstTree",
  "blackboost", "rpart", "rpart1SE", "rpart2", "cforest", "ctree", "ctree2",
  "randomGLM", "xgbLinear", "xgbTree", "elm", "gaussprLinear", "gaussprPoly",
  "gaussprRadial", "gamLoess", "gamSpline", "bam", "gam", "glm", "glmStepAIC",
  "glmnet", "glmnet_h2o", "gbm_h2o", "knn", "kknn", "svmLinear3", "logreg",
  "avNNet", "monmlp", "mlp", "mlpWeightDecay", "mlpWeightDecayML", "mlpML",
  "msaenet", "mlpSGD", "mlpKerasDropout", "mlpKerasDecay", "earth",
  "gcvEarth", "mxnet", "mxnetAdam", "nnet", "pcaNNet", "null", "parRF",
  "partDSA", "kernelpls", "pls", "simpls", "widekernelpls", "plsRglm",
  "ordinalNet", "rbf", "rbfDDA", "ranger", "rf", "Rborist", "extraTrees",
  "rfRules", "RRF", "RRFglobal", "xyf", "spls", "dnn", "gbm",
  "svmBoundrangeString", "svmExpoString", "svmLinear2", "svmLinear",
  "svmPoly", "svmRadial", "svmRadialCost", "svmRadialSigma",
  "svmSpectrumString", "evtree", "nodeHarvest")

# Simulated two-class outcome and two predictors
y <- as.factor(round(runif(100, 0)))
levels(y) <- c("Y", "N")
x <- data.frame("N" = rnorm(100), "T" = rt(100, 5, 2))

# No resampling; with classProbs = TRUE, train() errors for models without class probabilities
modCtrl <- caret::trainControl(method = "none", classProbs = TRUE)

dfCP <- data.frame(model = models, classProbs = NA)

for (i in seq_along(models)) {
  print(models[i])
  ela <- Sys.time()
  modTune <- try(caret::train(x = x, y = y, method = models[i],
                              tuneLength = 1, trControl = modCtrl))
  print(Sys.time() - ela)
  # FALSE also covers failures unrelated to class probabilities (e.g. a missing package)
  dfCP$classProbs[i] <- !inherits(modTune, "try-error")
}
```
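The trial-and-error loop can also be cross-checked against caret's own model metadata without fitting anything. A minimal sketch, assuming the `forClass` and `probModel` columns returned by `caret::modelLookup()`; `probModel` only says that caret's model module defines a class-probability function, so it does not guarantee that the underlying package is installed or that a particular fit succeeds.

```r
library(caret)

# modelLookup() returns one row per (model, tuning parameter) pair,
# so collapse it to one row per model before filtering.
lookup <- modelLookup()
perModel <- unique(lookup[, c("model", "forClass", "probModel")])

# Classification models whose caret module declares a class-probability function
probModels <- as.character(perModel$model[perModel$forClass & perModel$probModel])

length(probModels)
head(probModels)
```

Unlike the loop, this lists models whose packages are not installed locally, which is why the two approaches can disagree.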

topepo commented 6 years ago

That should do it. These will show up the next time I run the docs (probably in about a week). The new tag is "Supports Class Probabilities".

topepo / caret

feature suggestion: highlight classification models that allow class probabilities #808