A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
I am a bit confused that the evaluation on the classification tasks uses the probability output directly when calculating the AUC.
For example, in 6-xgboost.R#L39,
would it be better to threshold the predictions first, i.e. use (phat > 0.5)?
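To make the question concrete, here is a minimal sketch (in Python with scikit-learn rather than the repo's R, and with made-up toy data) comparing AUC computed from raw probabilities against AUC computed from hard 0/1 predictions obtained via the 0.5 threshold:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy labels and predicted probabilities (illustrative values only).
y = np.array([0, 0, 1, 1, 0, 1])
phat = np.array([0.2, 0.4, 0.6, 0.9, 0.55, 0.45])

# AUC from the raw probabilities: uses the full ranking of examples.
auc_prob = roc_auc_score(y, phat)

# AUC from hard predictions: the ranking collapses to two score values,
# so pairs that the probabilities ordered correctly can become ties.
auc_hard = roc_auc_score(y, (phat > 0.5).astype(int))

print(auc_prob, auc_hard)
```

On this toy data the probability-based AUC is higher than the thresholded one, since thresholding discards the ranking information that AUC is designed to measure.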