Model baseline performance

A model baseline helps us understand and measure the progress we make on the model's performance. scikit-learn, the package we use in Gabbar, has a model for exactly this:

sklearn.dummy.DummyClassifier

I trained the DummyClassifier on the training dataset and generated predictions on the validation dataset. The baseline numbers are close to what a model making random predictions would give.
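A minimal sketch of this step, assuming the features are already numeric arrays. The data below is random stand-in data (hypothetical, not Gabbar's), and `strategy="stratified"` is an assumption: it was `DummyClassifier`'s default before scikit-learn 0.24 and draws random predictions that follow the training class distribution, which fits the behaviour described above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical stand-in data: 10 random features, roughly 10% "harmful" (1) labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 10))
y_train = (rng.random(5000) < 0.1).astype(int)
X_val = rng.normal(size=(2583, 10))
y_val = (rng.random(2583) < 0.1).astype(int)

# Assumed strategy: random predictions that follow the training class distribution.
baseline = DummyClassifier(strategy="stratified", random_state=0)
baseline.fit(X_train, y_train)
y_pred = baseline.predict(X_val)

# Rows are true labels (good, harmful); columns are predictions (good, harmful).
cm = confusion_matrix(y_val, y_pred, labels=[0, 1])
print(cm)
print(classification_report(y_val, y_pred, labels=[0, 1],
                            target_names=["good", "harmful"]))
```

Because the baseline ignores the features entirely, any real model should beat these numbers before it is worth shipping.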

Confusion matrix

                   Predicted good   Predicted harmful
Labelled good      2086             247
Labelled harmful   223              27

Classification report (0.0 = good, 1.0 = harmful)

                precision   recall      f1-score    support

0.0             0.90        0.89        0.90        2333
1.0             0.10        0.11        0.10        250

avg / total     0.83        0.82        0.82        2583
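As a sanity check, the report can be recomputed by hand from the confusion-matrix counts. A short sketch in plain Python; `prf` is a hypothetical helper name, and the last row is assumed to be a support-weighted average, which is what these counts reproduce.

```python
# Counts from the confusion matrix (rows: labelled, columns: predicted).
tn, fp = 2086, 247   # labelled good: predicted good, predicted harmful
fn, tp = 223, 27     # labelled harmful: predicted good, predicted harmful

def prf(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# For class "good", correct good predictions (tn) are its true positives.
good = prf(tn, fn, fp)
harmful = prf(tp, fp, fn)

# Support-weighted average over both classes.
support_good, support_harmful = tn + fp, fn + tp
total = support_good + support_harmful
weighted = [(g * support_good + h * support_harmful) / total
            for g, h in zip(good, harmful)]

for name, row in [("0.0", good), ("1.0", harmful), ("avg / total", weighted)]:
    print(name, [round(v, 2) for v in row])
```

The rounded values match the report above, including the 2583 total support.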

roc_auc

Score: 0.49 (0.02), reported as mean (standard deviation)
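A mean with a standard deviation suggests the score was averaged over several folds. The sketch below assumes 5-fold cross-validation with scikit-learn's `cross_val_score` and `scoring="roc_auc"` on hypothetical random data. The fold count, the data, and `strategy="stratified"` are assumptions, not the setup behind the numbers above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in data with roughly 10% positive ("harmful") labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2583, 10))
y = (rng.random(2583) < 0.1).astype(int)

baseline = DummyClassifier(strategy="stratified", random_state=0)

# Assumed 5 folds; "roc_auc" scores the classifier's predicted probabilities.
scores = cross_val_score(baseline, X, y, cv=5, scoring="roc_auc")
print(f"roc_auc: {scores.mean():.2f} ({scores.std():.2f})")
```

For random predictions the mean should land near 0.5, as in the score above.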

These numbers are very close to what I expected: an roc_auc of about 0.5 is what random predictions produce. No next actions.

cc: @anandthakker @batpad @geohacker

mapbox / gabbar

Model baseline performance #60
