mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning
MIT License

Model baseline performance #60

Closed bkowshik closed 7 years ago

bkowshik commented 7 years ago

A model baseline will help us understand and measure the progress we are making on the model's performance. scikit-learn, the package we use in Gabbar, has a model just for that: DummyClassifier.

I trained the DummyClassifier on the training dataset and got predictions on the validation dataset. Baselines look close to what a model generating random predictions would give.
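As a rough illustration of the step above, here is a minimal sketch of fitting a `DummyClassifier` and scoring it on a held-out set. The data is synthetic (about 10% positive labels) and is not the Gabbar dataset; the `strategy="stratified"` setting, the `random_state`, and all variable names are assumptions made for the example, not taken from the Gabbar code.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import classification_report, confusion_matrix

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the Gabbar dataset): roughly 10% harmful labels.
X_train = rng.normal(size=(5000, 3))
y_train = (rng.random(5000) < 0.1).astype(int)
X_valid = rng.normal(size=(2000, 3))
y_valid = (rng.random(2000) < 0.1).astype(int)

# Assumed strategy: "stratified" ignores the features and draws labels at
# random, following the class ratio seen in the training labels.
baseline = DummyClassifier(strategy="stratified", random_state=0)
baseline.fit(X_train, y_train)
y_pred = baseline.predict(X_valid)

# Rows are true labels, columns are predicted labels (0 = good, 1 = harmful).
cm = confusion_matrix(y_valid, y_pred, labels=[0, 1])
print(cm)
print(classification_report(y_valid, y_pred))
```

Because the dummy model never looks at the features, any real model should beat these numbers before it is worth considering.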

Confusion matrix

                    Predicted good    Predicted harmful
Labelled good       2086              247
Labelled harmful    223               27

Classification report

                precision   recall      f1-score    support

0.0             0.90        0.89        0.90        2333
1.0             0.10        0.11        0.10        250

avg / total     0.83        0.82        0.82        2583
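The report can be cross-checked against the confusion matrix by hand. The sketch below recomputes precision, recall, and F1 for both classes from the four counts above; the helper name `prf` is hypothetical and only used for this example.

```python
# Counts copied from the confusion matrix (harmful is the positive class).
tn, fp = 2086, 247   # labelled good:    predicted good / predicted harmful
fn, tp = 223, 27     # labelled harmful: predicted good / predicted harmful


def prf(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# For the "good" class the roles of the counts are swapped.
good = [round(v, 2) for v in prf(tn, fn, fp)]
harmful = [round(v, 2) for v in prf(tp, fp, fn)]

print("good   ", good)
print("harmful", harmful)
```

The rounded values agree with the 0.0 and 1.0 rows of the report, and the row sums (2333 and 250) match the support column.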

roc_auc
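When a classifier outputs hard 0/1 labels rather than scores, the ROC curve has a single operating point and the area under it reduces to `(1 + TPR - FPR) / 2`. The sketch below applies that identity to the confusion matrix counts. It assumes the score is computed on hard predictions, which is an assumption for illustration and not something stated in this thread.

```python
# Counts copied from the confusion matrix (harmful is the positive class).
tn, fp = 2086, 247
fn, tp = 223, 27

tpr = tp / (tp + fn)   # true positive rate (recall on harmful)
fpr = fp / (fp + tn)   # false positive rate

# Area under the ROC curve through (0, 0), (fpr, tpr), (1, 1).
auc_hard_labels = (1 + tpr - fpr) / 2
print(auc_hard_labels)
```

A value near 0.5 is what random guessing would produce, which is consistent with the baseline behaving like a random predictor.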

These look very close to what I was expecting. No next actions.


cc: @anandthakker @batpad @geohacker