mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning
MIT License

Using real changesets :boom: #19

bkowshik closed 7 years ago

bkowshik commented 7 years ago

Real changesets are amazing! :boom: They contain both the new and old versions of every feature in the changeset as JSON. We could not have asked for anything more! Spectacular work, @geohacker and @batpad. Thank you. :smiley:


In this PR, I explore how we could build a machine learning model from real changeset data, using the manual labels on osmcha that mark whether a changeset is harmful or not.

Approach

Features

I extracted 46 features from a variety of data sources. Engineering these features, training a new model, and visualizing how the performance metrics changed was a great learning experience.

User based

Feature based

Changeset based

Data and ML Model

Results

              precision    recall  f1-score   support

False              0.93      0.90      0.91      1099
True               0.25      0.30      0.27       115

avg / total        0.86      0.85      0.85      1214
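
As a quick sanity check on the report above: the f1-score column is the harmonic mean of precision and recall. Here is a minimal sketch in plain Python that uses only the rounded values from the table. The `f1` helper and the `report` dictionary are illustrative names, not part of gabbar.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded (precision, recall, support) rows copied from the report above.
report = {
    "False": (0.93, 0.90, 1099),
    "True": (0.25, 0.30, 115),
}

# Recompute the f1-score column from the precision and recall columns.
f1_scores = {label: round(f1(p, r), 2) for label, (p, r, _) in report.items()}

# Share of samples labelled harmful (True) among all samples in the report.
total = sum(support for _, _, support in report.values())
harmful_share = report["True"][2] / total

print(f1_scores)
print(f"harmful share: {harmful_share:.1%}")
```

The harmful class makes up under 10% of the samples, so the avg / total row is dominated by the not-harmful class; the `True` row is the one to watch.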


cc: @anandthakker

bkowshik commented 7 years ago

So, how does the performance fare when compared to manual reviews on osmcha?

Confusion matrix

                       Predicted harmful   Predicted not harmful
Reviewed harmful                     258                      81
Reviewed not harmful                 213                    3494
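
For reference, the matrix above can be turned into the usual summary metrics. This is a minimal sketch in plain Python, assuming "harmful" is treated as the positive class and the osmcha review is the ground truth; the variable names are illustrative.

```python
# Counts copied from the confusion matrix above ("harmful" = positive class).
tp = 258   # reviewed harmful, predicted harmful
fn = 81    # reviewed harmful, predicted not harmful
fp = 213   # reviewed not harmful, predicted harmful
tn = 3494  # reviewed not harmful, predicted not harmful

total = tp + fn + fp + tn

# Of everything predicted harmful, how much did reviewers agree with?
precision = tp / (tp + fp)
# Of everything reviewers marked harmful, how much was predicted harmful?
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total

print(f"total={total} precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
```

Under that assumption, precision for the harmful class comes out at about 0.55 and recall at about 0.76.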

bkowshik commented 7 years ago

This was super helpful for putting together the workflow notebook. Closing in favor of https://github.com/mapbox/gabbar/pull/24