mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning

Feature level classifier in Gabbar #43

Closed bkowshik closed 7 years ago

bkowshik commented 7 years ago

Gabbar has traditionally been a changeset level classifier: given a changeset ID, Gabbar extracts features at the changeset level to predict whether the changeset is harmful or not. Let's try a feature level classifier as part of Gabbar.

Why a feature level classifier?

Feature level dataset

Thanks to osmcha's filters, we can select reviewed changesets where at most one feature was created, modified, or deleted. The counts are below; a sketch of this filtering follows the table.

| One feature | Number of changesets reviewed | Harmful changesets |
| --- | --- | --- |
| Created | 3,333 | 413 |
| Modified | 9,727 | 2,264 |
| Deleted | 321 | 20 |
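
For illustration, a minimal sketch of this filtering with pandas, assuming the reviewed changesets are exported from osmcha as a CSV; the file name and columns (`features_created`, `features_modified`, `features_deleted`, `harmful`) are assumptions, not osmcha's actual export format:

```python
# Sketch: build the three one-feature subsets from an assumed osmcha export.
# Column names here are hypothetical placeholders.
import pandas as pd

changesets = pd.read_csv("reviewed-changesets.csv")

def one_feature_only(df, column, others):
    """Changesets with exactly one feature of the given kind and none of the others."""
    mask = df[column] == 1
    for other in others:
        mask &= df[other] == 0
    return df[mask]

created = one_feature_only(changesets, "features_created",
                           ["features_modified", "features_deleted"])
modified = one_feature_only(changesets, "features_modified",
                            ["features_created", "features_deleted"])
deleted = one_feature_only(changesets, "features_deleted",
                           ["features_created", "features_modified"])

for name, subset in [("Created", created), ("Modified", modified), ("Deleted", deleted)]:
    print(name, len(subset), int(subset["harmful"].sum()))
```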

cc: @batpad

bkowshik commented 7 years ago

One feature modification classifier

From osmcha, we can see that on average:

From the changesets manually labelled on osmcha:

Thus, in the first iteration of the feature level classifier, I will focus on changesets that have only one feature modified. A classifier with a good detection rate should help us identify 30% of the total harmful changesets. :crossed_fingers:

bkowshik commented 7 years ago

Model parameter tuning
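
As an illustration, a minimal sketch of what cross-validated parameter tuning could look like with scikit-learn; the model family, parameter grid, and synthetic data below are assumptions, not necessarily Gabbar's actual setup:

```python
# Sketch: hyperparameter tuning with cross-validated grid search.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in for the extracted changeset features and harmful labels (1 = harmful).
X_train, y_train = make_classification(n_samples=1000, n_features=20,
                                        weights=[0.9], random_state=42)

param_grid = {
    "n_estimators": [100, 250],
    "max_depth": [2, 3, 5],
    "learning_rate": [0.01, 0.1],
}

search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.best_score_)
```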

bkowshik commented 7 years ago

Metrics on validation dataset

| | Predicted good | Predicted harmful |
| --- | --- | --- |
| Labelled good | 223 | 232 |
| Labelled harmful | 40 | 54 |

```
             precision    recall  f1-score   support

          0       0.85      0.49      0.62       455
          1       0.19      0.57      0.28        94

avg / total       0.74      0.50      0.56       549
```
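
These numbers can be produced with scikit-learn's metrics helpers; a minimal sketch, assuming arrays `y_validation` (reviewer labels) and `y_predicted` (model output) from the steps above:

```python
# Sketch: confusion matrix and per-class report for the validation split.
# y_validation and y_predicted are assumed to exist already (0 = good, 1 = harmful).
from sklearn.metrics import confusion_matrix, classification_report

print(confusion_matrix(y_validation, y_predicted))
# Rows are labelled classes, columns are predicted classes, e.g.:
# [[223 232]
#  [ 40  54]]

print(classification_report(y_validation, y_predicted))
```
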
bkowshik commented 7 years ago

Jupyter notebook with analysis is at the link below:

bkowshik commented 7 years ago

Manually reviewed 20 changesets that were labelled harmful but predicted to be not harmful.

Notes

bkowshik commented 7 years ago

Things did improve after adding all the features mentioned above.

| | Predicted good | Predicted harmful |
| --- | --- | --- |
| Labelled good | 382 | 67 |
| Labelled harmful | 62 | 26 |

```
             precision    recall  f1-score   support

          0       0.86      0.85      0.86       449
          1       0.28      0.30      0.29        88

avg / total       0.77      0.76      0.76       537
```

Variation

| | Predicted good | Predicted harmful |
| --- | --- | --- |
| Labelled good | +71% | -71% |
| Labelled harmful | +55% | -52% |

NOTE: A positive percentage denotes an increase compared to the previous run, while a negative percentage denotes a decrease. For example, the labelled good and predicted good count went from 223 to 382, a change of (382 - 223) / 223 ≈ +71%.

bkowshik commented 7 years ago

Progress metrics on the validation dataset

After adding features

Before adding features


cc: @batpad

bkowshik commented 7 years ago

@manoharuss @krishnanammala Following is a csv file with 50 changesets predicted by Gabbar to be problematic and 50 changesets predicted by Gabbar to be good, a total of 100 changesets.
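
A minimal sketch of how such a 50/50 review sample could be drawn from the model's predictions; the `predictions` dataframe and its columns are assumptions, not Gabbar's actual code:

```python
# Sketch: draw 50 changesets predicted problematic and 50 predicted good
# for manual review on osmcha. `predictions` is an assumed dataframe with
# hypothetical columns changeset_id and prediction (1 = problematic, 0 = good).
import pandas as pd

problematic = predictions[predictions["prediction"] == 1].sample(50, random_state=0)
good = predictions[predictions["prediction"] == 0].sample(50, random_state=0)

review_sample = pd.concat([problematic, good]).sample(frac=1, random_state=0)  # shuffle
review_sample[["changeset_id", "prediction"]].to_csv("review-sample.csv", index=False)
```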

Can you please review these changesets on osmcha and label them as usual with a 👍 or 👎?


My expectation based on model metrics :crossed_fingers:


cc: @planemad

krishnanammala commented 7 years ago

@manoharuss and I split the 100 changesets in half; I took the first 50 changesets and reviewed them. Here 👇 are my observations:

cc @bkowshik

bkowshik commented 7 years ago

NOTE: The following are numbers for the 100 changesets dump.


Thank you @manoharuss and @krishnanammala. We do have quite a long way to go. 😞

Confusion matrix

| | Predicted good | Predicted harmful |
| --- | --- | --- |
| Labelled good | 46 | 49 |
| Labelled harmful | 3 | 1 |

Learnings

bkowshik commented 7 years ago

@anandthakker and I had a great discussion on the latest version of Gabbar and its predictions.

On training dataset

| | Predicted good | Predicted harmful |
| --- | --- | --- |
| Labelled good | 4,850 | 5 |
| Labelled harmful | 0 | 437 |

On validation dataset

| | Predicted good | Predicted harmful |
| --- | --- | --- |
| Labelled good | 2,159 | 71 |
| Labelled harmful | 43 | 166 |

@manoharuss and @krishnanammala, we are good for the second round of 👀 from you. In the following csv, the sheet 2017-06-12 (Mon) has 50 changesets predicted problematic and 50 changesets predicted good by the latest model in Gabbar.

Can you please review these changesets on osmcha and label them as usual with a 👍 or 👎?

bkowshik commented 7 years ago

@manoharuss @krishnanammala I had missed posting back the learnings from the review you did last time. Posting them here with additional details.

Adding old_name to features

In the dataset, there are a total of 27 changesets by user Порфирий where a feature got an old_name. The old model had flagged a majority of them as problematic. With the new model, we are getting:

I don't see anything that stands out with the 2 changesets flagged harmful. I guess this is what we get with the current setup.
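
As an illustration, a minimal sketch of how an old_name signal could be derived from a feature's tags before and after the edit; the helper and tag dictionaries are assumptions, not Gabbar's actual extraction code:

```python
# Sketch: flag whether a feature gained an old_name tag in this version.
# old_tags / new_tags are the feature's tag dictionaries before and after
# the edit; the helper name is a hypothetical placeholder.
def gained_old_name(old_tags, new_tags):
    return int("old_name" not in old_tags and "old_name" in new_tags)

print(gained_old_name({"name": "Park Lane"},
                      {"name": "Park Avenue", "old_name": "Park Lane"}))  # 1
print(gained_old_name({"name": "Park Lane"}, {"name": "Park Avenue"}))    # 0
```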

Duplicate looking tags

The idea was to flag changesets when features had both the tags building and building_1. Ex: https://osmcha.mapbox.com/48444157/. But the way I calculated the duplicate count resulted in some side-effects, because of which the following were flagged :thumbsdown: as well:

Will think of a workaround for this.
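
For reference, a naive sketch of how such a duplicate-tag count could be computed by stripping trailing numeric suffixes from keys; an approach like this also matches legitimately numbered keys, which may be the kind of side-effect described above (the function is an assumption, not Gabbar's actual code):

```python
# Sketch: count "duplicate looking" tag keys, e.g. building and building_1
# on the same feature, by stripping a trailing _<number> from every key
# and counting collisions.
import re
from collections import Counter

def duplicate_tag_count(tags):
    base_keys = [re.sub(r"_\d+$", "", key) for key in tags]
    counts = Counter(base_keys)
    return sum(count - 1 for count in counts.values() if count > 1)

print(duplicate_tag_count({"building": "yes", "building_1": "house"}))            # 1
print(duplicate_tag_count({"name": "Main Street", "name_1": "State Route 1"}))    # 1 (side-effect)
```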

bkowshik commented 7 years ago

Thank you @manoharuss and @krishnanammala. Action now at: https://github.com/mapbox/gabbar/issues/69