mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning

Detect changesets that are very likely to have problems #63

Closed bkowshik closed 7 years ago

bkowshik commented 7 years ago

The 2 Parts

There are two parts to the problem:

  1. High precision
    • A high percentage of the changesets flagged as problematic really are problematic, i.e. fewer false positives.
    • Ex: Predictions are right about 80% of the time, but the model finds less than 20% of all the problematic edits.
  2. High recall
    • The model finds all or most of the problematic edits.
    • Ex: The model finds 80% of all problematic edits but is right only 20% of the time.

Ideally, we want a model that has both high precision and high recall. :innocent: But practically, we can only hit one at a time. And once we hit one well, we can work on improving the other.
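To make the trade-off concrete, here is a minimal sketch of the two metrics using scikit-learn; the labels are made up for illustration (1 = problematic changeset, 0 = good changeset), not real gabbar data.

```python
# Minimal sketch of the precision/recall trade-off, assuming binary labels
# where 1 = problematic changeset and 0 = good changeset. Labels are made up.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # hypothetical ground truth: 5 problematic, 5 good
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # a cautious model: flags only one changeset

# Precision: of the changesets flagged as problematic, how many really are?
print(precision_score(y_true, y_pred))   # 1.0 -> every flag is correct

# Recall: of all the problematic changesets, how many did the model find?
print(recall_score(y_true, y_pred))      # 0.2 -> only 1 of 5 found
```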


With this ticket, I would like to propose:


cc: @anandthakker @geohacker @batpad

bkowshik commented 7 years ago

Tweaked the current model just a little: instead of giving problematic changesets extra weight, I gave them the same weight as the good changesets to get things going. This is one of the ways we tell the model to favour precision over recall. The following is what I found:

Validation dataset

Testing dataset

This is along the lines of what we want:

The model is right about 80% of the time but finds less than 20% of all the problematic changesets.
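For reference, this is roughly what the weighting tweak could look like with a scikit-learn classifier; `RandomForestClassifier` and the variable names are assumptions for illustration, not necessarily what gabbar actually uses.

```python
# Sketch only: gabbar's real model and features may differ.
from sklearn.ensemble import RandomForestClassifier

# Extra weight on the rare problematic class nudges the model towards recall.
recall_leaning = RandomForestClassifier(class_weight='balanced', random_state=0)

# Equal weights (the default) let the many good changesets dominate,
# which tends to trade recall away for precision.
precision_leaning = RandomForestClassifier(class_weight=None, random_state=0)

# recall_leaning.fit(X_train, y_train)      # X_train, y_train are hypothetical
# precision_leaning.fit(X_train, y_train)
```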

bkowshik commented 7 years ago

While manually reviewing the list of 47 changesets, I noticed quite a lot of changesets where a feature with `area=yes` was modified into `building=yes` being flagged as problematic.


Using the notebook to find similar samples, I found more changesets where the transformation was the same: a feature with `area=yes` modified into `building=yes`. I quickly labelled all of these changesets on osmcha with a :thumbsup: so that we can reuse them as training data for the classifier.

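A hypothetical sketch of the kind of filter used to surface similar samples; the DataFrame and column names (`old_tags`, `new_tags`, `changeset_id`) are assumptions, not the notebook's actual schema.

```python
# Hypothetical sketch: surface changesets where area=yes was modified into
# building=yes. The data and column names below are placeholders.
import pandas as pd

changesets = pd.DataFrame({
    'changeset_id': [111, 222, 333],
    'old_tags': ['area=yes', 'highway=residential', 'area=yes'],
    'new_tags': ['building=yes', 'highway=residential', 'landuse=grass'],
})

similar = changesets[
    changesets['old_tags'].str.contains('area=yes', na=False)
    & changesets['new_tags'].str.contains('building=yes', na=False)
]
print(similar['changeset_id'].tolist())  # candidates to review and label on osmcha
```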

Next actions

bkowshik commented 7 years ago

First run

In the first run, the 10 changesets were part of both the training and testing datasets. Since these changesets in the testing dataset were also part of the training dataset, the model already knew the right answers for them. Thus, none of the 10 changesets was predicted as problematic. :innocent: 40 out of 19760 (0.2%) changesets in the testing dataset were flagged as problematic.
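One way to catch this kind of overlap between splits before training is a simple ID check; the changeset IDs below are placeholders, not real data.

```python
# Sketch of a train/test leakage check; the changeset IDs are placeholders.
train_changeset_ids = [101, 102, 103, 104, 105]
test_changeset_ids = [104, 106, 107]

overlap = set(train_changeset_ids) & set(test_changeset_ids)
if overlap:
    print('Leakage: changesets in both training and testing:', sorted(overlap))
else:
    print('No overlap between the training and testing datasets.')
```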

Second run

For the second run, I split up the 10 changesets: 5 went to the training dataset and the other 5 to the testing dataset.

This time, only 23 out of 19760 (0.12%) changesets in the testing dataset were flagged as problematic. Once the model was trained, all of these changesets, in both the training and testing datasets, were correctly predicted as good. :tada:

Third run

This time, I swapped the splits: the first 5 changesets went to the testing dataset while the other 5 were part of the training dataset.

This time, 36 out of 19760 (0.18%) changesets in the testing dataset were flagged as problematic. Interestingly, all 5 changesets in the training dataset were rightly predicted as good, while 4 out of the 5 changesets in the testing dataset were predicted as harmful. :disappointed:


I don't clearly understand why the results change this much between these subtle variations of the split. I will go ahead with a random mix of these 10 changesets into both the training and validation datasets.
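A minimal sketch of that random mix with scikit-learn's `train_test_split`; the changeset IDs and labels are placeholders standing in for the 10 labelled changesets.

```python
# Sketch of randomly mixing the 10 labelled changesets between training and
# validation; the IDs and labels below are placeholders.
from sklearn.model_selection import train_test_split

changeset_ids = list(range(10))  # stand-ins for the 10 changeset IDs
labels = [0] * 10                # all 10 were labelled as good on osmcha

train_ids, valid_ids, train_labels, valid_labels = train_test_split(
    changeset_ids, labels, test_size=0.5, shuffle=True, random_state=42)

print('Training:', sorted(train_ids))
print('Validation:', sorted(valid_ids))
```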