mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning
MIT License

Model learning a pattern incorrectly during training #54

Closed bkowshik closed 7 years ago

bkowshik commented 7 years ago

Ref: https://github.com/mapbox/gabbar/issues/43

There were 5 changesets in the training dataset that the model was not able to learn correctly. They were labelled 👍 on osmcha, but the model was predicting them to be 👎

|                  | Predicted good | Predicted harmful |
| ---------------- | -------------- | ----------------- |
| Labelled good    | 4850           | 5                 |
| Labelled harmful | 0              | 437               |
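A confusion matrix like the one above is just a tally of (labelled, predicted) pairs; here is a minimal sketch in plain Python (the sample labels below are illustrative, not the actual gabbar data):

```python
# Sketch: tallying a confusion matrix like the one above.
# The labels and predictions are a hypothetical sample, not gabbar's data.
from collections import Counter

labels      = ["good", "good", "good", "harmful", "harmful"]
predictions = ["good", "harmful", "good", "harmful", "harmful"]

# Count (labelled, predicted) pairs.
counts = Counter(zip(labels, predictions))

for labelled in ("good", "harmful"):
    row = [counts[(labelled, predicted)] for predicted in ("good", "harmful")]
    print(f"Labelled {labelled}: predicted good={row[0]}, predicted harmful={row[1]}")
```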

Curious to understand why, I 👀 the results myself. 4 out of the 5 had a pattern: in each of them, a `natural=water` feature got a new tag, `water=marsh`.

*(Screenshot: a natural=water feature tagged water=marsh)*

All attributes except the following are the same for all these samples.

Next actions


cc: @anandthakker @geohacker @batpad

bkowshik commented 7 years ago

Next, I wrote a script that, given a changeset, outputs other changesets sorted by how dissimilar they are to it. Ex: if no attributes differ, the dissimilarity is 0; if 5 attributes differ, the dissimilarity is 5.

For changeset 47078765, I got the following results at the top:

| Changeset ID | Dissimilarity | Notes |
| ------------ | ------------- | ----- |
| 47078765     | 0             | Changeset is dissimilar to itself by zero attributes |
| 47078737     | 4             | We have seen ^^ |
| 47078730     | 4             | We have seen ^^ |
| 47078746     | 4             | We have seen ^^ |
| 47078698     | 5             | Super interesting!!! |
| 46690182     | 16            | Very dissimilar, so not interesting |
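The dissimilarity script described above can be sketched as an attribute-wise comparison; the changeset attributes below are hypothetical stand-ins for gabbar's real features:

```python
# Sketch of the dissimilarity metric: count the attributes on which two
# changesets differ. Attribute names and values here are hypothetical.

def dissimilarity(a, b):
    """Number of attributes on which changesets a and b differ."""
    return sum(1 for key in set(a) | set(b) if a.get(key) != b.get(key))

def rank_by_dissimilarity(target_id, changesets):
    """Changeset IDs sorted from least to most dissimilar to the target."""
    target = changesets[target_id]
    return sorted(changesets, key=lambda cid: dissimilarity(target, changesets[cid]))

changesets = {
    47078765: {"feature": "natural=water", "new_tag": "water=marsh", "user": "a"},
    47078737: {"feature": "natural=water", "new_tag": "water=marsh", "user": "b"},
    46690182: {"feature": "highway=residential", "new_tag": "surface=paved", "user": "c"},
}

print(rank_by_dissimilarity(47078765, changesets))  # → [47078765, 47078737, 46690182]
```

The target sorts first with dissimilarity 0 (it differs from itself by zero attributes), and the most dissimilar changeset sorts last, matching the table above.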

Basically, the operation in this changeset is the same: a `natural=water` feature getting a new tag, `water=marsh`. But what is different is that this changeset is labelled 👎

Since we give a higher sample weight to changesets that are 👎 than to changesets that are 👍, the model has learned to give priority to the one changeset labelled problematic over the 4 other changesets labelled good.
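In scikit-learn terms, that weighting goes in through the `sample_weight` argument to `fit`. A minimal sketch with a decision tree, assuming an illustrative weight of 5 for 👎 samples (not gabbar's actual model or configuration):

```python
# Sketch: one weighted-up harmful (1) sample outweighing four good (0)
# samples that have identical features. HARMFUL_WEIGHT = 5 is an assumption.
from sklearn.tree import DecisionTreeClassifier

HARMFUL_WEIGHT = 5.0

features = [[0], [0], [0], [0], [0]]  # identical features, like the marsh changesets
labels = [0, 0, 0, 0, 1]              # four good, one harmful
weights = [HARMFUL_WEIGHT if y == 1 else 1.0 for y in labels]

model = DecisionTreeClassifier()
model.fit(features, labels, sample_weight=weights)

# The tree sees weighted counts of 4 (good) vs 5 (harmful) at its single
# leaf, so the harmful label wins despite being 1 sample out of 5.
print(model.predict([[0]]))  # → [1]
```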

bkowshik commented 7 years ago

Question

Answer

NOTE: Each cell has values for before and after in the format: before -> after

|                  | Predicted good | Predicted harmful |
| ---------------- | -------------- | ----------------- |
| Labelled good    | 4850 -> 2574   | 5 -> 2282         |
| Labelled harmful | 0 -> 50        | 437 -> 386        |

Ex: 4850 changesets were both predicted and labelled good before, but now there are only 2574. This is such a drastic difference, right?


There is a drastic difference in the predictions too:

| Changeset | Prediction before | Prediction now |
| --------- | ----------------- | -------------- |
| 47078698  | Good              | Problematic    |
| 47078765  | Good              | Problematic    |
| 47078737  | Good              | Problematic    |
| 47078730  | Good              | Problematic    |
| 47078746  | Good              | Problematic    |

I could not believe the results. I went back to the csv and updated the label for changeset 47078698 from 👍 back to the original 👎 so that things were as they were before. Starting with one question, we have now come to a totally different question! 😂

Next actions

bkowshik commented 7 years ago

Per a conversation with @Fa7C0n, the tag is actually deprecated, and all five changesets should have a 👎 instead.

So, I updated the labels for these changesets and trained the model again. Nothing interesting this time; things worked as expected. The 4 changesets that seemed to be learned incorrectly had actually been predicted correctly all along. This was super fun!

|                  | Predicted good | Predicted harmful |
| ---------------- | -------------- | ----------------- |
| Labelled good    | 4850           | 1                 |
| Labelled harmful | 0              | 441               |

Fa7C0n commented 7 years ago

Excuse me for this mix-up, @bkowshik. Let me know if you come across a similar situation in the future.