Closed: bkowshik closed this issue 7 years ago
Tweaked the current model just a little: instead of giving problematic changesets extra weight, I gave them the same weight as the good changesets to get things going. This is one of the ways we tell the model to favour precision over recall. The following is what I found:
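A minimal sketch of what that weight tweak might look like, assuming a scikit-learn classifier. The model type, features, and labels here are all made up for illustration:

```python
from sklearn.ensemble import RandomForestClassifier

# Previously the problematic class (1) might have been upweighted,
# e.g. class_weight={0: 1, 1: 10}; giving both classes equal weight
# lets the model be conservative about flagging changesets.
model = RandomForestClassifier(class_weight={0: 1, 1: 1}, random_state=42)

X = [[0, 1], [1, 0], [1, 1], [0, 0]]  # toy feature vectors
y = [0, 0, 1, 1]                      # 1 = problematic, 0 = good
model.fit(X, y)
prediction = model.predict([[1, 1]])
```

With equal weights, a false positive costs the model as much as a false negative, so it only flags changesets it is fairly sure about.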
47 out of 19,741 changesets were flagged as problematic (0.24%). This is along the lines of what we want:
Model is right about 80% of the time but finds less than 20% of all the problematic changesets
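The two numbers in that statement are precision and recall, which can be sketched with a small helper on made-up labels (the counts below are illustrative, not the real results):

```python
def precision_recall(y_true, y_pred):
    """Precision: of the flagged changesets, how many are truly problematic.
    Recall: of all problematic changesets, how many were flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy data: 5 changesets flagged, 4 of them truly problematic,
# while 21 other problematic changesets go unflagged.
y_true = [1] * 4 + [0] * 1 + [1] * 21 + [0] * 100
y_pred = [1] * 5 + [0] * 121
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # → 0.8 0.16
```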
While manually reviewing the list of 47 changesets, I noticed quite a lot of changesets where a feature with `area=yes` got modified into `building=yes` being flagged as problematic.
Using the notebook to find similar samples, I found the following changesets where the transformation was the same, a feature with `area=yes` modified into `building=yes`, and quickly labelled all of them on osmcha with a :thumbsup: so that we can reuse them as training data for the classifier.
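The notebook query for similar samples might look roughly like the following pandas filter. The column names (`changeset_id`, `old_tags`, `new_tags`) and the IDs are assumptions for illustration, not the real schema:

```python
import pandas as pd

# Hypothetical frame of modified features, one row per feature.
features = pd.DataFrame({
    "changeset_id": [101, 102, 103],
    "old_tags": [{"area": "yes"}, {"highway": "path"}, {"area": "yes"}],
    "new_tags": [{"building": "yes"}, {"highway": "track"}, {"building": "yes"}],
})

# Keep rows where area=yes was replaced by building=yes.
mask = features.apply(
    lambda row: row["old_tags"].get("area") == "yes"
    and row["new_tags"].get("building") == "yes",
    axis=1,
)
similar = features.loc[mask, "changeset_id"].tolist()
print(similar)  # → [101, 103]
```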
In the first run, the 10 changesets were part of both the testing and training datasets. Since all the data in testing was also part of the training dataset, the model already knew the right answers for them. Thus, none of the 10 changesets was predicted as problematic. :innocent: 40 out of 19,760 (0.2%) changesets in the testing dataset were flagged as problematic.
For the second run, I split up the 10 changesets, 5 went to the training dataset and the other 5 to the testing dataset.
This time, only 23 out of 19,760 (0.12%) changesets in the testing dataset were flagged as problematic. Once the model was trained, all 10 of these changesets, in both the training and testing datasets, were predicted correctly as good. :tada:
This time, I swapped the training and testing datasets: the first 5 went to testing while the other 5 were part of the training dataset.
This time, 36 out of 19,760 (0.18%) changesets in the testing dataset were flagged as problematic. Interestingly, all 5 changesets in the training dataset were rightly predicted as good, while 4 out of the 5 changesets in testing were predicted as harmful. :disappointed:
I don't clearly understand why these subtle changes in the results occur. I will go ahead with a random mix of these 10 changesets across both the training and validation datasets.
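The random mix proposed above could be sketched like this, with placeholder IDs standing in for the 10 labelled changesets:

```python
import random

changesets = list(range(1, 11))  # placeholders for the 10 labelled changesets
random.seed(0)                   # fixed seed so the split is reproducible
random.shuffle(changesets)

# Half to training, half to testing, chosen at random rather than
# by position in the original list.
train, test = changesets[:5], changesets[5:]
print(train, test)
```

A reproducible seed also makes it possible to rerun the experiment later and compare results on the exact same split.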
The 2 Parts
There are two parts to the problem:
Ideally, we want a model that has both high precision and high recall. :innocent: But practically, we can only hit one at a time. Once we hit one well, we work on the other problem to hit that too.
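One way to see why we can only hit one at a time: with a probabilistic classifier, raising the decision threshold trades recall for precision. The scores below are made-up probabilities of a changeset being problematic:

```python
scores = [0.95, 0.9, 0.6, 0.55, 0.3, 0.2]  # model's predicted probabilities
labels = [1, 1, 0, 1, 1, 0]                # 1 = actually problematic

def flagged_at(threshold):
    """Flag every changeset whose score meets the threshold."""
    return [1 if s >= threshold else 0 for s in scores]

for threshold in (0.5, 0.8):
    pred = flagged_at(threshold)
    tp = sum(p and t for p, t in zip(pred, labels))
    precision = tp / sum(pred)
    recall = tp / sum(labels)
    print(threshold, precision, recall)
# At 0.5: precision 0.75, recall 0.75.
# At 0.8: precision 1.0, recall 0.5 — stricter flagging is more often
# right but misses more of the problematic changesets.
```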
With this ticket, I would like to propose: a model that is right about 80% of the time but finds less than 20% of all the problematic changesets.

cc: @anandthakker @geohacker @batpad