mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning
MIT License

Tweaking model parameters to improve predictions #25

Closed bkowshik closed 7 years ago

bkowshik commented 7 years ago

NOTE: I have updated this post to reflect the new performance numbers and graph.


Tweaking the model parameters seems to have a large impact on the results, which look much better than in the previous run.

[Image: index]

Current best model parameters for the SVC model:

{
    "probability": true,
    "C": 10000,
    "gamma": "auto",
    "cache_size": 800,
    "class_weight": "balanced",
    "kernel": "rbf"
}
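For anyone wanting to try these values, here is a minimal sketch of how they could be loaded and handed to a classifier. It assumes the model is scikit-learn's `sklearn.svm.SVC`, whose constructor accepts keyword arguments with these names; the helper names `load_params` and `build_model` are made up for illustration and are not taken from the gabbar code.

```python
import json

# The parameters from this post, kept as JSON text.
PARAMS_JSON = """
{
    "probability": true,
    "C": 10000,
    "gamma": "auto",
    "cache_size": 800,
    "class_weight": "balanced",
    "kernel": "rbf"
}
"""


def load_params(text):
    """Parse the JSON parameters into a dict (JSON true becomes Python True)."""
    return json.loads(text)


def build_model(params):
    """Build a classifier from the parameters.

    Assumption: the model is scikit-learn's SVC. The import is done here so
    that the parameters can be parsed and inspected without scikit-learn.
    """
    from sklearn.svm import SVC

    # probability=True enables predict_proba (at extra training cost),
    # gamma="auto" uses 1 / n_features, cache_size is in MB, and
    # class_weight="balanced" weights classes inversely to their frequency.
    return SVC(**params)


params = load_params(PARAMS_JSON)
print(sorted(params))
```

Calling `build_model(params)` would then give an unfitted classifier that can be trained with `fit(X, y)` on the labelled changeset features.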

Next actions


cc: @anandthakker

bkowshik commented 7 years ago

In the scenario where all changesets predicted to be potentially harmful are 👀 by users on osmcha, I was wondering 💭 whether we could calculate the hit rate as follows.

For labelled changeset data:

i.e., if 100 changesets labelled by the current model as potentially problematic are manually 👀, we should expect to find about 37 of them to be actually problematic.

NOTE: Posting here for feedback on whether this is the right way to measure hit rate.
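One possible reading of the hit rate described above is the share of flagged changesets that reviewers confirm as problematic, which is the same quantity usually called precision. The sketch below works under that assumption; the function name `hit_rate` and the sample data are hypothetical and only mirror the 37-in-100 example, they are not actual labelled results.

```python
def hit_rate(predicted_harmful, actually_harmful):
    """Fraction of changesets flagged by the model that are truly harmful.

    Both arguments are equal-length sequences of booleans, one entry per
    labelled changeset. Returns 0.0 when the model flags nothing.
    """
    # Keep the true label of every changeset the model flagged.
    flagged = [
        actual
        for predicted, actual in zip(predicted_harmful, actually_harmful)
        if predicted
    ]
    if not flagged:
        return 0.0
    return sum(flagged) / len(flagged)


# Hypothetical data: 100 flagged changesets, 37 of them confirmed harmful.
predicted = [True] * 100
actual = [True] * 37 + [False] * 63
print(hit_rate(predicted, actual))
```

Changesets the model did not flag are ignored here, so this number says nothing about harmful changesets the model misses.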

bkowshik commented 7 years ago

The model parameters that yielded the best model performance are:

{
    "probability": true,
    "C": 10000,
    "gamma": "auto",
    "cache_size": 800,
    "class_weight": "balanced",
    "kernel": "rbf"
}