mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning
MIT License

Use GradientBoostingClassifier instead of SVC #30

Closed: bkowshik closed this 7 years ago

bkowshik commented 7 years ago

Using the GradientBoostingClassifier instead of SVC.


cc: @anandthakker @batpad

bkowshik commented 7 years ago

Interestingly, after training on the labelled dataset, GradientBoostingClassifier reports an importance score for each feature; the higher the score, the more important the feature. It is not surprising to see the user-experience features (user_changesets and user_features) on top. Right?

Attribute                       Feature importance (higher is more important)
user_changesets                 0.249580119231
user_features                   0.216964386956
bbox_area                       0.0848201984366
node_count                      0.0693931268049
way_count                       0.0341420742257
tourism                         0.0250598151593
features_created                0.0238505187554
leisure                         0.0232829518699
highway                         0.018433626407
place                           0.0169637311112
features_modified               0.0169139690189
relation_count                  0.0167516127551
changeset_editor_iD             0.0165070545409
natural                         0.0158289804943
features_deleted                0.0154110304032
feature_version_new             0.0150152915484
office                          0.0149398170558
man_made                        0.0135615462195
geometry_modifications          0.0135419718884
building                        0.0125329951139
sport                           0.0123859796942
landuse                         0.0102184255708
feature_version_low             0.0100570209659
boundary                        0.00842187968273
waterway                        0.00702827228637
amenity                         0.00641413873931
shop                            0.00479840910068
power                           0.00423558427736
military                        0.00379951547523
changeset_editor_Vespucci       0.00377132740797
feature_version_high            0.00335750678259
historic                        0.00279467626428
barrier                         0.00247333956818
changeset_editor_gnome          0.00200270123543
changeset_editor_other          0.00198136224788
public_transport                0.0013871576777
aerialway                       0.000819880304265
feature_version_medium          0.000495699681976
changeset_editor_Potlatch       3.55154402271e-05
changeset_editor_JOSM           2.67896017466e-05
aeroway                         0.0
changeset_editor_MAPS.ME        0.0
changeset_editor_Merkaartor     0.0
changeset_editor_OsmAnd         0.0
changeset_editor_Redaction bot  0.0
craft                           0.0
emergency                       0.0
geological                      0.0
property_modifications          0.0
railway                         0.0
route                           0.0
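The changeset_editor_* and feature_version_* rows look like one-hot encodings of a single attribute, which spreads that attribute's importance across several rows. Summing them back per attribute gives a fairer comparison with the user features. The sketch below assumes those two prefixes mark one-hot groups (inferred from the column names only, not confirmed in this thread); `group_importances` and `ONE_HOT_PREFIXES` are hypothetical names, and only a subset of the rows above is copied in.

```python
from collections import defaultdict

# A subset of the importances reported in the table above.
IMPORTANCES = {
    "user_changesets": 0.249580119231,
    "user_features": 0.216964386956,
    "bbox_area": 0.0848201984366,
    "changeset_editor_iD": 0.0165070545409,
    "changeset_editor_Vespucci": 0.00377132740797,
    "changeset_editor_gnome": 0.00200270123543,
    "changeset_editor_other": 0.00198136224788,
    "changeset_editor_Potlatch": 3.55154402271e-05,
    "changeset_editor_JOSM": 2.67896017466e-05,
    "changeset_editor_MAPS.ME": 0.0,
    "feature_version_new": 0.0150152915484,
    "feature_version_low": 0.0100570209659,
    "feature_version_high": 0.00335750678259,
    "feature_version_medium": 0.000495699681976,
}

# Assumption: these prefixes mark one-hot encoded groups (inferred from names).
ONE_HOT_PREFIXES = ("changeset_editor_", "feature_version_")


def group_importances(importances, prefixes=ONE_HOT_PREFIXES):
    """Sum importances of columns sharing a one-hot prefix; keep other columns as-is.

    Returns (name, importance) pairs sorted from most to least important.
    """
    grouped = defaultdict(float)
    for name, score in importances.items():
        # The first matching prefix (without its trailing underscore) is the group key;
        # columns matching no prefix keep their own name.
        key = next((p.rstrip("_") for p in prefixes if name.startswith(p)), name)
        grouped[key] += score
    return sorted(grouped.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in group_importances(IMPORTANCES):
        print(f"{name:<20} {score:.4f}")
```

Even summed, the editor columns (about 0.024) and the feature-version columns (about 0.029) stay well below user_changesets and user_features, so the ranking in the table is not just an artifact of splitting categorical attributes.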
bkowshik commented 7 years ago

Labelled  Predicted  Number of changesets
True      True       1141
True      False      308
False     True       180
False     False      19709
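These four counts form a confusion matrix, so the usual classification metrics follow directly from them. A minimal sketch, assuming "True" is the positive class (the thread does not spell out what the label means); `classification_metrics` is a hypothetical helper, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Counts from the table above, keyed by (labelled, predicted).
# Assumption: True is treated as the positive class.
COUNTS = {
    (True, True): 1141,     # true positives
    (True, False): 308,     # false negatives
    (False, True): 180,     # false positives
    (False, False): 19709,  # true negatives
}


def classification_metrics(counts):
    """Return exact precision, recall, F1 and accuracy from (labelled, predicted) counts."""
    tp = counts[(True, True)]
    fn = counts[(True, False)]
    fp = counts[(False, True)]
    tn = counts[(False, False)]
    precision = Fraction(tp, tp + fp)
    recall = Fraction(tp, tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = Fraction(tp + tn, tp + fn + fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    for name, value in classification_metrics(COUNTS).items():
        print(f"{name:<9} {float(value):.3f}")
```

This prints precision 0.864, recall 0.787, F1 0.824 and accuracy 0.977. Accuracy is inflated by the 19709 true negatives (only 1449 of the 21338 changesets are labelled True), so precision and recall are the more informative numbers for comparing algorithms.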
bkowshik commented 7 years ago

We will soon have a setup to try out multiple algorithms and measure performance. Closing.