mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning
MIT License

Effect of attributes on the feature level classifier #59

Closed bkowshik closed 7 years ago

bkowshik commented 7 years ago

Similar to the work on training size, we have questions about the effect of the number of attributes on the model:

Workflow

Notes

[Figure: index]

cc: @anandthakker @batpad @geohacker

bkowshik commented 7 years ago

What would it look like if attributes were added in order of their importance for prediction instead of in the order they appear in the CSV dataset?

The GradientBoostingClassifier exposes an attribute, model.feature_importances_, that holds an importance score for each feature; the higher the score, the more important the feature is for predictions.

[Screenshot: table with the 10 attributes that have the highest importance scores]
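For reference, a minimal sketch of how such a ranking can be read off a fitted model. The dataset here is synthetic and the attribute names are hypothetical placeholders, not the ones used in gabbar; only GradientBoostingClassifier and feature_importances_ come from the comment above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the real dataset (assumption: 8 numeric attributes).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, random_state=0
)
names = [f"attribute_{i}" for i in range(X.shape[1])]  # hypothetical names

model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# feature_importances_ is an array with one score per column of X.
importances = model.feature_importances_

# Column indices sorted from most to least important.
order = np.argsort(importances)[::-1]
ranked = [(names[i], float(importances[i])) for i in order]

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Taking ranked[:10] on the real dataset would give a top-10 table like the one in the screenshot.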

Now, using the same workflow as above, we add one attribute at a time, starting with the most important one, to get the graph below.

[Figure: index]
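The "add one attribute at a time, most important first" loop can be sketched roughly as below. The function name scores_by_importance and the evaluate callback are hypothetical; in practice evaluate would train the classifier on the given columns and return whatever metric the graph plots. A toy callback is used here so the sketch runs on its own.

```python
from typing import Callable, List, Sequence, Tuple


def scores_by_importance(
    names: Sequence[str],
    importances: Sequence[float],
    evaluate: Callable[[List[str]], float],
) -> List[Tuple[int, float]]:
    """Evaluate growing attribute subsets, most important attribute first.

    Returns (number_of_attributes, score) pairs, one per subset size.
    """
    # Sort attribute names by descending importance score.
    ranked = [
        name
        for name, _ in sorted(
            zip(names, importances), key=lambda pair: pair[1], reverse=True
        )
    ]
    results = []
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]  # the k most important attributes
        results.append((k, evaluate(subset)))
    return results


# Toy example: the "score" is just the summed importance of the subset.
toy_names = ["a", "b", "c"]
toy_importances = [0.1, 0.6, 0.3]
lookup = dict(zip(toy_names, toy_importances))
curve = scores_by_importance(
    toy_names, toy_importances, lambda cols: sum(lookup[c] for c in cols)
)
for k, score in curve:
    print(k, round(score, 2))
```

Plotting the score against k for both orderings (CSV order and importance order) makes it easier to see whether the dips follow specific attributes.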

bkowshik commented 7 years ago

After increasing the dataset size, I still see the unusual dips. 🤔

[Figure: index]