mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning
MIT License

Increase training size for feature level classifier #45

Closed bkowshik closed 7 years ago

bkowshik commented 7 years ago

Ref https://github.com/mapbox/gabbar/issues/43


Next actions


cc: @batpad @geohacker

bkowshik commented 7 years ago

I was curious to see what effect the size of the training set has on the model's metrics. Here is what we have:

[image: index-2]

Notes / Questions


cc: @anandthakker

bkowshik commented 7 years ago

Workflow

  1. Set number of samples to use for the current run
  2. Use only this subset of samples from the labelled training data
  3. Train a model on this subset of training data
  4. Get predictions from model for the entire validation dataset
  5. Extract metrics on validation dataset
  6. Increase number of samples to use for the next run and go again
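
The workflow above can be sketched roughly as follows. This is a minimal illustration, not the code used in this repository: the synthetic data, the toy nearest-centroid model, the choice of precision and recall as metrics, and every function name here (`make_samples`, `train_centroids`, `learning_curve`, and so on) are assumptions made for the example.

```python
import random


def make_samples(n, seed):
    """Synthetic stand-in for labelled samples: a list of (features, label)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Two overlapping Gaussian clusters, one per label.
        features = [rng.gauss(label * 1.5, 1.0), rng.gauss(label * 1.5, 1.0)]
        samples.append((features, label))
    return samples


def train_centroids(samples):
    """Toy model (assumption): the mean feature vector for each label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def precision_recall(model, validation, positive=1):
    """Score the model on the entire validation set."""
    tp = fp = fn = 0
    for features, label in validation:
        guess = predict(model, features)
        if guess == positive and label == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif label == positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def learning_curve(training, validation, start, step):
    results = []
    size = start                                   # 1. samples for this run
    while size <= len(training):
        subset = training[:size]                   # 2. use only this subset
        model = train_centroids(subset)            # 3. train on the subset
        precision, recall = precision_recall(model, validation)  # 4-5. score
        results.append((size, precision, recall))
        size += step                               # 6. grow and go again
    return results


training = make_samples(700, seed=0)
validation = make_samples(300, seed=1)
curve = learning_curve(training, validation, start=100, step=100)
for size, precision, recall in curve:
    print(f"{size:4d} samples  precision={precision:.3f}  recall={recall:.3f}")
```

The validation set is held fixed across runs so that the only thing changing between points on the curve is the number of training samples.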

bkowshik commented 7 years ago

Before, we had 8,620 labelled samples, of which 6,036 were used for training and 2,584 for validation. With the backfill done, we now have 10,165, of which we use 7,115 for training and 3,050 for validation.
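
As a quick sanity check on these numbers, the snippet below recomputes the totals and the split ratio before and after the backfill. It assumes the 7,115 figure is the training portion of the new dataset; the helper name `split_summary` is made up for this example.

```python
def split_summary(train, validation):
    """Return (total samples, fraction of samples used for training)."""
    total = train + validation
    return total, train / total


before_total, before_fraction = split_summary(6036, 2584)
after_total, after_fraction = split_summary(7115, 3050)

print(f"before: {before_total} samples, {before_fraction:.1%} training")
print(f"after:  {after_total} samples, {after_fraction:.1%} training")
print(f"added by backfill: {after_total - before_total} samples")
```

Both splits come out at roughly 70% training and 30% validation, so the split ratio itself did not change with the backfill. The validation set did change, though, which is worth keeping in mind when comparing the two graphs.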

Interestingly, the nice upward trend in the graph has now turned into something like the one below. I don't understand why this is happening, though.

[image: index-2]

We are 💯 to close here.