Open bkowshik opened 7 years ago
In the dataset I had locally, I found 36 changesets where `highway=residential` got modified to `highway=unclassified`. I 👀 a couple of these changesets.
unclassified
or residential
residential
highway right?

There are 3 action types for a highway feature:

- created
- modified. Property and/or geometry modification
- deleted
Some attributes depend on the action type. For example, the difference in highway length only exists for a modification; a newly created highway has no two versions to compare. Next, which attributes are relevant when a highway is deleted? I am 🤔 won't a `length_difference` column be redundant for a newly created highway?
I am not sure how to solve this problem, would love to hear ideas. But, for a start I am planning to add just the attributes in the latest version of the model along with the action in create, modify or delete. Let's see how this goes. If these attributes are not sufficient, we could add other diff attributes like difference in highway length, distance between the centroids, etc.
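To make the plan above concrete, here is a minimal sketch of how one row of attributes could be built per action. The dict layout (`tags`, `length`), the function name `highway_attributes` and the `include_diff` switch are all hypothetical, not what gabbar actually does. It also assumes that for a deletion the "latest version" is the old one, since no new version exists.

```python
ACTIONS = ("create", "modify", "delete")


def highway_attributes(action, new=None, old=None, include_diff=False):
    """Build a flat attribute row for one highway feature (hypothetical schema)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")

    # Assumption: for a deletion, the last surviving version is the old one.
    latest = old if action == "delete" else new
    if latest is None:
        raise ValueError(f"action {action!r} needs a feature version")

    # One indicator column per action, plus attributes of the latest version.
    row = {f"action_{name}": int(name == action) for name in ACTIONS}
    row["highway"] = latest["tags"].get("highway")
    row["has_name"] = int("name" in latest["tags"])
    row["length"] = latest["length"]

    if include_diff:
        # A length difference only exists when both versions do, i.e. on modify.
        if action == "modify" and old is not None:
            row["length_difference"] = new["length"] - old["length"]
        else:
            row["length_difference"] = None
    return row
```

With `include_diff=False` the row carries only the latest-version attributes and the action. Turning it on later adds the diff column without changing the other columns, and leaves it empty for created and deleted highways.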
Very early results, 2 out of the 6 predicted in the sample are interesting.
- `highway=residential` goes inside a park

Of the 2,732 samples in total, 2,655 are not harmful and 77 are harmful.
With previous runs, I trained the model on the training dataset and measured metrics on the validation dataset. But, because of the narrow scope of the problem, we have samples on the lower side. Thus, I went the route of cross-validation.
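For anyone following along, this is roughly what cross-validation does, written in plain Python to show the mechanics. Stratifying by label is my assumption (it seems sensible with so few harmful samples), and `score_fold` is a hypothetical callback that trains on one set of indices and returns a score on the other. scikit-learn ships `StratifiedKFold` and `cross_val_score` for the real thing.

```python
import random
from statistics import mean, pstdev


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping the label ratio similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for value in sorted(set(labels)):
        indices = [i for i, label in enumerate(labels) if label == value]
        rng.shuffle(indices)
        # Deal the indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validate(labels, score_fold, k=5, seed=0):
    """Call score_fold(train_indices, test_indices) once per held-out fold."""
    folds = stratified_folds(labels, k=k, seed=seed)
    scores = []
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        scores.append(score_fold(train, test))
    # Report the mean score and its spread across folds.
    return mean(scores), pstdev(scores)
```

Every sample ends up in the held-out fold exactly once, so all labelled samples contribute to the metrics instead of only a small validation split.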
- 10% (Fraction of changesets harmful labelled problematic)
- 20% (Fraction of harmful changesets predicted harmful)

From the unlabelled testing dataset, 6 out of 344 changesets were predicted to be problematic. The results are interesting indeed.
- `highway=footway` and `area=yes` don't exist together! :tada:
- `demolished` highway. Did not know something like that existed.

I experimented with scaling features using `sklearn.preprocessing.StandardScaler`.
Feature scaling does seem to have a small impact. Even though the mean scores come down, the standard deviations are down as well.
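As a reminder of what the scaling step does, here is a plain-Python sketch of standardisation: each column is shifted to zero mean and divided by its population standard deviation, which to my understanding mirrors the default behaviour of `StandardScaler`. The function name `standard_scale` is only for illustration.

```python
from statistics import mean, pstdev


def standard_scale(rows):
    """Scale each column to zero mean and unit variance (population std)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Constant columns would divide by zero; use 1.0 so they map to 0.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

In a cross-validation setup the means and standard deviations would normally be computed on the training fold only and then applied to the held-out fold.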
460 out of the total 2732 (17%) samples had a modification in name, which includes name additions, modifications and deletions. 22 of the 77 (28.57%) harmful changesets were name modifications. I added an attribute called `feature_name_modified` to see if that helps. The model put `feature_name_modified` at the 5th position in the importance list.
The model metrics did not show a significant variation.
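A minimal sketch of how such an attribute could be derived from the tags of the old and new versions. The exact definition used in the model is not shown here, so treat this as an assumption: a name counts as modified whenever the `name` tag differs between versions, which covers additions, changes and deletions. The two shares quoted above are recomputed at the end.

```python
def feature_name_modified(old_tags, new_tags):
    """Return 1 if the name tag was added, changed or removed, else 0."""
    old_name = (old_tags or {}).get("name")
    new_name = (new_tags or {}).get("name")
    return int(old_name != new_name)


# Shares from the counts reported in this thread.
overall_share = 460 / 2732  # samples with a name modification
harmful_share = 22 / 77     # harmful changesets with a name modification
```

The share among harmful changesets is higher than the overall share, which is why the attribute looked worth trying even though the metrics did not move much.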
- `highway=footway`: 1
- Feature is not good because of personal information in the name tag
- `river`: 13
- `oneway`: 4
- Harmful change when a highway feature becomes something else
The following gist has a random sample of 25 predictions from the first version of the highway classifier. The csv has both the `changeset_id` and `feature_id`.
@krishnanammala can you 👀 these changesets on osmcha and give me some feedback?
cc: @planemad @batpad
As per the comment https://github.com/mapbox/gabbar/issues/69#issuecomment-312801138 above, I have gone through the changesets that are flagged by Gabbar (highway classifier). Here are my observations:
Both harmful changesets are deletions of `turn:lanes` and `lanes` tags, and both are from the same user.
I have laid out the detections more clearly below, separating them into good detections and detections with less priority, to give @bkowshik more context for improvement.
Good detections | detections with less priority |
---|---|
|
Geometry of highways changing |
|
highways with rest_areas & traffic signals which are less priority |
|
Addition of layer tags to minor highways i.e., service roads |
|
Addition and modification of low classification highways i.e., Tracks,paths,service roads |
Hope the above observations will help you @bkowshik 👍
One of the popular problems in machine learning is dogs vs cats: given a picture, predict whether it shows a dog or a cat. Coming from this initial experience with machine learning, I kept thinking that classifying changesets as good or problematic is something similar. But today I did an exercise where I wanted to identify the one attribute of a changeset that makes it good or problematic. I started with:
- `highway=residential` is modified to `highway=unclassified`
The following questions came to mind
- `residential` better than `unclassified`; I mean something is better than nothing right?
- `15`, this is quite a mature feature. So, is that alright?
- `source=google maps` Really?

From https://wiki.openstreetmap.org/wiki/Key:highway

From https://osmlab.github.io/osm-deep-history/#/way/103217436

- `highway=unclassified` since creation in 2011.

Looking deeper into other changesets where a `highway=residential` gets modified into `highway=unclassified`, I find this user, `Порфирий`, who has lots of changesets with the same behavior. Interestingly, the user who added `highway=residential` is `Порфирий` too.

Eureka!
When a highway modification has so many questions to answer and attributes to look at, what will the scale be when we look at all 26 primary tags together? What about features that don't have any primary tags? Too many questions! Too many attributes! Right?
cc: @anandthakker @geohacker @batpad