amishas157 opened this issue 7 years ago
This is a great idea! OSM QA Tiles aren't the best fit for this research, though, because of missing relations, tile-edge effects, and multipolygon handling.
@amishas157 Let's use osmium. Take a look at osmlazer for a sketch.
@geohacker Yes, that would be great. Here is how I am thinking of approaching this problem:
Find overlapping features with an area greater than a threshold value, identify the different kinds of overlapping and overlapped features, and then do a deeper analysis of each combination of the two.
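A minimal sketch of the thresholded overlap search. It assumes the threshold applies to the overlap area, and it uses axis-aligned bounding boxes in place of real geometries to stay dependency-free. A real pass would intersect the actual polygons. The feature tuple shape and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical feature shape: (id, primary_tags, bbox), with
# bbox = (min_x, min_y, max_x, max_y). Boxes stand in for real geometries.

def overlap_area(a, b):
    """Area of the intersection of two axis-aligned boxes (0.0 if they don't overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0

def find_overlaps(features, threshold):
    """Yield (id1, id2, area) for every pair whose overlap area exceeds threshold."""
    for (id1, _, box1), (id2, _, box2) in combinations(features, 2):
        area = overlap_area(box1, box2)
        if area > threshold:
            yield id1, id2, area
```

Each returned pair can then be mapped back to its primary tags to build the overlapping/overlapped combinations.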
Analysing Feature Overlaps
The idea behind this analysis is to learn what kinds of overlaps exist in OSM, estimate how many of each there are, and distinguish good overlaps from bad ones. An example of a good overlap is a building overlapping a landuse feature; a bad one is a water feature overlapping buildings.
The process is carried out as follows:
Primary tags list: 'aerialway', 'aeroway', 'amenity', 'barrier', 'boundary', 'building', 'craft', 'emergency', 'geological', 'highway', 'landuse', 'leisure', 'man_made', 'military', 'natural', 'office', 'places', 'power', 'public_transport', 'railway', 'route', 'shop', 'sport', 'tourism', 'waterway'
Tippecanoe is used to cut the GeoJSON file into MBTiles, with both the minimum and maximum zoom level set to 16.
A tile-reduce script processes the features carrying the tags listed above. Overlaps are calculated as follows:
The analysis was first run for Monaco, a small city-state in Europe.
Total features having primary tags: 3192
Results from overlaps:
{ '{"building":"construction"},{"boundary":"administrative"}': 1,
'{"building":"construction"},{"highway":"steps"}': 1,
'{"building":"yes"},{"highway":"footway"}': 18,
'{"leisure":"swimming_pool"},{"building":"yes"}': 4,
'{"leisure":"park"},{"highway":"footway"}': 13,
'{"leisure":"park"},{"highway":"service"}': 3,
'{"leisure":"park"},{"landuse":"construction","building":"construction"}': 1,
'{"building":"yes"},{"boundary":"administrative"}': 15,
'{"building":"yes"},{"highway":"steps"}': 7,
'{"sport":"swimming","amenity":"swimming_pool"},{"leisure":"swimming_pool"}': 2,
'{"building":"yes"},{"highway":"service"}': 5,
'{"building":"yes"},{"landuse":"residential"}': 5,
'{"landuse":"residential"},{"highway":"footway"}': 2,
'{"landuse":"residential"},{"highway":"service"}': 6,
'{"highway":"footway"},{"natural":"water"}': 2,
'{"leisure":"park"},{"highway":"steps"}': 2,
'{"amenity":"fountain"},{"leisure":"park"}': 2,
'{"building":"yes"},{"leisure":"park","tourism":"attraction"}': 2,
'{"natural":"water"},{"leisure":"park","tourism":"attraction"}': 2,
'{"leisure":"park","tourism":"attraction"},{"highway":"footway"}': 6,
'{"leisure":"swimming_pool"},{"boundary":"administrative"}': 1,
'{"leisure":"swimming_pool"},{"highway":"footway"}': 1,
'{"highway":"secondary"},{"boundary":"administrative"}': 1,
'{"leisure":"playground"},{"highway":"footway"}': 1,
'{"highway":"primary"},{"boundary":"administrative"}': 1,
'{"building":"residential"},{"boundary":"administrative"}': 1,
'{"building":"residential"},{"highway":"primary"}': 1,
'{"building":"residential"},{"highway":"footway"}': 1,
'{"building":"residential"},{"highway":"service"}': 1,
'{"building":"yes"},{"highway":"residential"}': 4,
'{"man_made":"pier"},{"natural":"coastline"}': 1,
'{"building":"public"},{"highway":"footway"}': 1,
'{"building":"public"},{"boundary":"administrative"}': 1,
'{"building":"public"},{"highway":"service"}': 1,
'{"building":"yes"},{"landuse":"cemetery"}': 6,
'{"building":"yes"},{"highway":"primary"}': 3,
'{"landuse":"cemetery"},{"highway":"service"}': 3,
'{"building":"apartments"},{"highway":"primary"}': 1,
'{"building":"yes"},{"barrier":"retaining_wall"}': 1,
'{"leisure":"miniature_golf"},{"leisure":"park"}': 1,
'{"building":"yes"},{"highway":"pedestrian"}': 8,
'{"highway":"pedestrian"},{"highway":"pedestrian"}': 7,
'{"leisure":"garden"},{"highway":"footway"}': 22,
'{"amenity":"toilets","building":"yes"},{"leisure":"garden"}': 1,
'{"building":"yes"},{"leisure":"garden"}': 1,
'{"highway":"tertiary"},{"leisure":"sports_centre"}': 1,
'{"highway":"tertiary"},{"boundary":"administrative"}': 1,
'{"highway":"footway","man_made":"pier"},{"building":"yes"}': 2,
'{"building":"commercial"},{"highway":"footway"}': 16,
'{"building":"commercial"},{"highway":"steps"}': 8,
'{"building":"commercial"},{"highway":"pedestrian"}': 1,
'{"highway":"footway"},{"highway":"pedestrian"}': 14,
'{"highway":"pedestrian"},{"leisure":"park"}': 1,
'{"highway":"footway"},{"highway":"footway"}': 2,
'{"highway":"pedestrian"},{"boundary":"administrative"}': 1,
'{"highway":"pedestrian"},{"highway":"secondary"}': 1,
'{"highway":"pedestrian"},{"highway":"service"}': 1,
'{"amenity":"police","building":"yes"},{"highway":"residential"}': 1,
'{"building":"yes"},{"leisure":"park"}': 1,
'{"amenity":"shelter"},{"leisure":"park"}': 1,
'{"leisure":"swimming_pool"},{"leisure":"garden"}': 1,
'{"building":"yes"},{"amenity":"school"}': 2,
'{"amenity":"school"},{"highway":"footway"}': 1,
'{"leisure":"swimming_pool"},{"tourism":"hotel","building":"yes"}': 1,
'{"building":"hangar"},{"aeroway":"heliport"}': 2,
'{"aeroway":"helipad"},{"aeroway":"heliport"}': 8,
'{"aeroway":"terminal","building":"yes"},{"aeroway":"heliport"}': 1,
'{"aeroway":"heliport"},{"highway":"service"}': 1,
'{"building":"yes"},{"aeroway":"heliport"}': 1 }
In each key, the first JSON object lists the primary tags of feature 1 and the second lists the primary tags of feature 2. The value is the number of such overlaps.
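Because each key is two JSON objects separated by a comma, wrapping it in brackets turns it into a valid JSON array. That makes the results easy to rank. A small sketch; the helper names are hypothetical and the sample counts are copied from the Monaco output above.

```python
import json

def parse_key(key):
    """Split '{"a":"b"},{"c":"d"}' into (feature-1 tags, feature-2 tags)."""
    # Wrapping the key in brackets yields a JSON array of two objects.
    tags1, tags2 = json.loads("[" + key + "]")
    return tags1, tags2

def top_overlaps(results, n=3):
    """Return the n most frequent (tags1, tags2, count) combinations."""
    rows = [(*parse_key(key), count) for key, count in results.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]

results = {
    '{"building":"yes"},{"highway":"footway"}': 18,
    '{"leisure":"garden"},{"highway":"footway"}': 22,
    '{"building":"commercial"},{"highway":"footway"}': 16,
    '{"building":"yes"},{"highway":"primary"}': 3,
}
top = top_overlaps(results, 2)
```

Keys with several primary tags, such as `{"sport":"swimming","amenity":"swimming_pool"}`, parse the same way.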
In the above case, the most overlaps are found with buildings overlapping highway=footway. A harmful combination that turned up is {"building":"yes"},{"highway":"primary"}. More analysis is needed to see what is happening in OSM.
The next step is to run the same process for a large city and see what comes out.
cc @bkowshik @geohacker @batpad
Wow! Awesome analysis Amisha!
@amishas157 💥! This analysis is amazing.
Could you list what the next actions look like to you here?
Next actions:
I cleaned up the JSON @amishas157 posted above into a CSV so it's easier to 👀.
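One way to do this conversion, as a sketch only (not necessarily how the CSV above was produced). It writes one row per key, rendering each tag object as `key=value` pairs joined with `;`. The column names are an assumption.

```python
import csv
import io
import json

def results_to_csv(results):
    """Flatten overlap results into CSV text with columns feature1, feature2, count."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["feature1", "feature2", "count"])
    for key, count in results.items():
        # Each key is two JSON objects; bracket it to parse as an array.
        tags1, tags2 = json.loads("[" + key + "]")
        cell1 = ";".join(f"{k}={v}" for k, v in tags1.items())
        cell2 = ";".join(f"{k}={v}" for k, v in tags2.items())
        writer.writerow([cell1, cell2, count])
    return out.getvalue()
```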
It is super-interesting to see footway features overlap with so many other features. Out of the total 70 rows, there are 16 rows with a footway feature in either the first or the second column.
The highest overlap, 22 between garden and footway, makes sense, right? There are lots of footways in garden features.
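Counting rows where a given tag appears on either side is a one-liner once the rows are parsed. A sketch, assuming hypothetical rows of `(tags1, tags2, count)`; the sample rows are taken from the Monaco results.

```python
def rows_with_value(rows, key, value):
    """Count rows where either feature carries key=value."""
    return sum(
        1 for tags1, tags2, _ in rows
        if tags1.get(key) == value or tags2.get(key) == value
    )

rows = [
    ({"building": "yes"}, {"highway": "footway"}, 18),
    ({"leisure": "garden"}, {"highway": "footway"}, 22),
    ({"building": "yes"}, {"highway": "primary"}, 3),
    ({"highway": "footway", "man_made": "pier"}, {"building": "yes"}, 2),
]
footway_rows = rows_with_value(rows, "highway", "footway")
largest = max(rows, key=lambda row: row[2])  # the garden/footway row
```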
Here is the updated JSON object, after removing a few duplicates and improving the logic a bit: https://gist.github.com/amishas157/ec0f042d7e69a576a337d156742547f5
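The gist doesn't say which duplicates were removed. With tile-reduce, one plausible source is the same two features being reported more than once. This can happen as (A, B) and (B, A), from each tile a feature spans, or as a feature matched against itself. A minimal sketch of deduplicating at the feature-ID level under that assumption; `unique_pairs` is a hypothetical helper.

```python
def unique_pairs(pairs):
    """Keep each unordered pair of feature ids once and drop self-pairs."""
    seen = set()
    unique = []
    for id1, id2 in pairs:
        if id1 == id2:
            continue  # a feature compared with itself is not an overlap
        key = frozenset((id1, id2))  # order-independent: (a, b) == (b, a)
        if key not in seen:
            seen.add(key)
            unique.append((id1, id2))
    return unique
```

Counts per tag combination would then be computed from the deduplicated pairs.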
Thanks @bkowshik for the CSV. 🙇♀️
@bkowshik wrote: "The highest overlap of 22 between garden and footway makes sense, right? There are lots of footways in garden features."
Yes, correct. But this seems to be a legitimate kind of overlap, no 🤔?
Per a voice call with @manoharuss and @amishas157:
Buildings, water and highway=residential and up are a good start.
landuse=cemetery can have overlapping buildings.
area=yes mapping convention: highway=pedestrian with area=yes is acceptable mapping on OpenStreetMap.
highway >= residential overlaps with leisure=*
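The call notes can be encoded as a small rule set. A sketch, not a definitive implementation. The ordering of highway classes counted as "residential and up" is an assumption, as are the function names. The highway >= residential vs. leisure=* note is left out because the notes don't say whether that overlap is acceptable.

```python
# Assumed highway classes counted as "residential and up", most important first.
MAJOR_HIGHWAYS = ["motorway", "trunk", "primary", "secondary",
                  "tertiary", "unclassified", "residential"]

def is_checked_type(tags):
    """Feature types the notes call a good start: buildings, water, major highways."""
    return ("building" in tags
            or tags.get("natural") == "water"
            or tags.get("highway") in MAJOR_HIGHWAYS)

def is_acceptable_overlap(tags1, tags2):
    """Overlaps the notes call acceptable, checked in both directions."""
    for a, b in ((tags1, tags2), (tags2, tags1)):
        # Buildings inside cemeteries are normal mapping.
        if a.get("landuse") == "cemetery" and "building" in b:
            return True
        # highway=pedestrian + area=yes is an accepted mapping convention.
        if a.get("highway") == "pedestrian" and a.get("area") == "yes":
            return True
    return False
```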
Updates
I took a smaller bbox from North America, [-137.8,46.1,-104.8,67.5], and got the following results: https://gist.github.com/amishas157/5f8ff9d5d238bcbf8c265f970900c4ee Eyeballing a small portion of these results turned up some harmful overlaps.
I wrote a script that, given two primary tags, queries the result set and returns a list of the IDs of the overlapping features.
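A sketch of such a query. The record shape (`id1`, `tags1`, `id2`, `tags2`) and the function name are assumptions, since the result-set format isn't shown here. Both orders are checked because either feature can be stored first.

```python
def overlapping_ids(records, tag1, tag2):
    """Return (id_with_tag1, id_with_tag2) for overlaps matching the two tags.

    records: iterable of dicts {"id1", "tags1", "id2", "tags2"} (assumed shape).
    tag1, tag2: (key, value) tuples such as ("natural", "water").
    """
    def has(tags, tag):
        return tags.get(tag[0]) == tag[1]

    matches = []
    for r in records:
        if has(r["tags1"], tag1) and has(r["tags2"], tag2):
            matches.append((r["id1"], r["id2"]))
        elif has(r["tags1"], tag2) and has(r["tags2"], tag1):
            # Same combination stored in the opposite order.
            matches.append((r["id2"], r["id1"]))
    return matches
```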
Analysis of overlaps between natural=water and building=yes
Total number of overlaps found: 37
Based on eyeballing, these overlaps can be categorized as follows:
Learnings:
I still think it is a valuable addition, especially since Case 1, while rare in the map, is a kind of bad vandalism we've seen before.
Apart from water, I think this will help detect Pokémon users adding new parks on top of buildings.
Also, the detailed documentation of how you approached this problem is an inspiring example! Thanks for digging into this.
Really enjoying how this is moving, awesome work @amishas157 🎉
Awesome work @amishas157.
@krishnanammala and I reviewed 32 changesets, of which 3 were actionable. Hit rate: 3/32 ≈ 9.4%
Overlap feedback
@amishas157 This changeset was flagged by the feature overlap comparator for 3 features: https://osmcha.mapbox.com/48936480/
Posting here for visibility
Will post more notes after a sample review.
I went on to review unchecked changesets flagged by the feature overlap comparator in OSMCha and captured notes on the noise I observed.
I observed a case where a leisure=park was flagged for feature overlap, but it was hard to tell which other feature it overlapped, since the data looked as expected. Changeset: https://osmcha.mapbox.com/49300341/. This changeset is a good example to learn from and to remove a few values from the list of feature types we check. Example: remove amenity=toilets when checking overlap combinations for leisure=park.
The leisure=park feature flagged in this changeset has a couple of legitimate buildings inside it.
We could treat this as an exception for leisure=park vs. buildings, but the park also came from an experienced user with 3k changesets. So maybe we should think about adding a new-user condition to the comparator.
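A sketch of how both suggestions could be combined: an exception list and a new-user condition. The `"*"` wildcard, the 100-changeset threshold, and all names here are hypothetical. The exception entries come from the review notes above.

```python
# Hypothetical exceptions: overlaps with these tags are ignored for the given tag.
EXCLUDED = {
    ("leisure", "park"): {("amenity", "toilets"), ("building", "*")},
}
NEW_USER_MAX_CHANGESETS = 100  # assumed threshold, not from the review

def is_excluded(tags, other_tags):
    """True if other_tags matches an exception registered for any tag in tags."""
    for tag in tags.items():
        excluded = EXCLUDED.get(tag, set())
        for key, value in other_tags.items():
            if (key, value) in excluded or (key, "*") in excluded:
                return True
    return False

def should_flag(tags, other_tags, user_changesets):
    """Flag an overlap only for new users and only if no exception applies."""
    if user_changesets > NEW_USER_MAX_CHANGESETS:
        return False  # experienced mapper
    # Check both directions, since either feature may be the park.
    return not (is_excluded(tags, other_tags) or is_excluded(other_tags, tags))
```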
I have a doubt about the max zoom as well: https://github.com/mapbox/osm-compare/blob/master/comparators/feature_overlap.js#L14
Thank you @manoharuss! So it looks like we have two major problems to address here:
Ref: Currently we have a feature overlap comparator that flags every newly added (version 1) water body that overlaps any existing feature.
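The selection step described here could look like the sketch below. It assumes water bodies are identified by natural=water only, and it uses a hypothetical feature shape with `version` and `tags`. The overlap test against existing features is a separate step.

```python
def is_candidate(feature):
    """Newly added (version 1) water body; natural=water is an assumed definition."""
    tags = feature.get("tags", {})
    return feature.get("version") == 1 and tags.get("natural") == "water"

def new_water_bodies(features):
    """Features the comparator would go on to test for overlaps."""
    return [f for f in features if is_candidate(f)]
```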
Following are the uncertainties discussed by @bkowshik in the referenced post.
Per discussion with @bkowshik, to study the issues above further, we can run tile-reduce to find existing overlapping features in OSM and visualize the different overlapping combinations and the count of each. This would give us a list of feature types that can help us tighten this compare function.
As I see it, we will get counts like the following:
Overlapping feature type | Overlapped feature type | Count
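Producing rows for this table from a list of detected overlaps is a straightforward count. A sketch, assuming each overlap has already been reduced to a pair of `"key=value"` strings (a hypothetical format).

```python
from collections import Counter

def overlap_table(pairs):
    """Count (overlapping type, overlapped type) pairs, most frequent first."""
    counts = Counter(pairs)
    lines = ["Overlapping feature type | Overlapped feature type | Count"]
    for (overlapping, overlapped), n in counts.most_common():
        lines.append(f"{overlapping} | {overlapped} | {n}")
    return "\n".join(lines)
```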
Here is the tile-reduce script I am working on. The script finds all water bodies in a tile and then checks which other features in the same tile they overlap.
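The per-tile step can be approximated without a geometry library by testing whether any vertex of another feature lies inside a water polygon, using even-odd ray casting. This is a simplified stand-in, not the script above. It misses overlaps where edges cross but no vertex falls inside, and a real implementation would use a proper polygon intersection. The feature shape (`id`, `tags`, `coords`) is assumed.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) to +infinity cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def water_overlaps_in_tile(features):
    """(water_id, other_id) pairs where a vertex of the other feature is inside the water."""
    water = [f for f in features if f["tags"].get("natural") == "water"]
    pairs = []
    for w in water:
        for other in features:
            if other is w:
                continue
            if any(point_in_polygon(x, y, w["coords"]) for x, y in other["coords"]):
                pairs.append((w["id"], other["id"]))
    return pairs
```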
One issue with this process is that we will miss counts from relations, because the MBTiles used in tile-reduce don't contain relation features.
I would be glad to get feedback from the team.
cc @batpad @geohacker @planemad @lukasmartinelli @ian29