mapbox / osm-compare

Functions that identify what changed during a feature edit on OpenStreetMap.

Analysing overlapping features in OSM using tile-reduce #145

Open amishas157 opened 7 years ago

amishas157 commented 7 years ago

Ref: Currently we have a feature overlap comparator that flags all newly added (version 1) water bodies that overlap any existing feature.

The following are the uncertainties discussed by @bkowshik in the referenced post.

  • What zoom level tiles should be downloaded from the API? Tiles at lower zoom levels don't have all the data. Ex: Buildings generally show up in tiles greater than zoom 15.
  • There could be good overlaps too. We need to differentiate between a good and a harmful overlap.
  • Lakes and parks features fit this use-case well. What other feature types can something like this handle?

Per discussion w/ @bkowshik, to study the issues above further, we can run tile-reduce to find existing overlapping features in OSM, then visualize the different overlapping combinations and the count of each. That would give us a list of feature types that can help us tighten this compare function.

The way I see it, we will get counts in the following form: Overlapping feature type | Overlapped feature type | Count
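A minimal sketch of how such a count table could be tallied, assuming each detected overlap has already been reduced to a pair of feature types. The sample pairs and the function name `count_overlap_combinations` are made up for illustration and are not taken from the tile-reduce script.

```python
from collections import Counter

# Hypothetical overlap records: (overlapping feature type, overlapped feature type).
overlaps = [
    ("natural=water", "building=yes"),
    ("leisure=park", "building=yes"),
    ("natural=water", "building=yes"),
]


def count_overlap_combinations(pairs):
    """Count how often each (overlapping, overlapped) combination occurs."""
    return Counter(pairs)


counts = count_overlap_combinations(overlaps)

# Print rows in the "Overlapping | Overlapped | Count" layout, most frequent first.
for (overlapping, overlapped), count in counts.most_common():
    print(f"{overlapping} | {overlapped} | {count}")
```

The pair is kept ordered, so "water over building" and "building over water" would be counted separately.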

Here is the tile-reduce script I am working on. The script finds all water bodies in a tile and then checks which other features in the same tile they overlap.

One issue with the above process is that we will miss counts from relations, as the mbtiles used in tile-reduce don't contain relation-type features.

Would be glad to get feedback from the team.

cc @batpad @geohacker @planemad @lukasmartinelli @ian29

geohacker commented 7 years ago

This is a great idea! OSM QA Tiles aren't the best for this research though, because of missing relations, tile edges, and multipolygon handling.

@amishas157 Let's use osmium. Take a look at osmlazer for a sketch.

amishas157 commented 7 years ago

@geohacker Yes, that would be great. Here is how I am thinking of approaching this problem:

Find the overlapping features with an area greater than a threshold value, and find the different kinds of overlapping and overlapped features. Then do a deeper analysis on each combination of the two.

amishas157 commented 7 years ago

Analysing Feature Overlaps

The idea behind this analysis is to learn what kinds of overlaps exist in OSM, estimate how many there are, and differentiate good overlaps from bad ones. An example of a good one is a building overlapping a landuse feature; a bad one is a water feature overlapping buildings.

The process is carried out as follows:

  1. OsmLazer is used to convert .pbf files into an output JSON file. (.pbf files are used instead of OSM QA Tiles because OSM QA Tiles don't include relations.) Only ways and relations are filtered from the .pbf files. Relation members are indexed by member id, and this index is used later in data processing, as explained in the steps ahead. Features are also filtered on the presence of primary tags, and features that have a layer tag are ignored, because layer = +ve/-ve number can give false positives for feature overlap: such features don't actually make bad overlaps.
Primary Tags List: 'aerialway', 'aeroway', 'amenity', 'barrier', 'boundary', 'building', 'craft',
'emergency', 'geological', 'highway', 'landuse', 'leisure', 'man_made', 'military', 'natural',
'office', 'places', 'power', 'public_transport', 'railway', 'route', 'shop', 'sport',
'tourism', 'waterway'
  2. Tippecanoe is used to cut the JSON file into mbtiles with both min and max zoom level set to 16.

  3. A tile-reduce script is used to process the features in the above list. Overlaps are calculated as follows:

    • Each item in the feature list is checked against every other member of the list.
    • If the two items being checked belong to the same relation, they are ignored. The relation indexing mentioned in the steps above makes this possible.
    • Overlaps are considered to be of 3 types: Area feature w/ Area feature, Area feature w/ Line feature, Line feature w/ Line feature.
    • For this process, we have ignored the cases of intersection, i.e. Line w/ Line features.
    • For Area w/ Area features, if the area of intersection is greater than a certain threshold, it is considered an overlap.
    • For Area w/ Line features, if the line feature crosses the area feature, it is considered an overlap. A line feature sharing a boundary with an area feature is not considered an overlap.
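The filtering and pairwise rules above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the actual tile-reduce script: area features are reduced to axis-aligned boxes `(min_x, min_y, max_x, max_y)`, line features to a single segment, and `PRIMARY_TAGS`, `AREA_THRESHOLD`, the feature dict layout and all function names are hypothetical.

```python
from itertools import combinations

# Assumption: a small subset of the primary tags listed above, for brevity.
PRIMARY_TAGS = {"building", "highway", "landuse", "leisure", "natural", "waterway"}
AREA_THRESHOLD = 1.0  # hypothetical minimum intersection area, in squared map units


def is_candidate(feature):
    """Keep features with at least one primary tag and no layer tag."""
    tags = feature["tags"]
    return "layer" not in tags and any(key in PRIMARY_TAGS for key in tags)


def share_relation(a, b, relation_index):
    """True when both features are members of at least one common relation."""
    return bool(relation_index.get(a["id"], set()) & relation_index.get(b["id"], set()))


def box_intersection_area(a, b):
    """Intersection area of two boxes given as (min_x, min_y, max_x, max_y)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def segment_crosses_box(segment, box):
    """True when the segment passes through the box interior (Liang-Barsky clipping)."""
    (x0, y0), (x1, y1) = segment
    dx, dy = x1 - x0, y1 - y0
    t_enter, t_exit = 0.0, 1.0
    for p, q in ((-dx, x0 - box[0]), (dx, box[2] - x0), (-dy, y0 - box[1]), (dy, box[3] - y0)):
        if p == 0:
            if q < 0:
                return False  # parallel to this edge and outside it
        elif p < 0:
            t_enter = max(t_enter, q / p)
        else:
            t_exit = min(t_exit, q / p)
    if t_enter >= t_exit:
        return False  # misses the box or only touches a single point
    t_mid = (t_enter + t_exit) / 2
    mx, my = x0 + t_mid * dx, y0 + t_mid * dy
    # A segment running along the boundary has its midpoint on the boundary, not inside.
    return box[0] < mx < box[2] and box[1] < my < box[3]


def find_overlaps(features, relation_index):
    """Return id pairs that count as overlaps under the rules described above."""
    candidates = [f for f in features if is_candidate(f)]
    overlaps = []
    for a, b in combinations(candidates, 2):
        if share_relation(a, b, relation_index):
            continue
        kinds = {a["kind"], b["kind"]}
        if kinds == {"area"}:
            hit = box_intersection_area(a["geom"], b["geom"]) > AREA_THRESHOLD
        elif kinds == {"area", "line"}:
            area, line = (a, b) if a["kind"] == "area" else (b, a)
            hit = segment_crosses_box(line["geom"], area["geom"])
        else:
            hit = False  # line w/ line intersections are ignored
        if hit:
            overlaps.append((a["id"], b["id"]))
    return overlaps


features = [
    {"id": 1, "kind": "area", "tags": {"natural": "water"}, "geom": (0, 0, 10, 10)},
    {"id": 2, "kind": "area", "tags": {"building": "yes"}, "geom": (5, 5, 8, 8)},
    {"id": 3, "kind": "line", "tags": {"highway": "footway"}, "geom": ((0, 0), (10, 0))},
    {"id": 4, "kind": "area", "tags": {"building": "yes", "layer": "1"}, "geom": (1, 1, 4, 4)},
    {"id": 5, "kind": "area", "tags": {"landuse": "grass"}, "geom": (0, 0, 5.5, 5.5)},
]
relation_index = {1: {100}, 5: {100}}  # features 1 and 5 belong to the same relation

print(find_overlaps(features, relation_index))  # [(1, 2)]
```

In this sample, feature 4 is dropped for its layer tag, features 1 and 5 are skipped because they share a relation, the footway only runs along the boundary of the water body, and the building/grass intersection falls under the threshold, so only the water/building pair is reported. Real polygons would need a proper geometry library rather than boxes.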

The initial analysis was done for Monaco, a small city-state in Europe.

cc @bkowshik @geohacker @batpad

lukasmartinelli commented 7 years ago

Wow! Awesome analysis Amisha!

batpad commented 7 years ago

@amishas157 💥 ! this analysis is amazing.

Is it possible to list down what next actions look like to you here?

amishas157 commented 7 years ago

Next actions:

bkowshik commented 7 years ago

Cleaned up the JSON @amishas157 posted ^ into a CSV for better 👀


It is super-interesting to see footway features overlap with so many other features. Out of the total 70 rows, there are 16 rows with a footway feature either in the first or the second column.

The highest overlap of 22 between garden and footway makes sense right? There are lots of footway in garden features.

amishas157 commented 7 years ago

Here is the updated JSON object, after removing a few dups and improving the logic a bit: https://gist.github.com/amishas157/ec0f042d7e69a576a337d156742547f5

Thanks @bkowshik for the CSV. 🙇‍♀️

@bkowshik

The highest overlap of 22 between garden and footway makes sense right? There are lots of footway in garden features.

Yes, correct. But this seems to be a legit kind of overlap, no? 🤔

bkowshik commented 7 years ago

Per voice with @manoharuss and @amishas157


Priority

Rendering

Percentage of overlap

Noise

area=yes mapping convention

Bad combination

amishas157 commented 7 years ago

Updates

amishas157 commented 7 years ago

Analysis for overlaps between natural:water and building:yes

Total number of overlaps found: 37

Based on eyeballing, these overlaps can be categorized as follows:

1. [Four screenshots of example overlaps, taken 2017-04-19 and 2017-04-20]

Learnings:

lukasmartinelli commented 7 years ago

I still think it is a valuable addition, especially since Case 1, while not found very often in the map, is a kind of bad vandalism we've seen before.

Apart from water, I think this will help detect Pokemon users adding new parks on top of buildings.

lukasmartinelli commented 7 years ago

Also, the detailed documentation of how you approached this problem is an inspiring example! Thanks for digging into this.

bkowshik commented 7 years ago

Really enjoying how this is moving, awesome work @amishas157 🎉

manoharuss commented 7 years ago

Awesome work @amishas157.

@krishnanammala and I reviewed 32 changesets, of which 3 were found to be actionable. Hitrate: 9.3%

Observations:

Overlap feedback

  1. Pitch and rock can be excluded from the leisure overlap https://github.com/mapbox/osm-compare/blob/master/comparators/feature_overlap.js#L164
  2. Can we have a threshold on the % of overlap required before flagging, to avoid flagging rough tracing and imagery-offset-based mapping? https://osmcha.mapbox.com/48187204/, https://osmcha.mapbox.com/48184134/
  3. This feature only overlaps with a pedestrian highway. Wrong detection? https://osmcha.mapbox.com/48176599/
  4. Some buildings do overlap with parks https://osmcha.mapbox.com/48171466/. I think we have to deal with this kind of noise.
  5. Wrong detections? https://www.openstreetmap.org/way/489576331/history, https://osmcha.mapbox.com/48163148/
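
One way the percentage threshold suggested in item 2 could look, as a rough sketch only. It assumes features reduced to axis-aligned boxes `(min_x, min_y, max_x, max_y)` and measures the intersection against the smaller of the two features; `MIN_OVERLAP_PERCENT` and both function names are hypothetical and not part of `feature_overlap.js`.

```python
MIN_OVERLAP_PERCENT = 10.0  # hypothetical cut-off below which an overlap is treated as noise


def overlap_percentage(a, b):
    """Intersection area as a percentage of the smaller box's area."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0  # boxes are disjoint or only touch along an edge
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return 100.0 * width * height / min(area_a, area_b)


def should_flag(a, b, min_percent=MIN_OVERLAP_PERCENT):
    """Flag only overlaps that cover at least min_percent of the smaller feature."""
    return overlap_percentage(a, b) >= min_percent


# Two roughly traced neighbouring buildings that overlap by a thin sliver.
print(should_flag((0, 0, 10, 10), (9.5, 0, 19.5, 10)))  # False
# A small building fully inside a much larger area feature.
print(should_flag((0, 0, 100, 100), (10, 10, 20, 20)))  # True
```

Measuring against the smaller feature keeps a small building inside a large park at 100%, while slivers from rough tracing or imagery offset stay near 0%.
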
manoharuss commented 7 years ago

@amishas157 This changeset had 3 features flagged by the feature overlap comparator. https://osmcha.mapbox.com/48936480/

  1. 495603088 - The 1st feature was a building and was only sharing a boundary with the next building.
  2. The other 2 features that were flagged did not have any overlap with any other feature at all.

manoharuss commented 7 years ago

Posting here for visibility

screen_shot

Will post more notes after a sample review.

manoharuss commented 7 years ago

Review Feedback 6th June

I reviewed unchecked changesets flagged by the feature overlap comparator in OSMCha and captured notes on the noise observed:

  1. Observed more cases like the ones mentioned in the above comment, where the overlap was detected with a feature in that same changeset.

image

  2. Observed a case where a leisure = park was flagged for feature overlap, but it was hard to tell which other feature it overlapped with, as the data seemed to be as expected. Changeset: https://osmcha.mapbox.com/49300341/. This changeset is a good example to learn from and to remove a few values from the list of feature types we are checking. Example: Remove amenity = toilet when checking overlap combinations for leisure = park.

  3. The feature flagged in this changeset, leisure = park, has a couple of legit buildings inside it.

image

Let us consider this an exception for leisure = park vs buildings. But the park originated from an experienced user with 3k changesets, so maybe we should think about adding a new-user condition to the comparator.

I also have a doubt about max zoom - https://github.com/mapbox/osm-compare/blob/master/comparators/feature_overlap.js#L14

geohacker commented 7 years ago

Thank you @manoharuss! So looks like we have two major problems to address here:

  1. Avoid comparing different versions of the same feature.
  2. Take into account features that are modified in the same changeset before comparing for overlap.
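
A rough sketch of how both points could be handled when choosing what to compare a feature against. The function name `comparison_candidates`, the feature dict layout and the `"way/<id>"` key format are assumptions for illustration, not the comparator's actual data model, and deletions within the changeset are not handled here.

```python
def comparison_candidates(new_feature, existing_features, changeset_features):
    """Pick the features a new or modified feature should be compared against."""
    latest = {f["id"]: f for f in existing_features}
    # Features touched in the same changeset replace their older stored versions.
    for f in changeset_features:
        latest[f["id"]] = f
    # Never compare a feature against another version of itself.
    latest.pop(new_feature["id"], None)
    return list(latest.values())


existing = [
    {"id": "way/1", "version": 1},
    {"id": "way/2", "version": 3},
]
changeset = [
    {"id": "way/1", "version": 2},
    {"id": "way/2", "version": 4},
]

candidates = comparison_candidates(changeset[0], existing, changeset)
print(candidates)  # [{'id': 'way/2', 'version': 4}]
```

Here way/1 is only compared against version 4 of way/2, the geometry as it stands after the changeset, and never against its own version 1.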