mapbox / osm-compare

Functions that identify what changed during a feature edit on OpenStreetMap.
ISC License

Review wikipedia/wikidata comparator #66

Closed geohacker closed 6 years ago

geohacker commented 7 years ago

Let's nail scenarios that need escalation for edits to features that have wikipedia/wikidata tags. For example: https://osmcha.mapbox.com/44862344/

@manoharuss can you inspect and lay out what we can do to narrow the scope?

cc @batpad @amishas157 @krishnanammala @chtnha @bkowshik

manoharuss commented 7 years ago

@geohacker I spent some time yesterday looking at edits coming through the edited wikipedia and wikidata comparators. Here are my findings.

I have been seeing good changesets with these edits

We can cut down on noise by limiting the scope of the comparators to these edits:

geohacker commented 7 years ago

@manoharuss thank you, this is a great start. For each of the above cases of narrower scope, is it possible to find examples of how it breaks the map?

manoharuss commented 7 years ago

@geohacker

Examples:

  1. Deletion of wikidata tag http://osmcha.mapbox.com/43977595/
  2. Deletion of primary tag natural=water https://www.openstreetmap.org/changeset/44609658 <- edit that broke Lake Michigan, from a new user who joined on December 23rd.
  3. name:en tag addition with Chinese characters http://osmcha.mapbox.com/43837200/. Also change of neighborhood names to "virus" http://osmcha.mapbox.com/43817135/.

geohacker commented 7 years ago

@amishas157 are the above ^ points helpful in tightening our wikidata comparator? I think:

Should escalate and help us watch breaking changes.

@batpad @planemad thoughts?

cc @maning

amishas157 commented 7 years ago

@manoharuss Thanks for bringing out the observations.

@geohacker These points and examples look really helpful for cutting down the noise from this comparator. But I see that the majority of the ^ harmful edits are made by new mappers. So we can restructure it to return a result when:

cc @batpad @planemad

geohacker commented 7 years ago

@amishas157 - we don't have a reliable way of saying whether a mapper is new. Let's stick to a narrower scope of:

bkowshik commented 7 years ago

@geohacker the OpenStreetMap user details API gives the following:

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="OpenStreetMap server">
  <user id="5008734" display_name="Jerzie" account_created="2016-12-23T01:11:37Z">
    <description></description>
    <contributor-terms agreed="true"/>
    <roles>
    </roles>
    <changesets count="14"/>
    <traces count="0"/>
    <blocks>
      <received count="0" active="0"/>
    </blocks>
  </user>
</osm>

We could use some or all of the following properties to say if a user is a new mapper or not:
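As an illustration only, here is a minimal sketch of how two of the fields in the response above (`account_created` and the changesets `count`) might feed a "new mapper" heuristic. The function names and both thresholds are hypothetical; nothing in this thread fixes specific values, and a real implementation would likely use a proper XML parser rather than regular expressions.

```javascript
// Hypothetical thresholds; the thread does not specify any values.
const MAX_ACCOUNT_AGE_DAYS = 30;
const MAX_CHANGESETS = 50;

// Extract account_created and the changesets count from the user details XML.
// Regexes keep the sketch dependency-free; they are not a general XML parser.
function parseUserDetails(xml) {
  const created = /account_created="([^"]+)"/.exec(xml);
  const changesets = /<changesets\s+count="(\d+)"/.exec(xml);
  if (!created || !changesets) return null;
  return {
    accountCreated: new Date(created[1]),
    changesetCount: parseInt(changesets[1], 10)
  };
}

// Treat a user as "new" if the account is young OR has few changesets.
// `now` is passed in so the result is reproducible.
function isNewMapper(xml, now) {
  const user = parseUserDetails(xml);
  if (!user) return false; // unknown user details: do not guess
  const msPerDay = 24 * 60 * 60 * 1000;
  const ageDays = (now - user.accountCreated) / msPerDay;
  return ageDays <= MAX_ACCOUNT_AGE_DAYS || user.changesetCount <= MAX_CHANGESETS;
}
```

Whether the two signals should be combined with OR or AND is a design choice; OR errs on the side of flagging more users.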

geohacker commented 7 years ago

@bkowshik my point is that we should be catching anyone deleting wikidata tags, as well as anyone changing the name tag by more than 50%.
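The first of these two rules could be sketched roughly as follows. This assumes each version of a feature is a GeoJSON-like object with its OSM tags in `properties`; the function name `wikidataTagDeleted` is hypothetical and is not taken from osm-compare.

```javascript
// Returns true when the old version carried a wikidata tag and the new one
// does not. A missing new version (feature deleted) also counts as a deletion.
function wikidataTagDeleted(newVersion, oldVersion) {
  if (!oldVersion || !oldVersion.properties) return false;
  const oldTags = oldVersion.properties;
  const newTags = (newVersion && newVersion.properties) || {};
  return 'wikidata' in oldTags && !('wikidata' in newTags);
}
```

Note that this applies to every mapper, regardless of account age, which is the point made above.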

geohacker commented 7 years ago

@bkowshik per chat, name matching can be just a basic https://en.wikipedia.org/wiki/Levenshtein_distance#Example. We should:
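For reference, a basic Levenshtein-based check of how much a name changed might look like the sketch below. The 50% threshold comes from the comment above; dividing the distance by the length of the longer name is an assumption of this sketch, and the helper names are hypothetical.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping a single row.
// Compares UTF-16 code units, so characters outside the BMP count as two.
function levenshtein(a, b) {
  const row = [];
  for (let j = 0; j <= b.length; j++) row[j] = j;
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value of d[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value of d[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Fraction of the name that changed, from 0 (identical) to 1 (fully different).
// Normalizing by the longer name's length is an assumption of this sketch.
function nameChangeRatio(oldName, newName) {
  const longest = Math.max(oldName.length, newName.length);
  if (longest === 0) return 0;
  return levenshtein(oldName, newName) / longest;
}

// Flag edits where more than half of the name changed.
function nameChangedTooMuch(oldName, newName) {
  return nameChangeRatio(oldName, newName) > 0.5;
}
```

With this normalization, `kitten` to `sitting` (distance 3 over length 7) stays under the threshold, while replacing a neighborhood name with an unrelated word would be flagged.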

bkowshik commented 7 years ago

Deployed all the ^ to production. Closing.

bsrinivasa commented 7 years ago

Flagging a bug here. (Not sure if this is the right place)

Received this #35020: osm-edit: edited a name tag for Lake Victoria, but no changes to the name tag were detected in any recent changeset (since October 2016). Not sure why this came in through this comparator, or whether the name tag was modified in any of the members forming part of the relation.

If that was the case, what would be the best way to verify this?

Note: The relation was flagged under broken relation and we fixed it on 24 Jan at 6:56 PM; this PD came in through osm-edit: edited a name tag at 6:58 PM.

cc: @manoharuss

bkowshik commented 7 years ago

I can't think of anything here @bsrinivasa.

bkowshik commented 7 years ago

With Wikipedia/Wikidata, we now make an additional check of the feature's name. Ex:


Compare function: name_matches_to_wikidata.js