Closed (geohacker closed this 6 years ago)
@geohacker I spent some time yesterday looking at edits coming through the Edited wikipedia and wikidata comparators. Here are my findings.
I have been seeing good changesets with these kinds of edits:
Good geometric modifications, such as adding/deleting members in a relation and modifications to existing ways. Relation fixes on features which also have wikipedia tags. Valid geometric corrections from experienced users (addition of nodes).
Additions of Wikipedia tags to previously existing features.
Wikidata tag corrections by experienced users.
We can narrow the scope of the comparators and cut down on noise by limiting them to these edits:
Name tag deletions/modifications on wikipedia/wikidata features, when the new name is completely different from the name in the older version of the feature.
Primary tag deletions/modifications. Ex: amenity, landuse, highway tag removals.
Deletion of other feature tags of wikipedia/wikidata features.
Let us escalate if a feature has both wikipedia and wikidata tags and is edited by a new user (<2 months?) in a way that meets the above criteria.
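The narrower scope above could be sketched roughly as below. This is only an illustration, not the deployed comparator: `reasonsToFlag`, `PRIMARY_TAGS` and `namesShareNoWords` are hypothetical names, "completely different" is approximated as "no word in common", and it assumes each version is a GeoJSON feature whose `properties` hold only the OSM tags.

```javascript
// Hypothetical list of primary tags; the thread only names these three.
const PRIMARY_TAGS = ['amenity', 'landuse', 'highway'];

// True when the feature carries a wikipedia or wikidata tag.
function hasWikiTag(props) {
  return Boolean(props && (props.wikipedia || props.wikidata));
}

// Assumption: "completely different" means the two names share no word.
function namesShareNoWords(oldName, newName) {
  const words = (s) => new Set(s.toLowerCase().split(/\s+/).filter(Boolean));
  const oldWords = words(oldName);
  for (const w of words(newName)) {
    if (oldWords.has(w)) return false;
  }
  return true;
}

// Returns a list of reasons to flag the edit; an empty list means "ignore".
function reasonsToFlag(newVersion, oldVersion) {
  const oldProps = (oldVersion && oldVersion.properties) || {};
  const newProps = (newVersion && newVersion.properties) || {};
  const reasons = [];

  // Only features that already had a wikipedia/wikidata tag are in scope.
  if (!hasWikiTag(oldProps)) return reasons;

  // Name tag deleted, or replaced by a completely different name.
  if (oldProps.name !== undefined) {
    if (newProps.name === undefined) {
      reasons.push('name deleted');
    } else if (newProps.name !== oldProps.name &&
               namesShareNoWords(oldProps.name, newProps.name)) {
      reasons.push('name replaced');
    }
  }

  // Primary tag deleted or modified; any other tag deleted.
  for (const key of Object.keys(oldProps)) {
    if (key === 'name') continue;
    const primary = PRIMARY_TAGS.includes(key);
    if (!(key in newProps)) {
      reasons.push(`${primary ? 'primary tag' : 'tag'} deleted: ${key}`);
    } else if (primary && newProps[key] !== oldProps[key]) {
      reasons.push(`primary tag modified: ${key}`);
    }
  }
  return reasons;
}
```

Returning the list of reasons, rather than a boolean, would make it easier to see why a given changeset was flagged.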
@manoharuss thank you, this is a great start. For each of the above cases of narrower scope, is it possible to find examples of how it breaks the map?
@geohacker
Examples:
natural=water: https://www.openstreetmap.org/changeset/44609658 <- edit that broke Lake Michigan, from a new user who joined on December 23rd.
name:en tag addition with Chinese characters: http://osmcha.mapbox.com/43837200/. Also a change of neighborhood names to "virus": http://osmcha.mapbox.com/43817135/.
@amishas157 are the above ^ points helpful in tightening our wikidata comparator? I think:
Should escalate and help us watch breaking changes.
@batpad @planemad thoughts?
cc @maning
@manoharuss Thanks for bringing out the observations.
@geohacker These points and examples look really helpful for cutting down the noise from this comparator. But I see that the majority of the ^ harmful edits are done by new mappers. So we can restructure it to give a result when:
cc @batpad @planemad
@amishas157 - we don't have a reliable way of saying whether a mapper is new. Let's stick to a narrower scope of:
@geohacker the OpenStreetMap user details API gives the following:
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="OpenStreetMap server">
  <user id="5008734" display_name="Jerzie" account_created="2016-12-23T01:11:37Z">
    <description></description>
    <contributor-terms agreed="true"/>
    <roles>
    </roles>
    <changesets count="14"/>
    <traces count="0"/>
    <blocks>
      <received count="0" active="0"/>
    </blocks>
  </user>
</osm>
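As a rough sketch of how a response like this could feed a "new mapper" check: the snippet below pulls `account_created` and the changeset count out of the XML and compares them against thresholds. The function names and both thresholds (60 days, 50 changesets) are assumptions for illustration only. The regex extraction is a shortcut to keep the example dependency-free, and a real XML parser would be more robust.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Extract the two attributes of interest from a user details response.
// Regex-based on purpose to avoid dependencies; not a general XML parser.
function parseUserDetails(xml) {
  const created = /account_created="([^"]+)"/.exec(xml);
  const changesets = /<changesets\s+count="(\d+)"/.exec(xml);
  if (!created || !changesets) return null;
  return {
    accountCreated: new Date(created[1]),
    changesetCount: Number(changesets[1]),
  };
}

// Assumed thresholds: an account younger than 60 days, or fewer than
// 50 changesets, counts as a new mapper. Neither value comes from the thread.
function isLikelyNewMapper(user, now = new Date(),
                           limits = { maxAgeDays: 60, minChangesets: 50 }) {
  const ageDays = (now.getTime() - user.accountCreated.getTime()) / DAY_MS;
  return ageDays < limits.maxAgeDays || user.changesetCount < limits.minChangesets;
}
```

Passing `now` explicitly keeps the check reproducible when replaying old changesets.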
We could use some or all of the following properties to say if a user is a new mapper or not:
newVersion and oldVersion geojson's)
@bkowshik my point is that we should be catching anyone deleting wikidata tags, as well as anyone changing the name tag by more than 50%.
@bkowshik per chat, name matching can be just a basic Levenshtein distance check (https://en.wikipedia.org/wiki/Levenshtein_distance#Example). We should:
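A minimal sketch of the Levenshtein-based name check discussed here, assuming "changed by more than 50%" means the edit distance divided by the length of the longer name exceeds 0.5. The function names and that interpretation are assumptions, not the deployed compare function.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping one row at a time.
// Array.from splits by code point, so non-Latin names are counted per character.
function levenshtein(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Fraction of the longer name that had to change (0 = identical, 1 = all of it).
function nameChangeRatio(oldName, newName) {
  const longest = Math.max(Array.from(oldName).length, Array.from(newName).length);
  return longest === 0 ? 0 : levenshtein(oldName, newName) / longest;
}

// Assumed threshold of 0.5, following the "more than 50%" suggestion.
function isNameChangedTooMuch(oldName, newName, threshold = 0.5) {
  return nameChangeRatio(oldName, newName) > threshold;
}
```

With this definition, replacing "Lake Michigan" with "virus" would be flagged, while appending a word to an existing name usually would not.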
Deployed all the ^ to production. Closing.
Flagging a bug here. (Not sure if this is the right place)
Received #35020 (osm-edit: edited a name tag) for Lake Victoria, but no changes to the name tag were detected in its changesets in recent times (since October 2016). Not sure why this came in through this comparator, or whether the name tag was modified in any of the members that form part of the relation.
If that was the case, what would be the best way to verify this?
Note: The relation was flagged under broken relation and we fixed it on 24 Jan at 6:56 PM; this PD came in through osm-edit: edited a name tag at 6:58 PM.
cc: @manoharuss
I can't think of anything here @bsrinivasa.
With Wikipedia/Wikidata, we now make an additional check of the feature's name. Ex:
Compare function: name_matches_to_wikidata.js
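The contents of name_matches_to_wikidata.js are not shown in this thread, so the following is only a hypothetical sketch of what such a name check might look like. It assumes the Wikidata labels and aliases for the feature's item have already been fetched elsewhere and are passed in as a plain array; `nameMatchesWikidataLabels` is an illustrative name, not the actual function.

```javascript
// Normalize for a case-insensitive, whitespace-tolerant comparison.
function normalizeName(s) {
  return s.trim().toLowerCase();
}

// True when the feature's name equals any of the supplied labels/aliases.
// `labels` is assumed to be an array of strings obtained from Wikidata elsewhere.
function nameMatchesWikidataLabels(name, labels) {
  if (!name) return false;
  const target = normalizeName(name);
  return labels.some((label) => normalizeName(label) === target);
}
```

A feature whose name matches none of the labels would then be a candidate for review.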
Let's nail scenarios that need escalation for edits to features that have wikipedia/wikidata tags. For example: https://osmcha.mapbox.com/44862344/
@manoharuss can you inspect and lay out what we can do to make the scope narrower?
cc @batpad @amishas157 @krishnanammala @chtnha @bkowshik