mapbox / gabbar

Guarding OpenStreetMap from harmful edits using machine learning
MIT License

Bot to catch simple invalid capitalization on OpenStreetMap #14

Closed bkowshik closed 7 years ago

bkowshik commented 7 years ago

From @planemad's post here:

Validation is a good angle to have some bots running to catch simple issues like an invalid capitalization in a tag like Highway=residential


I ran a tile-reduce script looking for invalid capitalization in the 26 primary tags below:

aerialway, aeroway, amenity, barrier, boundary, building, craft, emergency, geological, highway, historic, landuse, leisure, man_made, military, natural, office, place, power, public_transport, railway, route, shop, sport, tourism, waterway
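
A minimal sketch of this check, written in Python rather than as the tile-reduce script that was actually run (that script is not shown in this thread). The function name `find_invalid_capitalization` and the plain tag dictionaries are assumptions for illustration. A key is flagged when it matches one of the 26 primary tags only after lowercasing:

```python
# The 26 primary tags listed above.
PRIMARY_TAGS = frozenset({
    "aerialway", "aeroway", "amenity", "barrier", "boundary", "building",
    "craft", "emergency", "geological", "highway", "historic", "landuse",
    "leisure", "man_made", "military", "natural", "office", "place", "power",
    "public_transport", "railway", "route", "shop", "sport", "tourism",
    "waterway",
})


def find_invalid_capitalization(tags):
    """Return {bad_key: corrected_key} for keys that match a primary tag
    only after lowercasing (for example "Highway" -> "highway")."""
    return {
        key: key.lower()
        for key in tags
        if key not in PRIMARY_TAGS and key.lower() in PRIMARY_TAGS
    }


# Hypothetical feature tags, not taken from real OSM data.
flagged = find_invalid_capitalization({"Highway": "residential", "name": "Main St"})
```

Keys that are already lowercase, or that are not primary tags at all (like `name`), are left out of the result.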

I eyeballed a few from the list and the results were true positives. Some of the invalid capitalizations were: Building, Highway, etc.

Ex: The feature way!455096754 has an invalid Building tag. So, as soon as changeset 43871967 was created, a bot keeping an :eye: on the stream could correct it to building and leave a changeset discussion comment informing the user about it, along with some documentation links and the ID of the correcting changeset.
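
As a rough sketch of the last part of that flow, this is how the comment text could be put together in Python. The function name, the wording, the user name, and the changeset ID in the example are all hypothetical; the wiki link assumes the usual `Key:<name>` page pattern on the OSM wiki, and actually posting the comment is left out:

```python
def build_comment(user, bad_key, fixed_key, fix_changeset_id):
    """Compose the text a bot could leave on the original changeset."""
    return (
        f"Hi {user}, the tag key '{bad_key}' in this changeset looks like a "
        f"capitalization slip, so it was changed to '{fixed_key}' in "
        f"changeset {fix_changeset_id}. "
        f"Documentation: https://wiki.openstreetmap.org/wiki/Key:{fixed_key}"
    )


# Hypothetical user name and changeset ID, for illustration only.
message = build_comment("some_mapper", "Building", "building", 12345)
```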

Invalid capitalizations do happen often on OpenStreetMap and are corrected by other community members. Ex: For node/426859638, Highway was corrected to highway after a month.

I love the idea of letting the user who created the invalid capitalization know about the simple mistake and making the appropriate change automatically. @planemad, what next actions do you see to make this a real thing on OpenStreetMap?

planemad commented 7 years ago

This is amazing! It would be great if we can use this to set an example of how to create and operate simple and productive bots that can help mappers and drive quality in OSM. A list of existing OSM bots is here https://wiki.openstreetmap.org/wiki/Bot but it's unclear what the approval process is or how one goes about running them.

How about sharing this idea on the OSM talk mailing list and getting some guidance on what others feel is the best way to move forward?

bkowshik commented 7 years ago

Thank you @planemad 😃

There are a total of 9 bots listed on the Wiki, with 3 of them labelled as Active. A majority of the bots are very localized in nature. Ex: One bot repairs street names in Colombia, another adds is_in tags to places in Germany, etc.

Details about bots listed on the Wiki sorted by the last changeset date:

Bot name         Number of changesets  Last changeset date
SearchAroundBot  74                    2017-03-05
czechreg         18,886                2016-12-30
xybot            7,324                 2016-03-20
botika           4,729                 2013-05-26
Wall·E           1,727                 2015-08-23
General Dreedle  188                   2013-01-12
MS BOT           18                    2012-11-01
BugBuster        33,726                2012-10-07
is_in-bot        6                     2008-11-04

This seems like a two-step process:

  1. Set up a bot to continuously monitor new changes and fix invalid capitalization.
  2. Run a backfill to fix existing invalid capitalization.
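
Both steps could share the same fixing logic and differ only in where the features come from. A minimal Python sketch under that assumption follows; the function names, the three-tag subset, and the feature IDs are made up for illustration, and the live stream in step 1 is only indicated in a comment. One design choice here is an assumption of mine: when the lowercase key already exists on the feature, nothing is changed, so that no value gets silently overwritten.

```python
PRIMARY_TAGS = {"building", "highway", "natural"}  # subset of the 26, for brevity


def fix_capitalization(tags):
    """Return a copy of tags with wrongly capitalized primary keys lowercased.

    If the lowercase key is already present, the original key is kept so
    that a human can decide which value is right."""
    fixed = {}
    for key, value in tags.items():
        lower = key.lower()
        if key != lower and lower in PRIMARY_TAGS and lower not in tags:
            fixed[lower] = value
        else:
            fixed[key] = value
    return fixed


def process(features):
    """Yield (feature_id, fixed_tags) for every feature that needs a fix."""
    for feature_id, tags in features:
        fixed = fix_capitalization(tags)
        if fixed != tags:
            yield feature_id, fixed


# Step 1 would feed process() from a live stream of new changes (not shown).
# Step 2, the backfill, feeds the same function from an existing extract.
backfill = [
    ("way/1", {"Building": "yes"}),
    ("way/2", {"highway": "residential"}),
    ("way/3", {"Highway": "service", "highway": "residential"}),
]
results = list(process(backfill))
```

Only `way/1` ends up in `results`: `way/2` is already correct, and `way/3` is skipped because it carries both spellings.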

Documentation

Questions

NOTE: The following are some questions that came to mind

Validation tool vs Bot

A majority of tools in the OpenStreetMap ecosystem detect issues and flag them for manual review. Ex: http://keepright.at/

misspelled tags: Tags that are used very seldom and almost look like very common tags (only one character difference) are reported as a warning.
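
The rule quoted above could be sketched roughly as follows in Python. This is not keepright's actual implementation: the function names, the usage counts, and the `rare_max` / `common_min` thresholds are assumptions for illustration, and "one character difference" is read here as exactly one substitution, insertion, or deletion.

```python
def one_edit_apart(a, b):
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # single substitution at position i
    return a[i:] == b[i + 1:]  # single extra character in b at position i


def misspelled_keys(key_counts, rare_max=10, common_min=1000):
    """Return {rare_key: common_key} warnings from a {key: usage count} mapping."""
    common = [k for k, n in key_counts.items() if n >= common_min]
    warnings = {}
    for key, n in key_counts.items():
        if n <= rare_max:
            for candidate in common:
                if one_edit_apart(key, candidate):
                    warnings[key] = candidate
                    break
    return warnings


# Made-up usage counts, not real OSM statistics.
counts = {"highway": 5000, "higway": 2, "Highway": 3, "name": 9000, "fixme": 4}
warnings = misspelled_keys(counts)
```

Note that a capitalization slip like `Highway` is just a special case of this rule, since it differs from `highway` by a single character.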

Another option is to detect these issues through a script and upload them to a tool like To-Fix for manual review, as is done for the misspelled-tags task.

The advantage of having a bot is that things happen automatically without requiring time from users, and definite problems are fixed in near real time. Ex: I tried adding natural=water and Natural=water on the OpenStreetMap dev server to see the difference, and the following is what I found: the rendering is totally different.

natural-vs-natural

What more could we automate

I agree that checking and fixing invalid capitalization is a good first step for the bot. But I am curious to know what the other possibilities for automation are, so that we make a wise decision on the amount of time and energy we spend developing and maintaining a bot.

Notes

Bots have:

Next actions

bkowshik commented 7 years ago

OSM diary post: http://www.openstreetmap.org/user/bkowshik/diary/40627

bkowshik commented 7 years ago

We are making good progress with Gabbar. Let's revisit the bot idea in a couple of weeks.