Intelligent interface between consumers and comparators

bkowshik commented 7 years ago

@amishas157 and me braistormed on an interface between the comparators. With this setup we hope to:

Answer which compare functions shoule be invoked when and why better rather than invoke all comparators all the time.
Reduce a significant amount of noise from common modifications like to name and Wikidata like fields.
Improve accuracy of detection for harmful modification to name and Wikidata like fields as these comparators are focussed and tuned only on these tags.

Scenario

User added a Wikidata tag to Osteria Francescana.
http://osmcha.mapbox.com/47061983/
One thing to keep in mind is nothing else about this feature got modified except for the addition of the Wikidata tag.

Current setup

Since this restaurant is part of our list of landmarks on osm, the osm-landmarks comparators labels this feature/changeset for manual review on osmcha.
The important bit is the osm-landmarks is not the best comparators to answer whether the addition of the Wikidata tag was a :thumbsup: or a :thumbsdown:
An osmcha user, @srividyacb in this case reviews the changeset and reviews the changeset as a :thumbsup: as things are great!

Proposed setup

There exists a couple of tag focussed comparators that have all the knowledge about whether additions, modifications or deletions of those specific tags are good or problematic.
Ex: A general Wikidata comparator exists which given a feature with a Wikidata can tell if that is good or harmful. Something things that it could be checking are:
- Name of the feature on OpenStreetMap match the name on Wikidata
- Location of feature on OpenStreetMap is pretty close to the location on Wikidata
- Etc
Now, for the usecase ^, since only the Wikidata tag got modified and nothing else, the Intelligent interface will call the Wikidata comparator first.
If the Wikidata from all the knowledge it has about what a good Wikidata tag for a feature looks like says this is a :thumbsup:, then no other compare function is invoked (osm-landmarks compare function in this case will not be invoked).
The idea here is, since this is just a Wikidata modification, the Wikidata comparator should be the best judge of things.
But, on the other hand, if the Wikidata comparator says a :thumbsdown:, then all other compare functions are invoked just like the current setup.

intelligent interface - osm-compare - validation

Would love to hear all your thoughts. cc: @geohacker @batpad

batpad commented 7 years ago

@bkowshik in this scenario, why can we not just have a utility function, that is something like isOnlyWikidataChange -- and then the osm-landmarks function checks, and if it is onlyWikiDataChange, stops further processing?

I'm a bit wary to build up these complex interwebs of relationships between compare function where each one needs to keep a track of its dependents and dependencies and I want to see if we can continue keeping compare functions single and atomic. I would imagine having a lot of small utility functions that are responsible for making these kinds of assertions about a change, and then the compare function be responsible for invoking them and stopping further processing if the change type matches certain conditions.

I think the other option would be something like:

A set of pre-process functions that adds some sort of tags to the edit to represent some attributes about the change - wikidataEdit, newFeature, deletedFeature, etc.
Then, the compare functions define what kinds of changes they are interested or not interested in, in a declarative fashion, so I'd imagine the layout to be something like:

function my CompareFunc(newVersion, oldVersion) {

)

myCompareFunc.exclude = [
  'wikidataEdit',
  'deletedFeature
];

and then the code that is responsible for running comparators checks the tags added to the feature against these properties defined by the function and decides whether to run it or not.

Not sure if there's another kind of architecture you have in mind.

I still prefer the approach of going with small utility functions and making it the responsibility of the comparator code to check whether this is something they are interested in or not by calling the utility functions.

But I can see the value in consolidating to an architecture where edits are "tagged" once as they come in, and there's some high level code that uses those "tags" to decide whether to call certain comparators or not. Happy to discuss more!

bkowshik commented 7 years ago

why can we not just have a utility function, that is something like isOnlyWikidataChange -- and then the osm-landmarks function checks, and if it is onlyWikiDataChange, stops further processing?

Let's take a scenario where the Wikidata tags of Eiffel Tower and Osteria Francescana are wrongly modified and it was just a Wikidata data change only. We want the osm-landmarks comparator to do the additional checks that it can do to to add more context and priority for the edits. In this case, Eiffel Tower being one of the wonders of the world could potentially be a more important feature to :eyes: before the restaurant. We won't get this additional context if osm-landmarks stops further processing. But, when the Wikidata changese are actually right, we don't want any additional context from osm-landmarks since this was just a Wikidata change.

batpad commented 7 years ago

Ok, so here's my proposal:

Have a suite of utility functions that are responsible for making these inferences about an edit, like:
- isImportantLandmark
- isWikidataEdit
- isNewMapper
We run all the utility functions on an edit before running the comparators, so we need to process them only once for every edit. I would do this by just using function memoization, can explain that in detail separately.
Each comparator can then check against these utility functions and decide what "results" it wants to output.

I think my initial thinking here was that the compare functions themselves would be responsible only for making these small inferences, like isNewMapper or isLandmark, etc. - and then it would be the responsibility of a module like pushToOsmcha to decide what should be regarded as "suspicious" and pushed as a suspicion (i.e. it could conceivably combine outputs of different detectors, etc.)

I'm definitely seeing a need to resolve the current problem we're facing - we essentially want to separate the task of "classifying" an edit, so-to-speak, based on the type of edit, or type of feature that was edited, or whatever -- and the task of "casting suspicions" about that edit, which very often is by using a combination of these different attributes that have been classified.

^ does all this make sense @bkowshik @ajithranka @amishas157 ? Let's voice a bit to make sure we're on the same page.

batpad commented 7 years ago

But, when the Wikidata changese are actually right, we don't want any additional context from osm-landmarks since this was just a Wikidata change.

@bkowshik could you expand more on "but, when the wikidata changes are actually right?" - so we'd need one function that checks if the edit was only to a wikidata tag -- and then another function that does things like query the wikidata APIs to try and figure out whether the change seemed correct?

bkowshik commented 7 years ago

Per voice with @batpad @amishas157 @ajithranka

There are use-cases when we want to just label features so that consumers can slice and dice through the labels to find interesting stuff. Ex: On osmcha, users can filter large lakes touched by new users. Then, there is the other case when we want to label features as either harmful or not harmful. So, how about we breakup up compare functions into two groups:

Taggers
Flaggers

Taggers label features, Ex:

Feature is an osm-landmark
Feature has a wikidata tag
Feature was modified by a new user

Flaggers use the taggers to label features as harmful or not harmful. Ex:

Large lakes: Removal of Wikidata tag from an osm-landmark is suspscious
Water bodies overlapping over buildings is suspscious

@geohacker

batpad commented 7 years ago

To expand a bit:

Taggers and Flaggers have the same interface - i.e. accept newVersion and oldVersion, so this is mostly a change of how we organize, and think about these things.
This will also allow us to treat taggers a bit differently on the consumer end and also allow us to implement a caching / memoization strategy for taggers that maybe called across multiple flaggers for the same edit.

I hope this eases a lot of confusion around compare functions and whether they need to be "broad" or "narrowly scoped" and what aspects we should work on to reduce noise, etc. Taggers should be responsible to be pretty dumb classifiers - a lot of them will be simple booleans, but could also return values. Flaggers would simply be a bunch of if statements combining outputs of various flaggers to decide whether to treat an edit as suspicious or not.

mapbox / osm-compare