Open bkowshik opened 7 years ago
@bkowshik in this scenario, why can we not just have a utility function, that is something like isOnlyWikidataChange
-- and then the osm-landmarks
function checks, and if it is onlyWikiDataChange
, stops further processing?
I'm a bit wary to build up these complex interwebs of relationships between compare function where each one needs to keep a track of its dependents and dependencies and I want to see if we can continue keeping compare functions single and atomic. I would imagine having a lot of small utility functions that are responsible for making these kinds of assertions about a change, and then the compare function be responsible for invoking them and stopping further processing if the change type matches certain conditions.
I think the other option would be something like:
A set of pre-process functions that adds some sort of tags
to the edit to represent some attributes about the change - wikidataEdit
, newFeature
, deletedFeature
, etc.
Then, the compare functions define what kinds of changes they are interested or not interested in, in a declarative fashion, so I'd imagine the layout to be something like:
function my CompareFunc(newVersion, oldVersion) {
)
myCompareFunc.exclude = [
'wikidataEdit',
'deletedFeature
];
and then the code that is responsible for running comparators checks the tags added to the feature against these properties defined by the function and decides whether to run it or not.
Not sure if there's another kind of architecture you have in mind.
I still prefer the approach of going with small utility functions and making it the responsibility of the comparator code to check whether this is something they are interested in or not by calling the utility functions.
But I can see the value in consolidating to an architecture where edits are "tagged" once as they come in, and there's some high level code that uses those "tags" to decide whether to call certain comparators or not. Happy to discuss more!
why can we not just have a utility function, that is something like isOnlyWikidataChange -- and then the osm-landmarks function checks, and if it is onlyWikiDataChange, stops further processing?
Let's take a scenario where the Wikidata tags of Eiffel Tower and Osteria Francescana are wrongly modified and it was just a Wikidata data change only. We want the osm-landmarks comparator to do the additional checks that it can do to to add more context and priority for the edits. In this case, Eiffel Tower being one of the wonders of the world could potentially be a more important feature to :eyes: before the restaurant. We won't get this additional context if osm-landmarks stops further processing. But, when the Wikidata changese are actually right, we don't want any additional context from osm-landmarks since this was just a Wikidata change.
Ok, so here's my proposal:
Have a suite of utility functions that are responsible for making these inferences about an edit, like:
isImportantLandmark
isWikidataEdit
isNewMapper
We run all the utility functions on an edit before running the comparators, so we need to process them only once for every edit. I would do this by just using function memoization, can explain that in detail separately.
Each comparator can then check against these utility functions and decide what "results" it wants to output.
I think my initial thinking here was that the compare functions themselves would be responsible only for making these small inferences, like isNewMapper
or isLandmark
, etc. - and then it would be the responsibility of a module like pushToOsmcha
to decide what should be regarded as "suspicious" and pushed as a suspicion (i.e. it could conceivably combine outputs of different detectors, etc.)
I'm definitely seeing a need to resolve the current problem we're facing - we essentially want to separate the task of "classifying" an edit, so-to-speak, based on the type of edit, or type of feature that was edited, or whatever -- and the task of "casting suspicions" about that edit, which very often is by using a combination of these different attributes that have been classified.
^ does all this make sense @bkowshik @ajithranka @amishas157 ? Let's voice a bit to make sure we're on the same page.
But, when the Wikidata changese are actually right, we don't want any additional context from osm-landmarks since this was just a Wikidata change.
@bkowshik could you expand more on "but, when the wikidata changes are actually right?" - so we'd need one function that checks if the edit was only to a wikidata tag -- and then another function that does things like query the wikidata APIs to try and figure out whether the change seemed correct?
Per voice with @batpad @amishas157 @ajithranka
There are use-cases when we want to just label features so that consumers can slice and dice through the labels to find interesting stuff. Ex: On osmcha, users can filter large lakes touched by new users. Then, there is the other case when we want to label features as either harmful or not harmful. So, how about we breakup up compare functions into two groups:
@geohacker
To expand a bit:
Taggers
and Flaggers
have the same interface - i.e. accept newVersion
and oldVersion
, so this is mostly a change of how we organize, and think about these things.
This will also allow us to treat taggers
a bit differently on the consumer end and also allow us to implement a caching / memoization strategy for taggers
that maybe called across multiple flaggers
for the same edit.
I hope this eases a lot of confusion around compare functions and whether they need to be "broad" or "narrowly scoped" and what aspects we should work on to reduce noise, etc. Taggers
should be responsible to be pretty dumb classifiers - a lot of them will be simple booleans, but could also return values. Flaggers
would simply be a bunch of if
statements combining outputs of various flaggers
to decide whether to treat an edit as suspicious or not.
@amishas157 and me braistormed on an interface between the comparators. With this setup we hope to:
Scenario
Osteria Francescana
.Current setup
osm-landmarks
comparators labels this feature/changeset for manual review on osmcha.Proposed setup
Would love to hear all your thoughts. cc: @geohacker @batpad