Open missinglink opened 3 years ago
Agreed, this would definitely be helpful. I think the list of "equivalent" layers has slowly grown well beyond what's reasonable.
I know there are lots of inter-layer deduplication cases to consider, but hopefully we could considerably reduce the scope of inter-layer deduping with a change like you proposed. The "widest" required deduping I can recall is deduping Luxembourg across locality, localadmin, region, and country, which is considerably more constrained than what we technically now allow for.
In terms of implementation, I'd just request we don't actually have a literal matrix of 0
's and 1
's (hopefully that doesn't need to be said 😆).
Maybe we keep things simple with an array of layer pairs that are equivalent, so something like this:
const equivalent_layers = [
[ 'neighbourhood', 'locality'],
[ 'locality', 'localadmin' ],
// etc
];
We'd have to ensure our code handles things in either order (A locality is equivalent to a localadmin, and a localadmin is equivalent to a locality), I imagine, unless there's a reason we don't want that (I can't think of one).
The existing
helper/diffPlaces::isLayerDifferent()
function provides the ability to deduplicate between disparate layers in certain conditions.It provides an Array called
$layerPreferences
which indicates which layers can be considered synonymous for deduplication purposes.I think this this array may be causing a lot of the confusion when working with the deduplication code and potentially creating errors, the reason for this is that the list contains 9 terms and considers them all to synonymous with each other, so
neighbourhood
andempire
can be deduped, for example, which seems wrong to me.The
dedupe.js
logic is pretty complicated, once it finds a set of 'potential matches' then it selects a 'master' record and, considering the others as dupes, adds them to a skip list which excludes them from results.Since the
layerPreferences
matching is quite broad, many places are added to the skip list which shouldn't really be there!I had a quick play around with the code but didn't settle on a solution yet, I think the main improvement would be to have some sort of layer equivalency matrix which allows us to be more specific about exactly which layers on the
layerPreferences
lists can be combined which which other ones.Eg: