missinglink opened 3 years ago
Coming back to this now after thinking it over...
I think it's good to merge: the concept that parent names at the same layer as the feature can be used for deduplication is logical and unlikely to cause errors.
The issue description was missing the concrete example, which was included in the tests:

Given the feature `geonames:region:2950157` ("Land Berlin", en: "State of Berlin") along with another feature at the same level, `whosonfirst:region:85682499` ("Berlin"), in the results, we can use the `parent.region = ["Berlin"]` property of `geonames:region:2950157` to establish that it is equivalent to `whosonfirst:region:85682499`.

The `parent.region_id = [85682499]` property of `geonames:region:2950157` further indicates that the two are duplicates, so we could consider using only the IDs. I'm open to either method.
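A minimal Python sketch of the two checks, name-based and ID-based, applied to the Berlin example. The record shape (plain dicts with `source`, `layer`, `source_id`, `name.default` and `parent.<layer>` / `parent.<layer>_id` arrays) and the helper names `dup_by_name` / `dup_by_id` are hypothetical, loosely modelled on the fields quoted in this thread rather than taken from the actual Pelias code. It also assumes, as the example suggests, that the `parent.*_id` values are Who's On First IDs.

```python
from typing import Any, Dict, List

# Hypothetical record shape, loosely modelled on the fields quoted in this thread.
Record = Dict[str, Any]


def _norm(text: str) -> str:
    # Case-insensitive, whitespace-collapsed comparison key.
    return " ".join(text.lower().split())


def _same_layer_parent(rec: Record, suffix: str = "") -> List[str]:
    # e.g. parent.region (names) or parent.region_id (IDs) for a record on layer=region.
    values = rec.get("parent", {}).get(rec["layer"] + suffix, [])
    return [str(v) for v in values]


def dup_by_name(a: Record, b: Record) -> bool:
    """True if `a` names `b` as its parent at the layer they share."""
    if a is b or a["layer"] != b["layer"]:
        return False
    return _norm(b["name"]["default"]) in {_norm(n) for n in _same_layer_parent(a)}


def dup_by_id(a: Record, b: Record) -> bool:
    """True if `a` lists `b`'s ID as its parent at the layer they share.

    Assumes parent IDs are Who's On First IDs, so only a whosonfirst `b` can match.
    """
    if a is b or a["layer"] != b["layer"] or b["source"] != "whosonfirst":
        return False
    return str(b["source_id"]) in _same_layer_parent(a, "_id")


# The Berlin example from the tests, expressed as hypothetical dict records.
GEONAMES_BERLIN = {
    "source": "geonames", "layer": "region", "source_id": "2950157",
    "name": {"default": "Land Berlin"},
    "parent": {"region": ["Berlin"], "region_id": [85682499]},
}
WOF_BERLIN = {
    "source": "whosonfirst", "layer": "region", "source_id": "85682499",
    "name": {"default": "Berlin"},
    "parent": {"region": ["Berlin"], "region_id": [85682499]},
}
```

With these records, both `dup_by_name(GEONAMES_BERLIN, WOF_BERLIN)` and `dup_by_id(GEONAMES_BERLIN, WOF_BERLIN)` return `True`, while the reverse ordering returns `False`, so a caller would need to check both directions. The ID check is the stricter of the two: it cannot be fooled by two unrelated places that happen to share a name.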
In some cases the parent hierarchy contains tokens which are relevant for deduplication.
For example, if we have a `geonames` record on `layer=locality` with `name.default="Land Berlin"` and also `parent.locality=["Berlin"]`, we can use the parent field provided by the PIP service to assist in the deduplication process.
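The same idea can be applied across a whole result set. The Python sketch below, again using a hypothetical dict record shape and hypothetical helper names, only reports which pairs of results look like duplicates; deciding which member of a pair to keep is a separate policy choice not covered here. The `Potsdam` record is made-up data included to show a non-match.

```python
from itertools import permutations
from typing import Any, Dict, List, Tuple

# Hypothetical record shape: {"source", "layer", "name": {"default"}, "parent": {...}}.
Record = Dict[str, Any]


def _norm(text: str) -> str:
    # Case-insensitive, whitespace-collapsed comparison key.
    return " ".join(text.lower().split())


def names_same_layer_parent(a: Record, b: Record) -> bool:
    """True if `a` lists `b`'s default name as its parent at their shared layer."""
    if a["layer"] != b["layer"]:
        return False
    parents = a.get("parent", {}).get(a["layer"], [])
    return _norm(b["name"]["default"]) in {_norm(p) for p in parents}


def duplicate_pairs(results: List[Record]) -> List[Tuple[int, int]]:
    """Ordered index pairs (i, j), i != j, where results[i] names results[j] as its
    same-layer parent."""
    return [
        (i, j)
        for i, j in permutations(range(len(results)), 2)
        if names_same_layer_parent(results[i], results[j])
    ]


# Hypothetical result set for the locality example above.
RESULTS = [
    {"source": "geonames", "layer": "locality",
     "name": {"default": "Land Berlin"}, "parent": {"locality": ["Berlin"]}},
    {"source": "whosonfirst", "layer": "locality",
     "name": {"default": "Berlin"}, "parent": {"locality": ["Berlin"]}},
    {"source": "whosonfirst", "layer": "locality",
     "name": {"default": "Potsdam"}, "parent": {"locality": ["Potsdam"]}},
]
```

Here `duplicate_pairs(RESULTS)` reports only `(0, 1)`: the geonames "Land Berlin" locality names the Who's On First "Berlin" locality as its same-layer parent. Because this match is purely on names, a record inside an unrelated place that is also called "Berlin" would match too; comparing `parent.locality_id` instead, as suggested above, avoids that.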