Open missinglink opened 3 years ago
a bit more info on the code in this PR, the new field is called `concordance` and is an `object` type mapping with string keys (so basically it's the same sort of structure as an `Object` in javascript).
I think this would be preferable to something like how we do `category` where it's more analogous to a javascript `Array`.
The `dynamic_templates` entry is there because the object keys are generated dynamically and would (by default) create fields with the default mapping; we instead define a specific mapping which sets the type to `keyword`.
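As a rough illustration of the mapping described above, here is a minimal sketch written as a javascript object. The template name `concordance_keys`, the `path_match` pattern, and `dynamic: true` are assumptions made for the example, not taken from the PR; `doc_values: false` reflects the setting mentioned elsewhere in this thread.

```javascript
// Sketch of an Elasticsearch mapping fragment for the `concordance` field.
// Assumptions: the template name, the path_match pattern and `dynamic: true`
// are illustrative; the actual PR may structure this differently.
const concordanceMapping = {
  dynamic_templates: [
    {
      // hypothetical template name
      concordance_keys: {
        // apply to every dynamically created key under `concordance`
        path_match: 'concordance.*',
        match_mapping_type: 'string',
        // exact-match only, no analysis; doc_values disabled as discussed in the thread
        mapping: { type: 'keyword', doc_values: false }
      }
    }
  ],
  properties: {
    // object with arbitrary string keys, e.g. { "gn:id": "2222" }
    concordance: { type: 'object', dynamic: true }
  }
};
```

Without a template like this, new string keys would pick up whatever the default dynamic mapping for strings is rather than a plain `keyword` field.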
Yeah, this makes a lot of sense, and I really like the idea of querying for concordances on the place endpoint. What do you think would be a good query format for that?
My memory is a bit hazy, but I think we should be able to query on those `keyword` fields easily, right? We don't need to do anything else: aggregations, keywords, or regular full text search.
Yeah exactly, so it's set to `keyword` which means there's no analysis (it's just full token exact matching), so no synonyms or anything like that are applied.
It's currently set to `doc_values=false` because it doesn't make sense to run aggregations on unique values anyway.
So yeah, basically if you write a `match` query and it matches exactly, it's a hit, else not; nothing remotely fancy going on.
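To make the exact-match behaviour concrete, here is a minimal sketch of a query body against one concordance key. The helper name `concordanceMatchQuery` and the `concordance.<key>` field path are assumptions for the example.

```javascript
// Build an Elasticsearch query body that matches one concordance exactly.
// Assumption: values are indexed under `concordance.<key>` as keyword fields.
function concordanceMatchQuery(key, value) {
  return {
    query: {
      // on a keyword field there is no analysis, so this only hits
      // documents whose stored value equals `value` exactly
      match: { [`concordance.${key}`]: value }
    }
  };
}

const body = concordanceMatchQuery('gn:id', '2222');
```

Since the field is not analysed, a value such as `germany` would not match a stored `Germany`.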
> What do you think would be a good query format for that?
Good question. You could just use `/v1/place?ids=gn:id@2222`, although I'm not a big fan of mixing and matching our `GID` values with others; then again, the `?id` param isn't `?gid`, so 🤷‍♂️
Otherwise we could be more explicit and say something like `/v1/place?concordance=gn:id@2222,wk:page@Germany`
TBH I haven't given that enough thought, neither of those sounds very nice.
[edit] due to using an `object` type mapping we have `key->value` pairs, so it would require a convention (such as the `@` in the example above) to delimit the key from the value.
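A minimal sketch of how such a convention could be parsed, assuming pairs are comma-separated and the first `@` separates key from value. The function name `parseConcordanceParam` is hypothetical, and values containing a comma are not handled.

```javascript
// Parse a value like "gn:id@2222,wk:page@Germany" into { key: value } pairs.
// Assumptions: "," separates pairs and the FIRST "@" separates key from value.
function parseConcordanceParam(raw) {
  const pairs = {};
  for (const rawPart of raw.split(',')) {
    const part = rawPart.trim();
    const at = part.indexOf('@');
    // reject pairs with no delimiter, an empty key, or an empty value
    if (at <= 0 || at === part.length - 1) {
      throw new Error(`invalid concordance pair: "${part}"`);
    }
    pairs[part.slice(0, at)] = part.slice(at + 1);
  }
  return pairs;
}

const parsed = parseConcordanceParam('gn:id@2222,wk:page@Germany');
// parsed is { 'gn:id': '2222', 'wk:page': 'Germany' }
```

Splitting on the first `@` rather than the last is an arbitrary choice here; whichever side is allowed to contain the delimiter would need to be settled as part of the convention.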
I agree that reusing the `ids` parameter is not ideal.
A `concordance=` param would work, but like you described we would have to handle both the "field" and "value" side of the concordance query. I also think we'd really want to put some effort into making the concordance names a bit more friendly. `gn:id` and `wk:page` (and all the others as they are stored in WOF) are pretty cryptic if you don't know what they stand for.
I guess all this would complicate the `/v1/place` endpoint a bit, since it would support queries by `ids` or `concordance` (but not both?). That might still be worth it.
:+1: we should not use `ids` for concordance.
A feature like this would be very interesting, especially with the OSM data :+1:
ping! @pelias/contributors this PR is a discussion with code attached 🚀
this year has seen some work around recording and exposing 'concordances' (the WOF term for foreign key references). these concordances are valuable to organisations who also use the foreign ID system and would like an easy way of joining Pelias GIDs with other datasets.
the existing implementation works great, looking at Germany in WOF you can see it returns a treasure trove of useful concordances in the `addendum`.
one problem we've identified with using the addendum is that it's (by definition) only semi-structured and comes without many guarantees of correctness or availability.
what would be better is if concordances were more structured and formalised within Pelias so that they could be considered a public API which integrators could rely upon for a 'crosswalk' between datasets.
this PR would potentially open the door for that, it could be combined with a PR to `pelias/model` to perform the validation. the validation rules would need a little thought, but things like casing, delimiters, abbreviations, collisions, etc would need to be considered.
there is also a secondary concern (beyond simply displaying the information), which is that users may also wish to search on these values, this is certainly never going to be possible with the addendum.
introducing a new parameter would need a bit more discussion but what comes to mind is the `/v1/place` endpoint could support `concordance` lookup, either via the existing `?ids=` param or a new one.
thoughts?