
reconciliation-api / specs

Specifications of the reconciliation API

https://reconciliation-api.github.io/specs/draft/


Make scores of reconciliation candidates optional #127

Closed: wetneb closed this issue 12 months ago

wetneb commented 1 year ago

During last month's meeting, we had an interesting discussion (with @paulgirard and @fsteeg) about the different strategies we can adopt around scoring. Broadly speaking, there are two approaches:

1. adding more functionality to the API so that clients can specify more precisely how the server should score the candidates (for instance how dates should be matched, which weights should be given to each property, how to score multi-valued properties, how to match strings together, how to do type matching, and so on);
2. exposing more information in the reconciliation responses so that the client can do a (re-)scoring on its own afterwards, to adjust the scoring to its needs. This means exposing some matching features, maybe property values, and so on.
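To make the second approach concrete, here is a minimal sketch of client-side re-scoring. It assumes that each candidate exposes a `features` list of `{"id", "value"}` pairs with numeric (or boolean) values. The feature ids `name_tfidf` and `birth_date_match` and the weights are hypothetical choices made by the client, not anything defined by the spec.

```python
from typing import Any

# Hypothetical client-chosen weights; the feature ids are illustrative only.
WEIGHTS = {"name_tfidf": 0.6, "birth_date_match": 0.4}


def rescore(candidates: list[dict[str, Any]], weights: dict[str, float]) -> list[dict[str, Any]]:
    """Re-rank candidates by a weighted sum of their exposed features.

    Assumes each candidate may carry a "features" list of {"id", "value"}
    pairs; features without a weight contribute nothing.
    """
    def local_score(candidate: dict[str, Any]) -> float:
        return sum(
            weights.get(f["id"], 0.0) * float(f["value"])
            for f in candidate.get("features", [])
        )

    # sorted() is stable, so ties keep the order returned by the server.
    return sorted(candidates, key=local_score, reverse=True)


candidates = [
    {"id": "A", "name": "Ada Lovelace", "features": [
        {"id": "name_tfidf", "value": 0.9}, {"id": "birth_date_match", "value": 0.0}]},
    {"id": "B", "name": "Ada Lovelace", "features": [
        {"id": "name_tfidf", "value": 0.8}, {"id": "birth_date_match", "value": 1.0}]},
]
print([c["id"] for c in rescore(candidates, WEIGHTS)])  # ['B', 'A']
```

Here candidate B wins because the client weights a date match heavily (0.48 + 0.40 = 0.88 against 0.54 for A), even though its name similarity is lower. Note that the client can only re-rank the candidates the server chose to return.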

Given that some existing services basically give up on returning a meaningful score for their reconciliation candidates (for instance by returning a constant score, or one that is inversely proportional to the candidate's position in the list), we could make the score field of reconciliation candidates optional.
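As a rough illustration of what an optional score would mean for clients, the sketch below parses candidates whose `score` may be missing and only sorts by score when every candidate has one. The `id`, `name`, `score` and `match` fields follow the candidate fields discussed in this thread. The `Candidate` class and the `parse_candidate` and `rank` helpers are hypothetical client code, not part of the spec.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Candidate:
    id: str
    name: str
    # None when the service omits the score, as this proposal would allow.
    score: Optional[float] = None
    match: bool = False


def parse_candidate(raw: dict[str, Any]) -> Candidate:
    """Build a Candidate from a JSON object, tolerating a missing "score"."""
    score = raw.get("score")
    return Candidate(
        id=raw["id"],
        name=raw["name"],
        score=float(score) if score is not None else None,
        match=bool(raw.get("match", False)),
    )


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort by score only if every candidate has one; otherwise keep the server's order."""
    if all(c.score is not None for c in candidates):
        return sorted(candidates, key=lambda c: c.score, reverse=True)
    return list(candidates)


raw = [{"id": "A", "name": "Paris", "match": True}, {"id": "B", "name": "Paris, Texas"}]
print([(c.id, c.score) for c in rank([parse_candidate(r) for r in raw])])
```

Falling back to the server's order when scores are absent mirrors what services returning position-based scores already convey, without pretending the number carries more meaning.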

This would arguably be a cleaner option for those existing services. It would also send spec readers a clearer message that we intend to make local re-scoring a more viable option and to rely less on server-side scoring.

tfmorris commented 1 year ago

As I've stated before, I'm a strong believer in the first approach. The search service has much more information available and already needs to filter and rerank candidates before it can return anything to the caller. Any candidates that don't make the cut are, by definition, not available for re-ranking by the caller.

I'm on the fence about making the match score optional. If it's really made up, then it doesn't have value, but even a high/med/low score of 100/75/50 provides useful information to the user. Also, the score has an impact on the automatch behavior in OpenRefine.

fsteeg commented 1 year ago

> Also, the score has an impact on the automatch behavior in OpenRefine.

Technically, that's probably true for most services (i.e., the service uses the score to decide whether an entity is a match), but at the API level these are independent. Services could set a candidate's match field (based on any internal criteria) without returning a score.
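To illustrate that independence, here is a hypothetical client-side auto-match policy that reads only the `match` flag and never touches `score`, so it works whether or not the service returns scores. This is a sketch of one possible policy, not OpenRefine's actual auto-match logic.

```python
from typing import Any, Optional


def auto_match(candidates: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the candidate to auto-match, or None.

    Hypothetical policy: auto-match only when exactly one candidate is
    flagged with "match": true by the service. The "score" field is never
    read, so it may be absent from every candidate.
    """
    flagged = [c for c in candidates if c.get("match") is True]
    return flagged[0] if len(flagged) == 1 else None


result = auto_match([
    {"id": "A", "name": "Berlin", "match": True},
    {"id": "B", "name": "Berlin, New Hampshire", "match": False},
])
print(result["id"] if result else None)  # A
```

Requiring a single flagged candidate is a conservative choice; a service that flags several candidates as matches leaves the decision to the user.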

> even a high/med/low score of 100/75/50 provides useful information to the user

I think this relates to the context from #128 (and match_qualifier from #131 for the request side):

For cases where scores are not relevant, like geo containment search or matching dates with EDTF support, a candidate marked as a match could contain context like X schema:containsPlace Y or X and Y match: EDTF:Level-0 to show to users when reviewing candidates.