`score` vs `match`

VladimirAlexiev commented 3 years ago

Recon spec

https://reconciliation-api.github.io/specs/latest/#reconciliation-query-responses includes two characteristics that determine the "quality" of a candidate:

score (numeric) and
match (boolean)

I think these rules would make sense:

match should be set for only 0 or 1 candidates
match should be set for the candidate with highest score, if its score is sufficiently high, and sufficiently higher than all other candidate scores
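The two rules above can be sketched as follows. This is only an illustration: the thresholds `MIN_SCORE` and `MIN_GAP` and the function name `mark_match` are hypothetical, and the spec defines no concrete values for "sufficiently high" or "sufficiently higher". Candidates are assumed to be dicts with a numeric `score`.

```python
from typing import Any, Dict, List

# Hypothetical thresholds for illustration; the spec defines no such values.
MIN_SCORE = 90.0   # "sufficiently high"
MIN_GAP = 10.0     # "sufficiently higher than all other candidate scores"


def mark_match(candidates: List[Dict[str, Any]],
               min_score: float = MIN_SCORE,
               min_gap: float = MIN_GAP) -> List[Dict[str, Any]]:
    """Return candidates sorted by descending score, with match=True on at most one."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    # Copy each candidate and clear any existing flag so at most one is set below.
    result = [dict(c, match=False) for c in ranked]
    if not result:
        return result
    top_score = result[0]["score"]
    high_enough = top_score >= min_score
    # With a single candidate there is no runner-up to compare against.
    clear_lead = len(result) == 1 or top_score - result[1]["score"] >= min_gap
    result[0]["match"] = high_enough and clear_lead
    return result
```

With these assumed thresholds, candidates scored 95 and 60 yield a match on the first, while candidates scored 95 and 92 yield no match at all because the lead is too small.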

Do you agree? I can add them to the spec.

WD recon

https://github.com/wetneb/openrefine-wikibase/blob/master/docs/scoring.rst describes scoring
https://github.com/wetneb/openrefine-wikibase/blob/master/wdreconcile/engine.py#L338 computes match

@vasoto observed inconsistencies in these characteristics as returned by Wikidata recon. Can you confirm these observations? Looking at the code, it seems they can't happen:

sometimes several candidates have match: true
sometimes a sole top candidate doesn't have match: true
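One way to confirm such observations is to run a small check over the candidate list of each query result. The sketch below is hypothetical (the function name and the finding labels are made up here), and it reads "sole top candidate" as "the only candidate returned", which is an assumption about what was observed.

```python
from typing import Any, Dict, List


def check_match_flags(candidates: List[Dict[str, Any]]) -> List[str]:
    """Report the two situations described above for one list of candidates."""
    matched = [c for c in candidates if c.get("match") is True]
    findings: List[str] = []
    if len(matched) > 1:
        # Several candidates carry match: true.
        findings.append("multiple_matches")
    if len(candidates) == 1 and not matched:
        # Assumption: "sole top candidate" means the only candidate returned.
        findings.append("sole_candidate_unmatched")
    return findings
```

The second finding is not necessarily an error: a lone candidate with a low score may legitimately be left unmatched.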

OpenRefine

(Answered) OR has two functions to act upon these characteristics:

checkbox "Auto-match candidates with high confidence"
after-recon function "Match candidates with high confidence"

I couldn't find any documentation defining "confidence" through the two characteristics. @thadguidry and @wetneb, can you comment?

wetneb commented 3 years ago

This issue has a large overlap with https://github.com/reconciliation-api/specs/issues/51.

> I couldn't find any documentation defining "confidence" through the two characteristics.

"Confidence" here just means "when the service says match: true".

VladimirAlexiev commented 3 years ago

Section "Recon spec" is a duplicate of #51.
You answered section "OpenRefine".
I hope @vasoto can clarify his observation on WD recon.

thadguidry commented 3 years ago

@VladimirAlexiev Hi, I don't have much to say other than clarifying a few things.

> match should be set for the candidate with highest score, if its score is sufficiently high, and sufficiently higher than all other candidate scores

Let's describe further what you mean by "set" and the handling that surrounds that on the client side...

Regardless of whether the Recon service has scored one candidate very high and selected it as a match, the client ultimately gets to make the final choice by agreeing or disagreeing with the match candidate returned by the Recon service, and the client can further expose a UI feature for match and handle it as it wants. How a client handles a match condition is always up to it. Some clients will want to auto-select the match; other clients will simply want to push it to the top, highlight it, tag it, or handle it some other way, depending on how much trust the client places in a particular Recon service. Ideally these should be preference choices for users in a client.

@wetneb We might want to actually say some of my above paragraph in the 4.2 section of the Reconciliation Query Responses. Hmm, probably within here specifically? https://github.com/reconciliation-api/specs/blob/fc4c152ced1857b05d3b074377757efdd96f7061/latest/index.html#L454

But in general, I think we want section 4, Reconciliation Queries, to be a bit clearer about how clients might use score, features, and match, while at the same time strongly stating that it's entirely "open" how clients use them and incorporate or present UIs/features to users. Some of that is suggested, but linked to references that are behind paywalls. I think we could probably add a 4.5 section on Client Uses, or simply rename 4.4 and put more into it, or keep things entirely separate from the spec and instead draft a lightweight doc on Client Examples with sections and link out to it from within the spec? I dunno. On the one hand, we are simply spec'ing out the Recon API, regardless of client usage... but then we lack some big-picture help for client developers by only providing paywalled references and occasional hints in the spec on how things might be used.

VladimirAlexiev commented 3 years ago

The client might... and the server could... Sounds like a conundrum on top of an enigma.

It's fine to keep it open, but at the same time we should describe what the canonical client (OR) does, and what a reasonable server should do.

BTW, what does OR do if there are several candidates with match=true and "auto-match" is selected?

thadguidry commented 3 years ago

I think we want to avoid adding to the official spec what any one particular client does, like OR. That's why I suggest a completely different document for describing OR or other example client handling, and maybe just linking out to it throughout the spec.

match should perhaps say: Optionally, a boolean matching decision, which indicates whether the service considers this candidate good enough to be chosen as a correct match. But that's my personal feeling (not sending things when not needed). I guess match could always be provided as an attribute, but then empty? I'm remiss.

> BTW, what does OR do if there are several candidates with match=true and "auto-match" is selected?

Not sure, dive into the code? I'm also not sure how that might impact the Pool of ReconCandidates https://github.com/OpenRefine/OpenRefine/blob/5639f1b2f17303b03026629d763dcb6fef98550b/main/src/com/google/refine/util/Pool.java#L73 These are @wetneb questions for sure.

wetneb commented 3 years ago

> It's fine to keep it open, but at the same time we should describe what the canonical client (OR) does, and what a reasonable server should do.

I am probably quite biased here but I'd prefer to avoid referring to OR too much. The more we can abstract ourselves away from it, the more likely we are to come up with improvements that make sense for other clients too.

> BTW, what does OR do if there are several candidates with match=true and "auto-match" is selected?

I don't know either, I would guess that it takes the first (highest-ranking) candidate with match=true, but I haven't checked.
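The guessed behaviour can be written down as a minimal sketch. This is not OpenRefine's actual code, which nobody in this thread has checked. The function name `auto_match` is hypothetical, and the candidates are assumed to arrive in the service's ranking order, highest first.

```python
from typing import Any, Dict, List, Optional


def auto_match(candidates: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the first candidate flagged match=True, or None if there is none.

    Assumes the list is already in the order returned by the service.
    """
    for candidate in candidates:
        if candidate.get("match") is True:
            return candidate
    return None
```

Under this reading, several candidates with match=true would not cause a failure on the client side: every flagged candidate after the first would simply be ignored.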

reconciliation-api / specs

`score` vs `match` #58
