VladimirAlexiev opened this issue 3 years ago
This issue has a large overlap with https://github.com/reconciliation-api/specs/issues/51.
I couldn't find any documentation defining "confidence" through the two characteristics.
"Confidence" here just means "when the service says match: true".
@VladimirAlexiev Hi, I don't have much to say other than clarifying a few things.
match should be set for the candidate with highest score, if its score is sufficiently high, and sufficiently higher than all other candidate scores
Let's describe further what you mean by "set" and the handling that surrounds that on the client side...
Regardless of whether the Recon service has scored one candidate very high and selected it as a match, the client ultimately gets to make the final choice by agreeing or disagreeing with the match candidate returned by the Recon service, and the client can further expose a UI feature for match and handle it as it wants. How a client handles a match condition is always up to the client. Some clients will want to auto-select the match; other clients will simply want to push it to the top, highlight it, tag it, or handle it some other way, depending on how much trust the client places in a particular Recon service. Ideally, this should be a user preference in the client.
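To make the "up to the client" point concrete, here is a minimal sketch of a client that lets the user choose, per service, how much to trust match. MatchPolicy and handle_candidates are hypothetical names invented for illustration; nothing here comes from the spec or from any existing client.

```python
from enum import Enum
from typing import Any, Optional


class MatchPolicy(Enum):
    """Hypothetical per-service user preference for handling match: true."""
    AUTO_SELECT = "auto_select"  # accept the flagged candidate without asking
    HIGHLIGHT = "highlight"      # show all candidates, flagged ones first
    IGNORE = "ignore"            # disregard match; the user decides from scores


def handle_candidates(
    candidates: list[dict[str, Any]], policy: MatchPolicy
) -> tuple[Optional[dict[str, Any]], list[dict[str, Any]]]:
    """Return (auto-selected candidate or None, candidates in display order)."""
    flagged = [c for c in candidates if c.get("match") is True]
    # Auto-select only when exactly one candidate is flagged; if several are,
    # the decision is ambiguous, so fall through and leave it to the user.
    if policy is MatchPolicy.AUTO_SELECT and len(flagged) == 1:
        return flagged[0], candidates
    if policy is MatchPolicy.HIGHLIGHT and flagged:
        # Stable sort: flagged candidates move up, the rest keep their order.
        ordered = sorted(candidates, key=lambda c: c.get("match") is not True)
        return None, ordered
    return None, candidates
```

The service's response is identical in all three cases; only the client-side preference changes what the user sees.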
@wetneb We might want to actually put some of my paragraph above into section 4.2, Reconciliation Query Responses. Hmm, probably right here? https://github.com/reconciliation-api/specs/blob/fc4c152ced1857b05d3b074377757efdd96f7061/latest/index.html#L454
But in general, I think we want section 4, Reconciliation Queries, to make it a bit clearer how clients might use score, features, and match, while at the same time strongly stating that it is entirely "open" how clients use them and incorporate or present UIs/features to users. Some of that is suggested, but only through references that are behind paywalls. I think we could add a section 4.5 on Client Uses; or not, and simply rename 4.4 and put more into it; or keep things entirely separate from the spec and instead draft a lightweight Client Examples document with sections and link to it from within the spec. I don't know. On the one hand, we are simply specifying the Recon API, regardless of client usage... but then we lack big-picture help for client developers, providing only paywalled references and occasional hints in the spec on how things might be used.
The client might... and the server could... Sounds like a conundrum on top of an enigma.
It's fine to keep it open, but at the same time we should describe what the canonical client (OR) does, and what a reasonable server should do.
BTW, what does OR do if there are several candidates with match=true and "auto-match" is selected?
I think we want to avoid adding to the official spec what any one particular client, like OR, does. That's why I suggest a completely separate document describing how OR or other example clients handle this, and maybe just linking out to it throughout the spec.
match should perhaps say:
"Optionally, a boolean matching decision, which indicates whether the service considers this candidate good enough to be chosen as a correct match."
But that's my personal feeling (not sending things when they're not needed). I guess match could always be provided as an attribute, but left empty? I'm torn.
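One way to read the "Optionally, ..." wording is as a three-state value: omitted (no decision), true, or false. A minimal sketch, assuming a hypothetical Candidate class; the field names mirror the id/name/score/match fields discussed in this thread, but the class and its methods are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    id: str
    name: str
    score: float
    # None means "the service expresses no matching decision",
    # which is distinct from an explicit False.
    match: Optional[bool] = None

    def to_json(self) -> dict:
        """Serialize, leaving out "match" entirely when no decision was made."""
        out = {"id": self.id, "name": self.name, "score": self.score}
        if self.match is not None:
            out["match"] = self.match
        return out

    @classmethod
    def from_json(cls, obj: dict) -> "Candidate":
        # A missing key and an explicit null both map to None.
        return cls(id=obj["id"], name=obj["name"],
                   score=float(obj["score"]), match=obj.get("match"))
```

With this reading, omitting match is the way to send nothing when nothing is needed, and a client can still tell "no decision" apart from "decided: not a match".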
BTW, what does OR do if there are several candidates with match=true and "auto-match" is selected?
Not sure; we'd have to dive into the code. I'm also not sure how that might affect the Pool of ReconCandidates (https://github.com/OpenRefine/OpenRefine/blob/5639f1b2f17303b03026629d763dcb6fef98550b/main/src/com/google/refine/util/Pool.java#L73). These are questions for @wetneb, for sure.
It's fine to keep it open, but at the same time we should describe what the canonical client (OR) does, and what a reasonable server should do.
I am probably quite biased here, but I'd prefer to avoid referring to OR too much. The more we can abstract ourselves away from it, the more likely we are to come up with improvements that make sense for other clients too.
BTW, what does OR do if there are several candidates with match=true and "auto-match" is selected?
I don't know either, I would guess that it takes the first (highest-ranking) candidate with match=true, but I haven't checked.
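The guessed behaviour is easy to state precisely. A minimal sketch of that guess only, not of OpenRefine's actual code; pick_auto_match is a hypothetical name, and it assumes the service returns candidates in descending score order, so that "first" and "highest-ranking" coincide.

```python
from typing import Any, Optional


def pick_auto_match(candidates: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the first candidate flagged match: true, or None if there is none.

    Assumes candidates arrive sorted by descending score, so the first
    flagged candidate is also the highest-ranking flagged one.
    """
    return next((c for c in candidates if c.get("match") is True), None)
```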
Recon spec
https://reconciliation-api.github.io/specs/latest/#reconciliation-query-responses includes two characteristics that determine the "quality" of a candidate:
- score (numeric) and
- match (boolean)

I think these rules would make sense:
- match should be set for only 0 or 1 candidates
- match should be set for the candidate with highest score, if its score is sufficiently high, and sufficiently higher than all other candidate scores

Do you agree? I can add them to the spec.
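A minimal sketch of how a service might apply the two proposed rules to one query's candidates. The function name assign_match and the thresholds MIN_SCORE and MIN_GAP are hypothetical: the proposal gives no concrete values, and a 0-100 score scale is assumed here. The candidate dicts only mimic the id/name/score/match fields of a reconciliation result.

```python
from typing import Any

# Hypothetical thresholds on an assumed 0-100 score scale: the proposal
# says "sufficiently high" and "sufficiently higher" but gives no numbers.
MIN_SCORE = 80.0
MIN_GAP = 10.0


def assign_match(candidates: list[dict[str, Any]],
                 min_score: float = MIN_SCORE,
                 min_gap: float = MIN_GAP) -> list[dict[str, Any]]:
    """Return copies of the candidates, best first, with "match" set on at most one."""
    # Copy each candidate and default every match decision to False (rule 1).
    ranked = [dict(c, match=False)
              for c in sorted(candidates, key=lambda x: x["score"], reverse=True)]
    if not ranked:
        return ranked
    best = ranked[0]
    # With a single candidate there is no runner-up to beat.
    runner_up = ranked[1]["score"] if len(ranked) > 1 else float("-inf")
    # Rule 2: only the top candidate, and only if it is high enough and far
    # enough ahead; with a positive min_gap, a tie for first never matches.
    if best["score"] >= min_score and best["score"] - runner_up >= min_gap:
        best["match"] = True
    return ranked
```

For example, candidates scored 95 and 60 yield a match on the first one only, while candidates scored 90 and 85 yield no match at all, because the gap of 5 is below MIN_GAP. Requiring a margin, not just a high score, is what keeps rule 1 from being violated when two candidates are nearly indistinguishable.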
WD recon match
@vasoto observed inconsistencies in these characteristics as returned by Wikidata recon. Can you confirm these observations? Looking at the code, it seems this can't happen:
match: true
match: true
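Observations like these can be checked mechanically against the two proposed rules. A minimal sketch; rule_violations is a hypothetical helper, not part of any recon service or client. Since the rules leave "sufficiently higher" unquantified, it only checks that a flagged candidate strictly outscores every other candidate.

```python
from typing import Any


def rule_violations(candidates: list[dict[str, Any]]) -> list[str]:
    """List ways one query's candidates break the two proposed match rules."""
    problems = []
    flagged = [c for c in candidates if c.get("match") is True]
    # Rule 1: match may be set on at most one candidate.
    if len(flagged) > 1:
        problems.append(f"{len(flagged)} candidates have match: true")
    # Rule 2 (weakened): a flagged candidate must beat every other score.
    for c in flagged:
        others = [o["score"] for o in candidates if o is not c]
        if others and c["score"] <= max(others):
            problems.append(
                f"{c['id']} has match: true but does not outscore every other candidate")
    return problems
```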
OpenRefine
(Answered) OR has two functions to act upon these characteristics:
I couldn't find any documentation defining "confidence" through the two characteristics. @thadguidry and @wetneb, can you comment?