Closed peterdesmet closed 1 year ago
@dshorthouse

- It is the determiner (identified in `classifiedBy`: human or AI)'s own confidence in the identification. AI can specify this pretty accurately; for humans the percentage is more arbitrary, but it's generally not set by the user. E.g. in the management software we have, we provide the AI value as is, but for humans we use `0.5` if they marked the boolean field `uncertain` and `1` if the classification was verified by a human.
- `identificationVerificationStatus`: if only that one is a percentage and the other is categorical, which are different concepts.
- `classification`: it is a loaded term indeed and one mostly coming from the machine learning world, but with other connotations in biodiversity. Feedback welcome on the term in https://github.com/tdwg/camtrap-dp/issues/164

Originally posted by @peterdesmet in https://github.com/tdwg/camtrap-dp/issues/169#issuecomment-913750488
- It is the determiner (identified in `classifiedBy`: human or AI)'s own confidence in the identification. AI can specify this pretty accurately; for humans the percentage is more arbitrary, but it's generally not set by the user. E.g. in the management software we have, we provide the AI value as is, but for humans we use `0.5` if they marked the boolean field `uncertain` and `1` if the classification was verified by a human.
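As a sketch, the convention described above could be implemented like this (a hypothetical helper, not part of Camtrap DP; the flag names are illustrative):

```python
from typing import Optional

def human_confidence(uncertain: bool, verified: bool) -> Optional[float]:
    """Map human annotation flags to a confidence value, per the convention
    described above: 1.0 if a human verified the classification, 0.5 if the
    annotator marked the record as uncertain, otherwise no value at all."""
    if verified:
        return 1.0
    if uncertain:
        return 0.5
    return None  # neither flag set: omit the confidence value
```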
Aha! There could be two activities at play here, expressed in the same field. The first is if the original determiner self-declares their uncertainty and the second is if someone else indicates their agreement (or dissent?) with the determiner. I'm assuming verification does not always mean agreement. What if there are several of the second kind, much like in iNaturalist? This does indeed get us into messy annotation space.
Regardless of whether or not it's an AI that made the original determination, these are distinct actions that attempt to convey trustworthiness. Should these be split so that downstream users are better informed when deciding what to toss and what to use? The question is whether all this is merely noise or whether it makes the data more transparent and powerful. What does a user of the data expect to do when presented with any value in `classificationConfidence`?
Originally posted by @dshorthouse in https://github.com/tdwg/camtrap-dp/issues/169#issuecomment-913771093
> the second is if someone else indicates their agreement (or dissent?) with the determiner.
The field is intended for self-declaration, not to judge the identification of others. In the case of multiple (conflicting) identifications, only one would be exported to Camtrap DP, typically the one with the highest confidence (as defined by the system), e.g. AI < volunteer < expert validation.
I see your point though: maybe a boolean `certain` vs `uncertain` conveys clearer information to the user on what to use (but note that many of the observations will have an empty `classificationConfidence`). It is then up to the data publisher to decide what AI confidence can be considered certain. It is a loss of information, but the publisher can probably judge that better than the data user.
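That export rule could be sketched as follows (the trust ordering AI < volunteer < expert comes from the comment above; the data structure is illustrative, not the Camtrap DP schema):

```python
# Trust ordering from the comment above: AI < volunteer < expert validation.
RANK = {"ai": 0, "volunteer": 1, "expert": 2}

def best_classification(classifications):
    # Pick the identification with the highest trust rank; break ties
    # with the self-declared confidence (a missing confidence counts as 0).
    return max(
        classifications,
        key=lambda c: (RANK[c["source"]], c.get("confidence") or 0.0),
    )

identifications = [
    {"source": "ai", "scientificName": "Vulpes vulpes", "confidence": 0.92},
    {"source": "volunteer", "scientificName": "Vulpes vulpes", "confidence": 0.5},
]
# The volunteer identification wins despite its lower confidence value,
# so only that one would be exported to Camtrap DP.
```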
Indicating AI versus human is really useful, so it would seem that `identificationVerificationStatus` is a good place for that. Not to put the cat amongst the pigeons though, there is also `identificationQualifier`:
https://dwc.tdwg.org/list/#dwc_identificationQualifier
I bet that if data were pulled from this field you would find things like aff. and cf., but also possible, maybe, ?, uncertain, certain, etc.
Hi @rondlg, human vs machine is useful, which is why we have a dedicated field for it: `classification_method`.
Note that we are looking for "equivalents" in Darwin Core, i.e. this Camtrap DP field is the same concept as that field in Darwin Core. Although I think that `identificationVerificationStatus` and `identificationQualifier` are reasonable fields to map data to, I wouldn't consider them the same concepts as the Camtrap DP fields. Would you agree?
Yup (my bad), I see `classification_method` now.

I'd actually say that `identificationVerificationStatus` and `identificationQualifier` are the same concept - but not a hill I need to die on ;)
Might be something here: https://doi.org/10.1093/database/bav043
I work on AI methods so "classification confidence" is a salient issue for me. I agree that self-declared confidence is the main point. Allow me to respond to this discussion thread, and to make a minor edit suggestion.
"What does a user of the data expect to do when presented with any value in classificationConfidence?"
IMHO there are two main use-cases:
Both of these are covered well if confidences are expressed as probabilities. I appreciate that it's hard to get true probabilities from manual (human) annotations, and that there are other approaches (e.g. categorical or rank-based).
In my humblest opinion: (a) manual annotations should typically not come with confidences expressed as probabilities (unless they've actually been estimated by some procedure), but merely with attribution of which person/project/institution did the annotation; (b) AI-derived annotations should be strongly encouraged to include confidences expressed as probabilities.
All that said, the current text under https://tdwg.github.io/camtrap-dp/data/#observations.classificationconfidence is good enough. However, may I propose to change "Provide an approximate value for human classifications" to "For human classifications, omit this field (in CSV, an empty string) or use an approximate value if available".
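Under that proposed wording, a CSV export might look like this (rows and field names are illustrative; Python's `csv` module writes `None` as an empty field):

```python
import csv
import io

fieldnames = ["observationID", "classificationMethod", "classificationConfidence"]
rows = [
    # AI classification: probability provided as-is.
    {"observationID": "obs1", "classificationMethod": "machine",
     "classificationConfidence": 0.87},
    # Human classification: field omitted, serialized as an empty string.
    {"observationID": "obs2", "classificationMethod": "human",
     "classificationConfidence": None},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```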
This has been discussed and addressed in https://github.com/tdwg/camtrap-dp/pull/208. `classificationProbability` is now defined as:

> Degree of certainty of the (most recent) classification. Expressed as a probability, with 1 being maximum certainty. Omit or provide an approximate probability for human classifications.
I think this addresses the points raised here:

- `classifiedBy` and `classificationMethod`
- `dwc:identificationVerificationStatus` or `dwc:identificationQualifier`, since those terms have different meanings, even if we take into account narrower/broader scopes
Not quite sure what to make of your `classificationConfidence`. Is the intent to:

I'm assuming it's more like the first or second one & not the last one. That last one illustrates that the use of the word 'classification' is probably best avoided because it's a loaded term. Or, does your use of classification mean categorization as in, "This is a female cougar and cubs vs. this is a lone male"?

Either way, I see your term is a probability whereas `identificationVerificationStatus` is meant to be categorical. The first two items above for `classificationConfidence` illustrate that it's a slippery concept. I suppose you'd have to put yourself in the shoes of a user. What would a confidence of 0.65 vs 0.6 mean to a user? Who establishes the criteria for how to construct a value? Is it sample-based? If so, what happens to this probability if the data are pooled with other datasets? Is it an algorithm or a human that constructs it? Would there be any loss of information to a user of the data if instead of `classificationConfidence` you used `identificationVerificationStatus` with categorical data or a mere boolean (i.e. "Yep, a human looked at it." vs "Nope, a human has not looked at it.")?

Originally posted by @dshorthouse in https://github.com/tdwg/camtrap-dp/issues/169#issuecomment-913692929