ben-norton closed this issue 1 year ago.
If bounding boxes (#219) were provided by a model and the scientificName by a human, then I guess you could express both in classifiedBy, but I typically favour retaining the classifiedBy that provided the highest classificationConfidence.
What information would you provide in a classificationMethod, and what use cases would it solve?
I think the same logic as proposed in #225 should also apply to classificationMethod, e.g. machine | human.
Note that this field has a controlled vocabulary. Do we drop that then? Or do we recommend populating only the latest classification (no pipes)? Or is it better to have two observations (with different classificationTimestamp values)?
You are right, then we would have to drop the controlled vocabulary, which is maybe not such a terrible idea in this case (until we come up with a better solution). Having two (or more) observations is already an option (and, by the way, very useful for testing AI models), but then we still do not know whether they were made independently or were "chained" (machine -> human).
Wouldn’t you be able to infer chaining from the classificationTimestamp of the two observations? I’m a bit reluctant to throw away a vocabulary 😊
Not really, as two observations with different timestamps can still be classified independently rather than "chained" together. By chaining I mean, for example, a human expert verifying a machine classification. But maybe cases like that are too specific for a data exchange standard?
Imo yes. I would suggest:
classificationMethod: human or machine, pointing to the latest classification. This basically allows users to filter on classifications not verified by a human.
classifiedBy: MegaDetector V5 | Jakub Bubnicki, in the order these were classified.
@kbubnicki would that be ok for you?
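A minimal sketch of the filtering use case, assuming observations have already been loaded as a list of dictionaries keyed by Camtrap DP term names. The `Observation` type, the `not_human_verified` function and the example records (including the model name) are hypothetical; only the terms classificationMethod, classifiedBy and scientificName come from this thread.

```python
from typing import TypedDict


# Hypothetical row type: only the Camtrap DP terms needed for this example.
class Observation(TypedDict):
    observationID: str
    scientificName: str
    classificationMethod: str  # "human" or "machine", for the latest classification
    classifiedBy: str


def not_human_verified(observations: list[Observation]) -> list[Observation]:
    """Keep observations whose most recent classification was made by a machine."""
    return [obs for obs in observations if obs["classificationMethod"] == "machine"]


# Hypothetical example records.
observations: list[Observation] = [
    {
        "observationID": "obs1",
        "scientificName": "Capreolus capreolus",
        "classificationMethod": "human",
        "classifiedBy": "Jakub Bubnicki",
    },
    {
        "observationID": "obs2",
        "scientificName": "Sus scrofa",
        "classificationMethod": "machine",
        "classifiedBy": "example-species-model v1",  # hypothetical model name
    },
]

for obs in not_human_verified(observations):
    print(obs["observationID"])  # prints obs2
```

Because classificationMethod only records the latest method, this filter cannot tell whether a human classification started from a machine suggestion, which is exactly the "chaining" limitation discussed above.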
The solution is a bit anthropocentric, i.e. it assumes that humans always perform better than machines ;)
Yes, I think this is a good compromise!
I suggest adding something to the definition of "classificationMethod" that points to its connection with "classifiedBy", and vice versa.
I was scanning through the terms during the November webinar to see how to list which AI model was used to determine a species, and was surprised to see that classificationMethod was simply an enum of human or machine. I stopped right there and went to the Issues to find a discussion about that, before I even realized that there was a following classifiedBy term. I admit this was lazy reading on my part, but I can imagine other users missing the connection as well.
See https://github.com/tdwg/camtrap-dp/issues/225#issuecomment-1420784370, we won't allow multiple (| separated) values for the classification terms.
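The single-value rule can be illustrated with a small validator. This is a hypothetical sketch: the function name is made up, and it assumes the controlled vocabulary is just human and machine. In a real data package, such a check would more likely be expressed as an enum constraint in the table schema than as hand-written code.

```python
# Hypothetical sketch: the allowed terms for classificationMethod.
CLASSIFICATION_METHODS = frozenset({"human", "machine"})


def validate_classification_method(value: str) -> None:
    """Raise ValueError unless value is exactly one term from the vocabulary."""
    if "|" in value:
        # Pipe-separated lists such as "machine | human" are not allowed.
        raise ValueError(f"multiple values are not allowed: {value!r}")
    if value not in CLASSIFICATION_METHODS:
        raise ValueError(f"not in the controlled vocabulary: {value!r}")


validate_classification_method("machine")  # passes silently
try:
    validate_classification_method("machine | human")
except ValueError as err:
    print(err)  # multiple values are not allowed: 'machine | human'
```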
@MikeTrizna the definition for classificationMethod has been updated from:
Classification method.
To:
Method (most recently) used to classify the observation.
I'd prefer not to reference classifiedBy in that definition, because it opens the door to having to reference the other classification terms too, as well as cross-referencing many other related terms in their definitions. 😅
I strongly support the inclusion of a classificationMethod term. Use cases are more complicated than a simple human or machine. A growing number of observations are produced by a combination of human and machine. Computer vision models that filter out blanks perform substantially better than animal classifiers, especially at the global or continental level. Models that place bounding boxes around objects of interest perform equally well. So you have a multi-step process for an observation that involves both humans and machines. If both values apply, which one should be selected?