tdwg / camtrap-dp

Camera Trap Data Package (Camtrap DP)
https://camtrap-dp.tdwg.org
MIT License

classifiedBy - AI Specification #225

Closed ben-norton closed 1 year ago

ben-norton commented 2 years ago

For observations classified by computer vision models, additional specifics are needed to enter information in the classifiedBy field. I think MegaDetector is an exception: it may be one of the only AI techniques that can be referred to by a single name. Most computer vision models are much more complex, and at minimum several parameters are needed to reference one. The MegaDetector repo states its purpose as follows: "to detect animals, people, and vehicles in camera trap images. It does not identify animals to the species level, it just finds them." You would need more than just MegaDetector to make an observation below the kingdom level.

peterdesmet commented 2 years ago

The classifiedBy field isn't limited to a single name. E.g. we use it to express Western Europe species model v1. It could even contain a link/DOI to a paper describing the model. And I think it could be fine to list multiple contributors separated by a pipe, e.g. Western Europe species model v1 | Jim Casaer.

If we agree on that, I guess we could update the definition to clarify that.

kbubnicki commented 2 years ago

I think this (i.e. multiple names linked by a pipe) sounds like a good workaround for the first release. It is definitely needed, as we have to know how a given observation was finally generated (i.e. how many classification stages it passed through). I would suggest adding to the definition that the order matters (!), e.g. first classified by Europe species model v1, then verified/improved by Jim Casaer.
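To make the "order matters" point concrete, here is a minimal sketch of how a consumer might read an ordered, |-separated classifiedBy value. The function name split_classified_by is hypothetical and not part of Camtrap DP or any library; it only assumes that stages are listed from first to last.

```python
def split_classified_by(value: str) -> list[str]:
    """Split a pipe-separated classifiedBy value into ordered stages.

    Hypothetical helper: assumes stages are written first-to-last and
    separated by "|", as in the example from this thread.
    """
    # Trim whitespace around each stage and drop empty fragments.
    return [part.strip() for part in value.split("|") if part.strip()]


stages = split_classified_by("Western Europe species model v1 | Jim Casaer")
first_classifier = stages[0]   # the stage that produced the initial classification
most_recent = stages[-1]       # the stage that verified/improved it last
```

Under this assumption the last element is the most recent classifier, so a reader can tell a model-only observation (one stage) from one that was later reviewed by a person (two or more stages).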

kbubnicki commented 2 years ago

Then I think that classificationMethod should follow the same logic (see #224).

peterdesmet commented 2 years ago

@kbubnicki just a suggestion: would you prefer that we split classifiedBy into classifiedBy (single value) and verifiedBy (single value)? Or is that going to be too limiting (because two values maximum) and confusing with the other "classified" terms?

peterdesmet commented 1 year ago

@kbubnicki and I discussed this and we decided not to allow multiple (| separated) values for the classification terms. It makes it harder to validate controlled values, formats etc. and doesn't allow comparison between classifications. I will indicate, for all term names, that the value applies to the "most recent" action: