tdwg / camtrap-dp

Camera Trap Data Package (Camtrap DP)
https://camtrap-dp.tdwg.org
MIT License

Using `other` in baitUse and featureType + change baitUse to boolean #251

Closed peterdesmet closed 1 year ago

peterdesmet commented 2 years ago

Commented by @ben-norton in https://github.com/tdwg/camtrap-dp/issues/220#issue-1345721231:

The term 'other' should be removed or the parent term should be revised. One of the four main principles for creating controlled vocabularies is the elimination of ambiguity (ANSI/NISO Z39.19-2005). 'Other' is a commonly appended term to controlled vocabularies. The merit is clear. However, ambiguity in any form detracts from the primary functions of a controlled vocabulary (see ANSI/NISO Z39.19-2005). There are exceptions to the use of 'other', but I don't think this is one.

--

peterdesmet commented 2 years ago

Both baitUse and featureType list other in their controlled vocabulary. In the definition, both terms then say:

If other, more info can be provided in comments.

To me that is fine. There was an original baitUse or featureType that could not be mapped to the vocabulary, so other was chosen, with more information in comments. Alternatively, those could be mapped to NULL, but I think we would lose information that way.

Feedback welcome

ben-norton commented 2 years ago

The drawback to that strategy is that when either value is other, the property is no longer machine-readable and therefore not programmatically accessible. The extra information could be at the end of the comments, at the beginning, inserted in the middle, or absent altogether. A human might be able to parse it out of the comments, but again, it's optional.

Does the value 'other' provide a sufficient level of information that it can stand on its own (without additional information) and still be useful? I think for baitUse it does, because bait inherently results in some level of detection bias. Plus, baitUse as a concept is fairly narrow. I think the opposite is true for featureType: the concept is broad and the introduction of bias is not as clear.

I don't think mapping it to null is an acceptable alternative solution either. Null implies a false condition.

A third solution: drop the pursuit of a controlled vocabulary. Instead, add a boolean field for both (e.g. baitUse true/false), then a second field for a description, with a suggestion that the provider use a controlled vocabulary. That way you don't need to rely on comments, you capture the presence, and you provide a mechanism for a description.
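To make the boolean-plus-description idea concrete, here is a minimal Python sketch. The `Deployment` class, the `bait_description` field, and the `baited` helper are hypothetical names used only for illustration; they are not part of Camtrap DP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    deployment_id: str
    # True/False when known, None when bait use was not recorded.
    bait_use: Optional[bool]
    # Hypothetical companion field for a free-text or vocabulary-based description.
    bait_description: Optional[str] = None


def baited(deployments):
    """Return the IDs of deployments known to have used bait."""
    return [d.deployment_id for d in deployments if d.bait_use is True]


deployments = [
    Deployment("dep1", True, "food"),
    Deployment("dep2", False),
    Deployment("dep3", None),   # not recorded
    Deployment("dep4", True),   # bait used, type not described
]
print(baited(deployments))
```

The presence of bait can be filtered on directly, without parsing comments, and a missing description does not make the boolean any less usable.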

peterdesmet commented 2 years ago

@ben-norton I think your solution to make baitUse a boolean field is a very good one! I don't know, however, whether people analyse bait use (yes/no) or the type of bait used. Can others (@jimcasaer @Tim-Hofmeester) provide feedback on this?

Tim-Hofmeester commented 2 years ago

I agree with the solution to have baitUse as boolean. In most cases that would be sufficient for the analysis. Especially if there is then an option to specify the bait type (preferentially as a controlled list, but could be an open text string too).

As for feature type, I think it would be very hard to standardize, but might be really important in terms of detection probabilities (at least based on my experience). However, here again, I think it would be better to have a list of several broad categories in which all feature types would fall, rather than having an open text field or a category such as 'other'. In most analyses, one would want to have this as a categorical variable, and then you don't want too many categories either, so often we end up grouping categories into several broader categories anyway.

peterdesmet commented 2 years ago

Thanks, I will change baitUse to a boolean and describe it as:

true if bait was used for the deployment. More information can be provided in tags or comments.

In tags, I'm adding the example:

forest edge | bait:food
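As a rough sketch of how a consumer might read such a tags value, the Python below assumes tags are separated by `|` and that a `key:value` tag such as `bait:food` carries structured information, as the example suggests. The `parse_tags` function is a hypothetical helper, not part of any Camtrap DP tooling.

```python
def parse_tags(tags):
    """Split a pipe-separated tags string into plain tags and key:value pairs."""
    plain, keyed = [], {}
    for raw in (tags or "").split("|"):
        tag = raw.strip()
        if not tag:
            continue
        key, sep, value = tag.partition(":")
        if sep and key.strip() and value.strip():
            # A well-formed key:value tag, e.g. "bait:food".
            keyed[key.strip()] = value.strip()
        else:
            # Anything else is kept as a plain tag, e.g. "forest edge".
            plain.append(tag)
    return plain, keyed


plain, keyed = parse_tags("forest edge | bait:food")
print(plain)  # ['forest edge']
print(keyed)  # {'bait': 'food'}
```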

That should cater to @Tim-Hofmeester's:

Especially if there is then an option to specify the bait type (preferentially as a controlled list, but could be an open text string too).

peterdesmet commented 2 years ago

@Tim-Hofmeester for featureType, we have currently adopted the Wildlife Insights categories:

none, road paved, road dirt, trail hiking, trail game, road underpass, road overpass, road bridge, culvert, burrow, nest site, carcass, water source, fruiting tree, other

Which at least is some attempt at standardization.

I still don't know if we should leave other in, or if that can be conflated with NULL (not provided).
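The trade-off can be illustrated with a small Python sketch that validates values against the list above and optionally conflates other with "not provided". The `normalize_feature_type` function and its `conflate_other` flag are hypothetical, for illustration only.

```python
FEATURE_TYPES = {
    "none", "road paved", "road dirt", "trail hiking", "trail game",
    "road underpass", "road overpass", "road bridge", "culvert", "burrow",
    "nest site", "carcass", "water source", "fruiting tree", "other",
}


def normalize_feature_type(value, conflate_other=False):
    """Validate a featureType value; optionally map 'other' to None."""
    if value is None or value.strip() == "":
        return None  # not provided
    value = value.strip().lower()
    if value not in FEATURE_TYPES:
        raise ValueError(f"not in featureType vocabulary: {value!r}")
    if conflate_other and value == "other":
        return None  # 'other' becomes indistinguishable from "not provided"
    return value


raw = ["culvert", "other", None]
kept = [normalize_feature_type(v) for v in raw]
conflated = [normalize_feature_type(v, conflate_other=True) for v in raw]
print(kept)       # ['culvert', 'other', None]
print(conflated)  # ['culvert', None, None]
```

With conflation, a deployment at an unlisted feature and a deployment with no recorded feature end up with the same value, which is the information loss in question.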

peterdesmet commented 1 year ago

This is now implemented in v0.5: baitUse is a boolean, other is removed from featureType.
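For data prepared against an earlier version, a migration could look roughly like the Python sketch below. It rests on several assumptions not confirmed in this thread: that the older baitUse vocabulary used the value none for "no bait", that the bait type is worth preserving as a `bait:<type>` tag, and that a featureType of other is mapped to an empty value. The `migrate_deployment` function is hypothetical.

```python
def migrate_deployment(record):
    """Return a copy of an older-style record converted to the v0.5 shape."""
    migrated = dict(record)
    tags = [t.strip() for t in (record.get("tags") or "").split("|") if t.strip()]

    bait = record.get("baitUse")
    if bait is None:
        migrated["baitUse"] = None       # not recorded stays not recorded
    elif bait == "none":                 # assumption: old value meaning "no bait"
        migrated["baitUse"] = False
    else:
        migrated["baitUse"] = True
        tag = f"bait:{bait}"
        if bait != "other" and tag not in tags:
            tags.append(tag)             # keep the bait type as a tag

    if record.get("featureType") == "other":
        migrated["featureType"] = None   # assumption: 'other' becomes empty

    migrated["tags"] = " | ".join(tags) if tags else None
    return migrated


old = {
    "deploymentID": "dep1",
    "baitUse": "food",
    "featureType": "other",
    "tags": "forest edge",
}
new = migrate_deployment(old)
print(new["baitUse"], new["featureType"], new["tags"])
```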