tdwg / dwc-for-biologging

Darwin Core recommendations for biologging data
Creative Commons Attribution 4.0 International

Set occurrenceStatus to "doubtful" for outliers #1

Open peterdesmet opened 6 years ago

peterdesmet commented 6 years ago

At the workshop, we proposed adding a quality flag (an "occurrenceVerificationFlag") to occurrenceRemarks, based on terminology in Andre Steckenreuter et al. (2016).

I suggest that records with a low quality flag (i.e. we're pretty sure it's a ghost detection or an outlier) also have occurrenceStatus set to "doubtful". GBIF/OBIS already use that field to identify occurrences that should not be harvested or put on maps, so they don't have to sift through occurrenceRemarks and learn the vocabulary used there to judge how doubtful a record is.

The definition for doubtful in the occurrence status controlled vocabulary is:

The taxon is scored as being present in the area but there is some doubt over the evidence. The doubt may be of different kinds including taxonomic or geographic imprecision in the records.

If there is agreement on this, I suggest making occurrenceStatus a required field (that should make @albenson-usgs happy 😄), with a default value of "present".
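A minimal Python sketch of this proposal, assuming records are plain dicts and assuming a hypothetical numeric `qc_flag` key where higher values mean less trustworthy. The flag values in `LOW_QUALITY_FLAGS` are illustrative only and are not the actual Steckenreuter et al. (2016) scale; `qc_flag` and `LOW_QUALITY_FLAGS` are not Darwin Core terms. Every record ends up with an occurrenceStatus, falling back to the proposed default of "present".

```python
from typing import Dict, List, Optional

# Hypothetical: QC flag values meaning "pretty sure it's a ghost detection or an outlier".
LOW_QUALITY_FLAGS = {3, 4}


def occurrence_status(qc_flag: Optional[int]) -> str:
    """Return an occurrenceStatus value for a single record.

    Defaults to "present" (the proposed default for a required field).
    Low-quality records become "doubtful" rather than "absent", because an
    outlier is not evidence that the taxon is not there.
    """
    if qc_flag is not None and qc_flag in LOW_QUALITY_FLAGS:
        return "doubtful"
    return "present"


def fill_occurrence_status(records: List[Dict]) -> List[Dict]:
    """Set occurrenceStatus on every record so the field is never empty."""
    for record in records:
        record["occurrenceStatus"] = occurrence_status(record.get("qc_flag"))
    return records


records = fill_occurrence_status([{"qc_flag": 1}, {"qc_flag": 4}, {}])
print([r["occurrenceStatus"] for r in records])
```

With these illustrative flags, the three records come out as "present", "doubtful" and "present"; the record without any flag still gets the default.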

Side note: My initial thought was to use "absent", but the definition for that is that there is evidence that the taxon is not there. That is not what a ghost record or outlier is.

albenson-usgs commented 6 years ago

Yes, this sounds like a good working solution to me. @jdpye and @PeggyNewman would you agree?

jdpye commented 6 years ago

Agree with classifying our detections this way. I went looking for a 'probable' value in the vocabulary, and it seems to me that the field describes whether the species can generally be found in a location, not individual sightings or detections. The question is how people actually use it. Applied to an individual detection, it doesn't look like we'd create much confusion, and 'doubtful' is a nice way to express our faith in a phantom detection: not impossible, but unlikely to varying degrees.

So long as we can be sure that someone stumbling across this data won't be confused by our use of this field, it lines up fairly well for me with how we want to classify our QCed detections.

peggynewman commented 6 years ago

It looks like 'excluded' is an option in that vocabulary too. I remember somebody suggesting that most studies clean out non-animal detections (e.g. by removing range test detections). Maybe this is a practical value to include if, for some reason, they are left in the data?

peterdesmet commented 6 years ago

I have thought about “excluded” as well, but that (just like “absent”) positively states that the taxon is not there, even though it has been reported there as present in the past (e.g. in gray literature). I don’t think that fits our definition here.

Note: it doesn’t mean “to be excluded from the analysis/dataset”.

Antonarctica commented 6 years ago

Hi, would this still be used in combination with a quality flag (an "occurrenceVerificationFlag") in occurrenceRemarks, based on terminology in Andre Steckenreuter et al. (2016)? Still using that would make sense to me.

For the proposed use of "doubtful" I think it is a good working solution.

I do get the sense from the definition that with some GPS tags on birds/mammals you might be pushing the definition of Present a bit, but happy to accept that as well.

phwalsh commented 6 years ago

I agree that using occurrenceStatus is a good idea.

@Antonarctica, my understanding is that both would be used: occurrenceStatus for the existing GBIF/OBIS use case, and occurrenceVerificationFlag to hold the QC flag from Andre Steckenreuter et al. (2016) (note that this is only used for acoustic data).

jdpye commented 6 years ago

Agree we'd set both, and that for acoustics, doubtful would apply to a range of QC values from the QC procedure.

This has the benefit of flexibility when it comes to ranking occurrences obtained by other tag location methods. GPS, pop-off and light-based location methods can each have their own confidence levels, and we can map the more dubious ones to 'doubtful' in occurrenceStatus.
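A rough Python sketch of setting both fields, under several assumptions not settled in this thread: the acoustic QC flag is written into occurrenceRemarks as a hypothetical "occurrenceVerificationFlag: N" string, the flag values in `ACOUSTIC_DOUBTFUL_FLAGS` are illustrative, and each non-acoustic method has a hypothetical confidence score between 0 and 1 with made-up thresholds in `MIN_CONFIDENCE`. The function name `to_dwc_fields` is also hypothetical.

```python
from typing import Dict, Optional

# Hypothetical acoustic QC flag values that map to "doubtful".
ACOUSTIC_DOUBTFUL_FLAGS = {3, 4}

# Hypothetical per-method minimum confidence (0-1); below it a location is "doubtful".
MIN_CONFIDENCE = {
    "gps": 0.5,
    "pop-off": 0.7,
    "light-based": 0.8,
}


def to_dwc_fields(method: str, qc_flag: Optional[int] = None,
                  confidence: Optional[float] = None) -> Dict[str, str]:
    """Return occurrenceStatus and, for acoustic records, occurrenceRemarks."""
    fields = {"occurrenceStatus": "present"}
    if method == "acoustic":
        if qc_flag is None:
            raise ValueError("acoustic records need a QC flag")
        # Keep the detailed flag for users who understand that vocabulary...
        fields["occurrenceRemarks"] = f"occurrenceVerificationFlag: {qc_flag}"
        # ...and expose a coarse signal that GBIF/OBIS can filter on.
        if qc_flag in ACOUSTIC_DOUBTFUL_FLAGS:
            fields["occurrenceStatus"] = "doubtful"
    elif method in MIN_CONFIDENCE:
        # Each location method uses its own (hypothetical) confidence threshold.
        if confidence is not None and confidence < MIN_CONFIDENCE[method]:
            fields["occurrenceStatus"] = "doubtful"
    else:
        raise ValueError(f"unknown location method: {method}")
    return fields


print(to_dwc_fields("acoustic", qc_flag=4))
print(to_dwc_fields("gps", confidence=0.2))
```

Keeping a single mapping function per dataset makes the thresholds explicit and easy to revise once agreed confidence levels exist for each method.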