tdwg / dwc-qa

Public question and answer site for discussions about Darwin Core
Apache License 2.0

Representing Absence in Event Dataset - Darwin Core Hour Input Form 1/23/2020 18:30:59 #151

Open iDigBioBot opened 4 years ago

iDigBioBot commented 4 years ago

A user submitted this information via the Darwin Core Hour webform:

Timestamp: 1/23/2020 18:30:59
Please provide a topic of interest: I have a dataset (https://www.gbif.org/dataset/f56fb306-32e4-4b96-a381-6b87c186ad0f). It uses a stationary point count method for assessing reef fish (https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/fee.2144?campaign=wolearlyview). There are no absence records associated with this dataset as it's currently published. However, there is one event where no fish were seen. As it stands now this is documented as an event with no occurrences but I believe in effect this information will be lost to the data users. What would the recommendation be for how best to represent this information to an end user in GBIF?
Are you capable of and interested in participating: Yes
Who else would you recommend to participate in the presentation:
What resources can you point to:
Your name: Abby Benson
Your email: albenson@usgs.gov
Your GitHub username: @albenson-usgs

andersfi commented 4 years ago

Thanks for sharing,

If I understand you correctly, your question is mainly about how these data could be interpreted as presence/absence data from the end user's point of view.

There are still some unresolved issues in GBIF when it comes to presenting sampling event data to users (and quite a few when it comes to implementing community standards for coding such data, but that's another story). As you probably know, current guidelines suggest coding the absence of a species as an occurrence with occurrenceStatus=absent and individualCount=0. This would let the end user straightforwardly pivot the dataset into a wide-format table with events as rows and species as columns, where a value of 0 indicates an absence. An event with no observations would then have a full list of occurrences, all with occurrenceStatus=absent.
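As a rough illustration of that pivot, here is a minimal sketch in plain Python. The records, event IDs and species names are made up for the example and do not come from the dataset in question; `to_wide` is a hypothetical helper, not part of any GBIF tool.

```python
from collections import defaultdict

# Hypothetical long-format occurrence records in which absences are
# coded explicitly (occurrenceStatus=absent, individualCount=0).
occurrences = [
    {"eventID": "E1", "scientificName": "Species A",
     "occurrenceStatus": "present", "individualCount": 3},
    {"eventID": "E1", "scientificName": "Species B",
     "occurrenceStatus": "absent", "individualCount": 0},
    {"eventID": "E2", "scientificName": "Species A",
     "occurrenceStatus": "absent", "individualCount": 0},
    {"eventID": "E2", "scientificName": "Species B",
     "occurrenceStatus": "absent", "individualCount": 0},
]


def to_wide(records):
    """Pivot long-format records into {eventID: {scientificName: count}}."""
    wide = defaultdict(dict)
    for rec in records:
        # Treat any record flagged as absent as a count of 0.
        if rec["occurrenceStatus"] == "absent":
            count = 0
        else:
            count = rec["individualCount"]
        wide[rec["eventID"]][rec["scientificName"]] = count
    return dict(wide)


wide = to_wide(occurrences)
for event_id, counts in sorted(wide.items()):
    print(event_id, counts)
```

Here event E2, where nothing was seen, still appears as a row of zeros instead of disappearing from the table.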

If absences are not coded explicitly, which is often the case and is the case in your dataset, the end user would need to generate the list of absences event by event. First, one would need to know the taxonomic scope of each event. Currently there is no term that can be used to code this (ideally there would be a term such as "taxonomic scope", preferably pointing to a checklist or similar for large taxon lists). One would then generate a taxon list for every event in the study, usually with the full set of taxa observed in the study as a base. Currently, this requires downloading the raw DwC-A and starting the data wrangling from there. This approach poses no big technical challenges from most quantitative ecologists' point of view, but it would be nice to see it supported on the GBIF portal and API as well. Also, and more importantly, if the taxonomic scope of the different events is not properly understood (e.g. because of an undocumented or disregarded change of sampling protocol or observer during the survey), this approach can be quite dangerous. It would also be problematic to infer the absence of species that were looked for but never observed in the study (e.g. doorstep invasives).
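The event-by-event generation of absences could look roughly like the sketch below, in plain Python. It assumes the event and occurrence tables have already been read from the downloaded DwC-A, and that every event shares the same taxonomic scope and protocol, which is exactly the assumption that may not hold. The function name `add_implied_absences`, the `taxon_scope` parameter and the sample records are hypothetical.

```python
def add_implied_absences(event_ids, occurrences, taxon_scope=None):
    """Return the occurrences plus one absent record per event/taxon gap.

    If taxon_scope is not given, it falls back to all taxa observed
    anywhere in the study (an assumption, not a documented scope).
    """
    observed = {(o["eventID"], o["scientificName"]) for o in occurrences}
    if taxon_scope is None:
        taxon_scope = sorted({o["scientificName"] for o in occurrences})
    absences = [
        {"eventID": event_id, "scientificName": name,
         "occurrenceStatus": "absent", "individualCount": 0}
        for event_id in event_ids
        for name in taxon_scope
        if (event_id, name) not in observed
    ]
    return occurrences + absences


# Hypothetical data: event E3 has no occurrences at all.
event_ids = ["E1", "E2", "E3"]
occurrences = [
    {"eventID": "E1", "scientificName": "Species A",
     "occurrenceStatus": "present", "individualCount": 2},
    {"eventID": "E1", "scientificName": "Species B",
     "occurrenceStatus": "present", "individualCount": 1},
    {"eventID": "E2", "scientificName": "Species A",
     "occurrenceStatus": "present", "individualCount": 4},
]

full = add_implied_absences(event_ids, occurrences)
implied = [o for o in full if o["occurrenceStatus"] == "absent"]
```

Passing an explicit `taxon_scope` (for example a checklist that includes species that were looked for but never seen) is one way to cover the last caveat, provided such a list is actually documented for the survey.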