tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

Recommendations on missing/unknown/not recorded data in Darwin Core #437

Open ymgan opened 1 year ago

ymgan commented 1 year ago

This issue is inspired by Robert Mesibov's post on the GBIF discourse forum, "The vexed question of missing data in Darwin Core". The discussions in that thread and on Arctos are very insightful. (Thank you!)

In the post, Bob mentioned:

The Darwin Core recommendations don’t provide a lot of guidance. The entry “unknown” is recommended when footprintSRS, geodeticDatum, verticalDatum or verbatimSRS isn’t known. On the other hand, the recommendation for coordinateUncertaintyInMeters is “Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (because there are no coordinates)”.

Take the term geodeticDatum, for example: different sources recommend different placeholder values for it, "unknown" in one and "not recorded" in the other.

From Darwin Core Quick Reference Guide

Recommended best practice is to use the EPSG code of the SRS, if known. Otherwise use a controlled vocabulary for the name or code of the geodetic datum, if known. Otherwise use a controlled vocabulary for the name or code of the ellipsoid, if known. If none of these is known, use the value unknown.

From Georeferencing Best Practices

It is thus recommended to record the EPSG code of the coordinate reference system if possible, otherwise, record the EPSG code of the datum if possible, otherwise, record the EPSG code of the ellipsoid. If none of these can be determined from the coordinate source, record "not recorded"
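The two fallback chains quoted above have the same shape and differ mainly in the final placeholder. Here is a minimal Python sketch of that shape; the function name `choose_geodetic_datum` and its arguments are hypothetical, and it does not validate that the values are real EPSG codes or controlled-vocabulary entries.

```python
from typing import Optional


def choose_geodetic_datum(
    srs_epsg: Optional[str] = None,
    datum: Optional[str] = None,
    ellipsoid: Optional[str] = None,
    placeholder: str = "unknown",
) -> str:
    """Return the first known value in the order SRS, datum, ellipsoid.

    `placeholder` is what gets recorded when none of the three is known:
    "unknown" per the Quick Reference Guide text quoted above,
    "not recorded" per the Georeferencing Best Practices text.
    """
    for candidate in (srs_epsg, datum, ellipsoid):
        # Treat None and whitespace-only strings as "not known".
        if candidate is not None and candidate.strip():
            return candidate.strip()
    return placeholder
```

Calling `choose_geodetic_datum()` with no arguments yields "unknown", while `choose_geodetic_datum(placeholder="not recorded")` yields "not recorded", which is exactly the inconsistency this issue is about.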

Subsequently these recommendations affect downstream implementation such as:

Hence I would appreciate general guidelines on how to treat the different scenarios of NITS (Nothing Interesting To Say) in Darwin Core. I appreciate Bob's suggestion on how to treat missing data in his post:

Here’s a possible answer to the “What to do with missing data?” question, and it’s one I regularly propose to the compilers whose Darwin Core data tables I audit: If a data item is missing, leave it blank. If you have a reason for the “missingness”, put it in a …Remarks field.

Thanks a lot!
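Bob's suggestion quoted above (leave the item blank and put the reason in a …Remarks field) could be applied to a record roughly as follows. This is only a sketch under stated assumptions: the `PLACEHOLDERS` set, the function name `blank_missing`, and the " | " separator used when appending to existing remarks are all illustrative choices, not part of any Darwin Core recommendation.

```python
# Assumed, non-exhaustive set of strings that stand in for missing data.
PLACEHOLDERS = {"unknown", "not recorded", "n/a", "na", "none", "null", "-", "?"}


def blank_missing(record: dict, field: str, remarks_field: str) -> dict:
    """Return a copy of `record` with a placeholder in `field` blanked out.

    The original placeholder text is kept as a note in `remarks_field`
    so the reason for the missing value is not lost.
    """
    cleaned = dict(record)
    value = (cleaned.get(field) or "").strip()
    if value.lower() not in PLACEHOLDERS:
        # Real data or already blank: leave the record unchanged.
        return cleaned
    cleaned[field] = ""
    note = f"{field}: {value}"
    existing = (cleaned.get(remarks_field) or "").strip()
    # Append to any existing remarks (the separator is an assumption).
    cleaned[remarks_field] = f"{existing} | {note}" if existing else note
    return cleaned
```

For example, `blank_missing({"recordedBy": "unknown"}, "recordedBy", "occurrenceRemarks")` returns a record with an empty `recordedBy` and the note "recordedBy: unknown" in `occurrenceRemarks`.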

qgroom commented 1 year ago

In the context of transcribing labels from specimens we also made a recommendation to break down unknown into...

Quentin Groom, Mathias Dillen, Helen Hardy, Sarah Phillips, Luc Willemse, Zhengzhe Wu, Improved standardization of transcribed digital specimen data, Database, Volume 2019, 2019, baz129, https://doi.org/10.1093/database/baz129

Mesibov commented 1 year ago

Here's a good summary from Data Carpentry about missing values as blanks:

https://datacarpentry.org/spreadsheet-ecology-lesson/02-common-mistakes/#null