tdwg / esp

Earth Sciences and Paleobiology Interest Group

What is the best practice for typing a fossil specimen? #3

Open dennereed opened 7 years ago

dennereed commented 7 years ago

For the first use case, a fossil mandible fragment, how do we use DwC to type the specimen, i.e. what are the appropriate values for "dcterms:type" and "basisOfRecord"? The solution John presented at TDWG 2016 was dcterms:type = PhysicalObject (from the DCMI Type Vocabulary) and basisOfRecord = FossilSpecimen. We need to draft a short paragraph explaining the rationale for these values. We should also document whether the values should be string literals such as "PhysicalObject" or URIs such as "http://purl.org/dc/dcmitype/PhysicalObject".

debpaul commented 7 years ago

Hey @dennereed if you haven't already done this, and still seek feedback, please post your question about

documenting whether the values should be string literals such as "PhysicalObject" or URIs such as "http://purl.org/dc/dcmitype/PhysicalObject"

to the dwc hour input form https://tinyurl.com/zja2muz

dennereed commented 7 years ago

Deb. Sorry for the delay. I just posted this issue to the tdwg-qa issue tracker (#58). Hope to get a response from John W. or Steve B.

baskaufs commented 7 years ago

This is a good question! I can give you an answer with respect to RDF, but I think the non-RDF answer is going to depend on whatever convention is established by the community, and John Wieczorek would be able to provide a better answer than I.

In RDF, the recommendation for typing things is to always use the well-known term rdf:type (http://www.w3.org/1999/02/22-rdf-syntax-ns#type) with a URI value. This is a fundamental property of RDF for describing what kind of thing something is. There is no prohibition against providing multiple values for the term, so you could say

ex:thing rdf:type dwc:FossilSpecimen;
         rdf:type dctype:PhysicalObject.
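
For anyone who wants to load the snippet above into an RDF tool, here is a minimal self-contained sketch. The prefix declarations are the standard namespace URIs for these vocabularies; the `ex:` namespace and the `ex:thing` identifier are placeholders, not real identifiers.

```turtle
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dwc:    <http://rs.tdwg.org/dwc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix ex:     <http://example.org/> .   # placeholder namespace (assumption)

# Two rdf:type values written as separate predicate-object pairs.
ex:thing rdf:type dwc:FossilSpecimen ;
         rdf:type dctype:PhysicalObject .

# The same two triples, using Turtle's "a" keyword and a comma-separated object list.
ex:thing a dwc:FossilSpecimen , dctype:PhysicalObject .
```

Both statements assert the same pair of triples, so keeping either one is enough.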

In RDF, Dublin Core recommends against using dcterms:type for the very reason that rdf:type is better known.[1]

The Darwin Core RDF guide considers use of dwc:basisOfRecord optional (Section 2.3.1.4).[2] If used in RDF, it should have a literal (string) value.
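
A minimal sketch of what that could look like, assuming a placeholder `ex:` namespace and identifier: the class goes in rdf:type as a URI, while dwc:basisOfRecord, if it is included at all, carries a plain string.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dwc: <http://rs.tdwg.org/dwc/terms/> .
@prefix ex:  <http://example.org/> .   # placeholder namespace (assumption)

ex:thing rdf:type dwc:FossilSpecimen ;          # URI value: the class of the resource
         dwc:basisOfRecord "FossilSpecimen" .   # optional; string literal, not a URI
```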

In my view, the basic problem that we have is that spreadsheets and tables are by their nature "flat". Although we like to think that a row in a table (a "record") represents one kind of thing, a row often contains metadata about several kinds of things, e.g. a specimen, a collector, a taxon, etc. It then becomes difficult to use a single column to describe everything covered in the row. In RDF, we get around this by breaking up the metadata into chunks and providing an rdf:type value for each chunk. That provides more clarity, but comes at the cost of increased complexity. When any individual user is presented with this dilemma, their response is usually "RDF is more complicated than what we need at the moment," and they move on. Hence, the lack of traction for RDF.
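
As a rough illustration of that decomposition, one flat row describing a specimen, its collector, and its taxon could be split into three typed resources. Everything in the `ex:` namespace is a hypothetical placeholder, including the two linking properties `ex:collectedBy` and `ex:identifiedAs`, which are not Darwin Core terms; the sample values are invented for the sketch. The choice of foaf:Person for the collector is also just one possible assumption.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dwc:  <http://rs.tdwg.org/dwc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .   # placeholder namespace (assumption)

# Hypothetical flat row:
#   catalogNumber = EX-001, collector = Example Collector, scientificName = Aus bus

# Chunk 1: the specimen itself.
ex:specimen1 rdf:type dwc:FossilSpecimen ;
    dwc:catalogNumber "EX-001" ;
    ex:collectedBy ex:collector1 ;    # hypothetical linking property
    ex:identifiedAs ex:taxon1 .       # hypothetical linking property

# Chunk 2: the collector.
ex:collector1 rdf:type foaf:Person ;
    foaf:name "Example Collector" .

# Chunk 3: the taxon.
ex:taxon1 rdf:type dwc:Taxon ;
    dwc:scientificName "Aus bus" .
```

Each chunk gets its own rdf:type, which is the added clarity, and the single row has become three linked resources, which is the added complexity.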

This is a longer answer than what you wanted, I'm sure. To come back to your specific question, I think the answer depends on how the data you are marking up is going to be used, and by whom. In RDF, we assume that we don't know who will be using the data, nor do we know what the use will be. You really can't take that approach with tables and spreadsheets - there needs to be some pre-existing understanding between the provider and the consumer about what ambiguous columns in a row "mean". I think that there is a general consensus in our community that dwc:basisOfRecord "means" the form of evidence that documents an occurrence record (the usual kind of row in a table sent to GBIF), and that it should have a string literal value. I'm not sure that there is a consensus about dcterms:type because it's not clear to me what people use that information for. Technically, dcterms:type (http://purl.org/dc/terms/type) should have a URI value and dc:type (http://purl.org/dc/elements/1.1/type) can have a literal value. But historically, TDWG has not really paid any attention to this distinction. I suspect you would find that people use dcterms:type very inconsistently. I would try to find out (from John W.) how dcterms:type is most commonly used and do the same. Otherwise, for dcterms:type just recommend either string or URI values and try to get your community to be consistent about it.
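
To make the technical distinction between the two Dublin Core terms concrete, here is a small sketch with a placeholder `ex:` namespace and identifier. It only illustrates which term takes which kind of value; in RDF, rdf:type remains the recommended way to type a resource.

```turtle
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dctype:  <http://purl.org/dc/dcmitype/> .
@prefix ex:      <http://example.org/> .   # placeholder namespace (assumption)

ex:thing dcterms:type dctype:PhysicalObject ;   # dcterms:type: URI value
         dc:type "PhysicalObject" .             # dc:type: literal value is acceptable
```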

Steve

[1] http://dublincore.org/documents/dc-rdf/#sect-5
[2] http://rs.tdwg.org/dwc/terms/guides/rdf/index.htm#2.3_Predicates

debpaul commented 7 years ago

also @dennereed see http://wiki.dublincore.org/index.php/FAQ/DC_and_DCTERMS_Namespaces for some background on where the current situation comes from. Standards evolve :-)

dennereed commented 6 years ago

Created two wiki pages to help address this issue: one documenting the Darwin Core paleo use of basisOfRecord, and an FAQ page on the topic of typing fossil specimens. The former is more targeted and focuses on the use of Darwin Core paleo terms, whereas the latter is more general and addresses general issues of typing, which terms are appropriate, and their implementation in RDF.