tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

Explain how to programmatically access a term description #502

Open nickynicolson opened 11 months ago

nickynicolson commented 11 months ago

I have a SIMPLE_DWC format download from GBIF and I want to programmatically access the dwc term description for each column header. For example, for the term basisOfRecord, I would like the definition "The specific nature of the data record." as given in the human-readable page here: https://dwc.tdwg.org/list/#dwc_basisOfRecord

The resources listed under getting started don't show an easy way to do this. They appear to be either human-readable (not designed for programmatic access), to include the complete term version history (I just want the latest version), or to be aimed at people encoding data in dwc rather than consuming it ("distribution documents").

It may be that content negotiation could give me a structured version of the terms and definitions but it doesn't appear to be covered here - could the documentation please be revised to explain this use case?

baskaufs commented 11 months ago

Hi @nickynicolson. I think what you are looking for is on the page http://rs.tdwg.org/index, which is now accessible under the TDWG website Technical menu as the "Accessing standards metadata" item. Section 3 of that page describes how to retrieve machine-readable metadata about pretty much every part of every TDWG standard.
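For the content negotiation route, a minimal sketch in Python follows. It assumes the server at rs.tdwg.org honors an `Accept` header for the term IRI and answers with (or redirects to) the requested serialization; the exact media types supported are an assumption here and should be checked against the "Accessing standards metadata" page. The helper names `build_request` and `fetch_representation` are hypothetical.

```python
import urllib.request

# IRI of the Darwin Core term used as the example in this thread.
TERM_IRI = "http://rs.tdwg.org/dwc/terms/basisOfRecord"


def build_request(iri, media_type="text/turtle"):
    """Build a GET request asking for one specific representation.

    The default media type is an assumption; other RDF serializations
    would use a different Accept value.
    """
    return urllib.request.Request(iri, headers={"Accept": media_type})


def fetch_representation(iri, media_type="text/turtle"):
    """Send the request (urllib follows redirects) and return the body as text."""
    with urllib.request.urlopen(build_request(iri, media_type)) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network.
request = build_request(TERM_IRI)
```

Calling `fetch_representation(TERM_IRI)` would then perform the actual request; the returned text still has to be parsed with an RDF library to pull out the definition.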

If you don't want to go the content negotiation/RDF route, section 3.2 describes the primary CSV files from which all human-readable and machine-readable documents are derived. So, for example, the table in section 4.1 says that the metadata for literal-value Darwin Core terms are in the "terms" directory. Knowing that, section 3.2 says that the primary metadata about Darwin Core literal-value terms is in the "terms.csv" file in the "terms" directory of the rs.tdwg.org repo. Thus you can go to the CSV table https://github.com/tdwg/rs.tdwg.org/blob/master/terms/terms.csv to get the authoritative metadata about the terms.

These tables include the metadata about the most recent versions of ALL terms, so you would want to filter out the ones that have true in the term_deprecated column.
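A rough sketch of that CSV route in Python, using only the standard library. Only the `term_deprecated` column name comes from the description above; `term_localName` and `rdfs_comment` are assumed names for the term-name and definition columns and should be checked against the header row of the real file. The raw-file URL is likewise an assumed counterpart of the "blob" URL, and the inline sample (including the made-up `oldTerm` row) exists only so the sketch runs without network access.

```python
import csv
import io
import urllib.request

# Assumed raw-file counterpart of the GitHub "blob" URL given above.
TERMS_CSV_URL = (
    "https://raw.githubusercontent.com/tdwg/rs.tdwg.org/master/terms/terms.csv"
)


def fetch_text(url=TERMS_CSV_URL):
    """Download a text file and return it decoded as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def current_definitions(csv_text, name_col="term_localName",
                        definition_col="rdfs_comment",
                        deprecated_col="term_deprecated"):
    """Map each non-deprecated term name to its definition.

    name_col and definition_col are assumed column names; adjust them
    to match the header row of the real terms.csv.
    """
    definitions = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip rows that have "true" in the term_deprecated column.
        if (row.get(deprecated_col) or "").strip().lower() == "true":
            continue
        definitions[row[name_col]] = row[definition_col]
    return definitions


# Illustrative sample only; "oldTerm" is a hypothetical deprecated row.
SAMPLE = (
    "term_localName,rdfs_comment,term_deprecated\n"
    "basisOfRecord,The specific nature of the data record.,false\n"
    "oldTerm,An obsolete term used only for illustration.,true\n"
)

definitions = current_definitions(SAMPLE)
print(definitions["basisOfRecord"])
```

Against the real table, `current_definitions(fetch_text())` would build the lookup, and each column header of the GBIF download could then be looked up with `definitions.get(header)`.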

As I noted above, all of the human-readable documents (like the List of Terms and Quick Reference Guide) and machine-readable representations (RDF/XML, Turtle, and JSON-LD) are generated from this table, so they should all provide the same term definitions.

@tucotuco Can we update the documentation on the Getting Started page to point to this page?

tucotuco commented 7 months ago

Yes. I have tagged this for implementation.