tdwg / dwc

Darwin Core standard for sharing of information about biological diversity.
https://dwc.tdwg.org
Creative Commons Attribution 4.0 International

Improve term indices to be more complete? #490

Open csbrown opened 11 months ago

csbrown commented 11 months ago

Description:

Some of the terms in the list of terms do not have an entry in the index. For example, the dwc:datasetName term is not listed in the index.

User Story:

I am trying to work out a mapping between a dataset with somewhat messy fields and the Darwin Core terms. The provided indices, organized by broad group (e.g. "Occurrence"-type fields), are extremely helpful in finding the appropriate terms for a given concept. Unless one already knows that dwc:datasetName exists, the only real way to find it is to meticulously read the entire term list, which is kind of a lot. It would be nice if there were, say, a "Dataset" section in the index that listed the dwc:datasetName term. More generally, knowing that the index contains all of the terms in the list would assure me that I'm not missing anything when I have a field in my dataset that I think may not have a corresponding Darwin Core term. That is, if the index has all the terms, then I can at least know that I have ruled out all terms on the basis of their descriptive name, if not the long-form description.

Proposed Solution:

Expand the index to include all terms in the list. Even if some of them end up in an "Other" category, having every term appear somewhere in the index would be very helpful.

tucotuco commented 11 months ago

Thanks for reporting this @csbrown. There are indeed terms missing from the indices. We'll fix that.

In the meantime, does the Quick Reference Guide serve your purposes? It is meant to be the primary human reference to the terms in the standard and has the terms organized into the Darwin Core classes.

csbrown commented 11 months ago

Oh, yes, the quick reference looks very handy. Thanks for the pointer.

baskaufs commented 11 months ago

As @tucotuco said, you probably want to use the Quick Reference Guide for your purposes. The List of Terms is a record of complete metadata for every Darwin Core term that ever existed in the http://rs.tdwg.org/dwc/terms/ namespace, including terms that were deprecated (often those replaced by newer terms). The main reason for including the deprecated terms is so that if someone dereferences one of those old terms, they will get sent to the appropriate place in the List of Terms to find out what it means and what term it was replaced by (if any).

Placement into categories in the List of Terms document is controlled by the value in the tdwgutility_organizedInClass column of this table. I sorted by the term_deprecated column, then by the tdwgutility_organizedInClass column, and it appears that the terms whose tdwgutility_organizedInClass value doesn't place them in one of the indexed categories are all deprecated. So yes, they are hard to find, but they should not be chosen for new uses anyway. Maybe what we should be doing is putting all of the deprecated terms into a pseudo-class of deprecated terms so that they would appear in the index separately.
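
For anyone who wants to repeat that check, here is a rough sketch in Python. It is not the script that generates the page. Only the column names term_deprecated and tdwgutility_organizedInClass come from the table; the term_localName column, the sample rows, the "oldTerm" entry, and the set of indexed classes are assumptions made up for illustration. In particular, the empty class value given to datasetName is invented, not copied from the real table.

```python
import csv
import io

# Hypothetical sample rows. Only the column names term_deprecated and
# tdwgutility_organizedInClass are taken from the discussion; the rest
# (term_localName, the row values, "oldTerm") is invented for illustration.
SAMPLE_CSV = """term_localName,tdwgutility_organizedInClass,term_deprecated
occurrenceID,http://rs.tdwg.org/dwc/terms/Occurrence,false
datasetName,,false
oldTerm,,true
eventDate,http://rs.tdwg.org/dwc/terms/Event,false
"""

# Assumed set of class values that receive their own section in the index.
INDEXED_CLASSES = {
    "http://rs.tdwg.org/dwc/terms/Occurrence",
    "http://rs.tdwg.org/dwc/terms/Event",
}


def unindexed_terms(csv_text, indexed_classes):
    """Return (term name, is_deprecated) pairs for terms outside the index.

    A term counts as unindexed when its tdwgutility_organizedInClass value
    is not one of the indexed classes. Results are sorted so that
    non-deprecated terms (the ones worth worrying about) come first.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    found = [
        (row["term_localName"], row["term_deprecated"].strip().lower() == "true")
        for row in rows
        if row["tdwgutility_organizedInClass"] not in indexed_classes
    ]
    return sorted(found, key=lambda pair: (pair[1], pair[0]))


result = unindexed_terms(SAMPLE_CSV, INDEXED_CLASSES)
for name, deprecated in result:
    status = "deprecated" if deprecated else "current but unindexed"
    print(f"{name}: {status}")
```

With real data, any row reported as "current but unindexed" would be a term that users cannot find through the index even though it is still valid for new uses.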

baskaufs commented 11 months ago

Actually, upon more careful examination, the problem you are seeing with dwc:datasetName is that record-level terms aren't being handled consistently with respect to their tdwgutility_organizedInClass value. This is just a bug in the system that generates the page, and it needs to be fixed. Thanks for bringing it to our attention; we will fix it!