tdwg / dwc-qa

Public question and answer site for discussions about Darwin Core
Apache License 2.0

Use of International System of Units (SI) in sampleSizeUnit and what is the best practice #208

Open EstebanMH-SiB opened 2 months ago

EstebanMH-SiB commented 2 months ago

We are currently updating our documentation in Spanish and found the use of SI units for sampleSizeValue and sampleSizeUnit puzzling. In the examples the unit is documented as 'metre', but in the comments it appears as 'm':

[image]

We have two main questions about this. First, we are not sure whether the difference is intended to showcase all possibilities or whether it is a mistake and the two should be equal. If it is a mistake, is it worth opening an issue to change the example/comment in the documentation, or is that not necessary?

Second, what should the best practice be for units in those terms: should we encourage publishers to use SI as much as possible, or is using literal values a valid approach?

Thank you very much for any feedback and have a good day,

PS: We do not use the IRI equivalents because they are harder for users, so they are not included in our documentation in Spanish.

tucotuco commented 2 months ago

@EstebanMH-SiB Thank you for the careful review. Unlike dwc:measurementUnit, dwc:sampleSizeUnit does not recommend using an SI unit. In that respect there is nothing wrong with the examples you mentioned, though they are inconsistent with each other. In fact, it doesn't even recommend that the values come from a controlled vocabulary, though I think that would be a good idea. The IRI version of the term (dwciri:sampleSizeUnit) does make the SI recommendation. The unit terms in the Humboldt Extension do recommend a controlled vocabulary, but do not mention the SI system.

I suspect that not all sample size measures lend themselves to SI units. If this is the case, which should be easy to prove if anyone has an example, then an SI recommendation would have to say something like "whenever feasible". My intuition, though, is that the formulation of the Humboldt Extension unit terms is the best one,

"Recommended best practice is to use a controlled vocabulary. For units containing exponents, use characters from the Unicode Latin-1 Supplement character set (hex 00B2 for squared and 00B3 for cubed). This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value."

and that an issue should be created for [dwc:sampleSizeUnit](https://dwc.tdwg.org/terms/#dwc:sampleSizeUnit) to be made consistent with them.
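
To illustrate the exponent guidance in the quoted text, here is a minimal Python sketch that rewrites caret-style exponents into the Latin-1 Supplement characters (hex 00B2 and 00B3). The function name `normalize_exponent` and the assumption that publishers type exponents as `^2` or `^3` are mine, not part of Darwin Core or the Humboldt Extension.

```python
import re

# Superscript characters from the Unicode Latin-1 Supplement block,
# as named in the quoted Humboldt Extension guidance.
SUPERSCRIPTS = {"2": "\u00b2", "3": "\u00b3"}


def normalize_exponent(unit: str) -> str:
    """Replace '^2' and '^3' with the superscript characters U+00B2 and U+00B3.

    Hypothetical helper: only the caret spelling is handled, which is an
    assumption about how publishers write exponents in plain text.
    """
    return re.sub(r"\^([23])", lambda m: SUPERSCRIPTS[m.group(1)], unit)


if __name__ == "__main__":
    for raw in ("m^2", "km^3", "metre"):
        print(raw, "->", normalize_exponent(raw))
```

Values without a caret exponent, such as 'metre', pass through unchanged.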

A controlled vocabulary might also consist of preferred SI values together with spelled-out alternate values, in multiple languages, that could be unambiguously understood.
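
As a rough sketch of what such a vocabulary could look like in practice, the Python below maps spelled-out alternates in English and Spanish to a preferred SI symbol. The vocabulary entries and the names `VOCABULARY`, `ALTERNATES`, and `resolve_unit` are hypothetical; no such vocabulary is defined by Darwin Core today.

```python
from typing import Optional

# Hypothetical controlled vocabulary: preferred SI symbol -> spelled-out
# alternates in more than one language. Entries are illustrative only.
VOCABULARY = {
    "m": {"metre", "meter", "metro"},
    "m\u00b2": {"square metre", "square meter", "metro cuadrado"},
    "km": {"kilometre", "kilometer", "kil\u00f3metro"},
}

# Reverse index from each alternate (case-insensitive) to its preferred symbol.
ALTERNATES = {
    alt.casefold(): preferred
    for preferred, alts in VOCABULARY.items()
    for alt in alts
}


def resolve_unit(value: str) -> Optional[str]:
    """Return the preferred symbol for a unit string, or None if unknown."""
    text = value.strip()
    if text in VOCABULARY:  # symbols are matched exactly, since case matters
        return text
    return ALTERNATES.get(text.casefold())


if __name__ == "__main__":
    for raw in ("metre", "Metro", "m", "metro cuadrado", "traps"):
        print(repr(raw), "->", resolve_unit(raw))
```

Symbols are matched exactly because unit symbols are case-sensitive, while spelled-out names are matched case-insensitively. A value such as 'traps' resolves to `None`, which is where a "whenever feasible" rule would need to allow non-SI literals.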