Closed aclum closed 1 year ago
There is guidance from the GSC on this, per the MIxS v6 Excel doc: "Units - Except a few cases, strict units are not defined for items in the MIGS/MIMS/MIENS checklists, wherever applicable the unit of choice should accompany the value of an item. The units should be in accordance with the International System of Units (SI)."
I am really opposed to taking a 'please see' approach to this, but your research is a good starting point, @aclum.
There are lots of historical solutions for this situation, as well as some LinkML specific solutions:
The `unit` slot links a `SlotDefinition` to a single `UnitOfMeasure`, which has a variety of slots to reference internal and external definitions.

I will be pushing for GSC to use the last solution, so I would be very happy to see us commit to it.
Based on what @mslarae13 has said, there are some slots where we aren't able to define a single unit of measure. If memory serves, some of the same slots would have a different unit of measurement depending on whether the sample is solid or liquid, for example. Montana, is that correct?
Let's separate these two issues:
For 1, ideally we would just refer to MIxS. However, the MIxS guidance is unclear and underspecified. For example, as per the guidance that @aclum quotes, the units should be "in accordance" with SI. But what does this mean?
I think most people would understand that using SI would mean using symbols, e.g. `m`. However, MIxS implicitly favors spelled-out names, not symbols, like `meter`. Using names rather than symbols is a bad idea due to the different forms. Formally, SI uses `metre` as the name, but the NIST page uses `meter` since that is US-preferred. None of this would be a problem if symbols were used rather than names, but for reasons MIxS uses the names.
Regardless of names vs symbols, there are many ambiguities with derived units. Should `/` be used? Or is µg per m³ allowed? Or microgram per cubic meter?

MIxS is also very ambiguous when it comes to pluralization. We would hope that singular forms are mandated to avoid further confusion, but this isn't the case.
We can see examples like:
There is also no guidance on how to do non-number of cells per gram
There is a standard that solves all of these issues: UCUM (https://ucum.org/). UCUM is the standard used in all health-related data models and standards that I am aware of. UCUM provides a completely unambiguous system, and as far as I am aware every unit that could possibly be required in MIxS could be represented in UCUM. It's very easy to use. For example, micrograms per cubic meter is `ug/m3`.

UCUM provides standard validators and a completely computable system.
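To illustrate how a UCUM code such as `ug/m3` is put together, here is a minimal sketch that composes codes from a prefix, a base unit, and an exponent. The tables cover only a tiny hand-picked subset of UCUM, and the helper names `ucum_term` and `ucum_ratio` are hypothetical, not part of any UCUM library.

```python
# Illustrative subset only; real UCUM defines many more prefixes and units.
PREFIXES = {"": "", "micro": "u", "milli": "m", "kilo": "k"}
UNITS = {"gram": "g", "meter": "m", "second": "s"}


def ucum_term(unit, prefix="", exponent=1):
    """Build a single UCUM term, e.g. ('meter', exponent=3) -> 'm3'."""
    code = PREFIXES[prefix] + UNITS[unit]
    # UCUM writes exponents as a plain integer suffix, with no '^' or superscript.
    return code if exponent == 1 else f"{code}{exponent}"


def ucum_ratio(numerator, denominator):
    """Join two UCUM terms with the division operator '/'."""
    return f"{numerator}/{denominator}"


# Micrograms per cubic meter, the example from the discussion above.
print(ucum_ratio(ucum_term("gram", "micro"), ucum_term("meter", exponent=3)))  # ug/m3
```

Because there is exactly one code per unit, questions about spelling (`meter` vs `metre`) or pluralization do not arise.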
There is a proposal for MIxS to adopt UCUM here:
My preference is that NMDC mandates UCUM, and we lobby for MIxS to follow suit. Strictly speaking we will be using different units than many of the "preferred units" in MIxS, but this is no more inconsistent than anything else.
This sounds good to me, as long as what we are doing is clear, so that other groups can interoperate as needed.
@cmungall will be at the metadata meeting on Wednesday?
see also
which has the following slot:
I believe we agreed at the metadata meeting last week to implement UCUM so I will close this ticket and open a new one for implementation.
Creating a ticket for this per the metadata meeting last week
Currently the range for has_unit
We want this to have some constraints or controlled vocabulary that is programmatically enforced to consistently populate this.
For example, the has_units on slot 'depth' currently has the following values: `meter`, `metre`, `meters`, `null`.
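A minimal sketch of what programmatic enforcement could look like for these 'depth' values, assuming the target is the UCUM code `m`. The synonym table and the function name `normalize_depth_unit` are hypothetical and not part of the NMDC schema or LinkML; a real implementation would draw its allowed values from the schema.

```python
# Hypothetical tables for illustration: the UCUM code accepted for 'depth',
# and the spelled-out variants seen in existing data that should map to it.
DEPTH_UCUM_CODES = {"m"}
DEPTH_NAME_TO_CODE = {"meter": "m", "metre": "m", "meters": "m", "metres": "m"}


def normalize_depth_unit(raw):
    """Return the UCUM code for a raw unit value, or None if the value is missing."""
    if raw is None or raw.strip().lower() in ("", "null"):
        return None
    key = raw.strip()
    # UCUM codes are case-sensitive, so codes are matched exactly.
    if key in DEPTH_UCUM_CODES:
        return key
    # Spelled-out names are matched case-insensitively.
    if key.lower() in DEPTH_NAME_TO_CODE:
        return DEPTH_NAME_TO_CODE[key.lower()]
    raise ValueError(f"unrecognised depth unit: {raw!r}")


observed = ["meter", "metre", "meters", None]
print([normalize_depth_unit(v) for v in observed])  # ['m', 'm', 'm', None]
```

Raising on unrecognised values, rather than passing them through, is a design choice here: it surfaces new spellings at load time instead of letting them accumulate in the data.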
The UM ontology was suggested, although this may not be expansive enough. @turbomam mentioned some built-in options from link-ml.

@mslarae13 @cmungall @turbomam