information-artifact-ontology / IAO

information artifact ontology
Creative Commons Attribution 4.0 International

how to annotate or express that an entity has a preferred measurement unit (label) #194

Closed Public-Health-Bioinformatics closed 7 years ago

Public-Health-Bioinformatics commented 7 years ago

We can say that a particular datum "has measurement unit label" some [Unit Ontology id].

But in the interest of normalizing data across data repositories, it would be great to express that a type of entity has a recommended unit. That way, data repositories that buy into a particular application ontology's prescription would know which unit to eventually migrate their data store values to. So something like "has preferred measurement unit label"? E.g. "air temperature" "has preferred measurement unit label" some "celsius". Is this attractive?

alanruttenberg commented 7 years ago

One way would be to subclass the measurement datum and restrict its unit with an axiom:

'project x length measurement datum' subclass 'length measurement datum'

'project x length measurement datum' 'has measurement unit label' only {'cm'}

The problem with "preferred" is that different projects have different preferences. One way to normalize would be to supplement the ontology with some DL-safe rules that convert values to a common unit on a designated, project-specific property.
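A minimal sketch of that normalization step, written in plain Python rather than as DL-safe rules. The conversion table, the `PROJECT_UNIT` mapping, and the function name are all hypothetical and are not part of IAO or the Unit Ontology; the mapping simply mirrors the `only {'cm'}` restriction above.

```python
# Hypothetical conversion factors to metres; not taken from the Unit Ontology.
TO_METRES = {"mm": 0.001, "cm": 0.01, "m": 1.0, "in": 0.0254}

# Hypothetical per-project preference, mirroring the "only {'cm'}" axiom.
PROJECT_UNIT = {"project x": "cm"}


def normalize_length(value, unit, project):
    """Convert a length to the unit the given project has designated."""
    target = PROJECT_UNIT[project]
    # Go through metres so any known source unit can reach any target unit.
    return value * TO_METRES[unit] / TO_METRES[target], target


print(normalize_length(2.0, "m", "project x"))
```

Keeping the preference in a per-project table, rather than on the entity type itself, reflects the point that different projects prefer different units.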

Public-Health-Bioinformatics commented 7 years ago

Hmm. Indeed, each project would have its own needs at the field level, so, following your example, we would have "project x field y length measurement datum". I guess a pure approach would, as you say, describe each datum as it is stored by a given standard or in a given repository. I had considered just making the favoured unit an annotation type, but the problem also extends to formats, e.g. a mmm-dd-yyyy date field vs. the ISO standard yyyy-mm-dd used internally, and to other properties of datum values such as precision and error range. The pattern invites treating them all as properties rather than annotations.
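To illustrate the idea of treating unit, format, and precision uniformly as properties of a field, here is a rough sketch. `FieldSpec`, its attribute names, and the "collection date" field are hypothetical and do not correspond to any IAO term. The `%b` pattern assumes English month abbreviations, since `strptime` is locale-dependent.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical descriptor for one third-party field."""
    name: str
    unit: Optional[str] = None         # e.g. a Unit Ontology label
    date_format: Optional[str] = None  # strptime pattern used by the source
    precision: Optional[int] = None    # decimal places to keep


def to_iso_date(raw: str, spec: FieldSpec) -> str:
    """Parse a date in the field's source format and return yyyy-mm-dd."""
    return datetime.strptime(raw, spec.date_format).date().isoformat()


# "%b-%d-%Y" matches mmm-dd-yyyy with English month abbreviations.
collection_date = FieldSpec(name="collection date", date_format="%b-%d-%Y")
print(to_iso_date("Mar-05-2019", collection_date))
```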

For the record, the task is all about describing non-ontologized third-party fields, e.g. the NCBI BioSample attribute "host body temperature" (https://www.ncbi.nlm.nih.gov/biosample/docs/attributes/), which allows both celsius and fahrenheit at the point of data entry, but one wouldn't want a database mixing the two.
I'll try a few experiments, and close this for now.
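As a small illustration of the "host body temperature" case, the sketch below converts mixed submissions to a single unit before storage. Choosing celsius as the target is an assumption made here for the example, and the sample entries are invented; neither comes from the NCBI documentation.

```python
def to_celsius(value: float, unit: str) -> float:
    """Return the temperature in degrees Celsius."""
    if unit == "celsius":
        return value
    if unit == "fahrenheit":
        return (value - 32.0) * 5.0 / 9.0
    # Refuse unknown units rather than store an unconverted value.
    raise ValueError(f"unsupported unit: {unit}")


# Hypothetical submissions for "host body temperature".
entries = [(37.0, "celsius"), (98.6, "fahrenheit")]
normalized = [round(to_celsius(v, u), 1) for v, u in entries]
print(normalized)
```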