incf-nidash / nidm-specs

Neuroimaging Data Model (NIDM): describing neuroimaging data and provenance
nidm.nidash.org

Modeling Units of Values in NIDM-E #482

Open khelm opened 5 years ago

khelm commented 5 years ago

This issue is to track the discussion on how to represent units in the data model. Background info: General Discussion; OM (Ontology of units of Measure); UO (Units of Measurement Ontology, used with PATO).

In simplest terms, the question is whether to define datatype properties that embed the units in the term (e.g., nidm:FOV_in_mm, essentially the standard CDE-type practice) or to create qualified properties that carry the units in the qualification. For example, suppose you want to record that an item called "item10245" has a weight of 2.4 kg. Then:


exproduct:item10245 exterms:weight [ rdf:value "2.4"^^xsd:decimal ;
                                     exterms:units exunits:kilograms ] .

which, written out in full, is:

exproduct:item10245   exterms:weight   _:weight10245 .
_:weight10245         rdf:value        "2.4"^^xsd:decimal .
_:weight10245         exterms:units    exunits:kilograms .

Note that here exterms:weight is a datatype property (and would often be written as exterms:hasWeight).
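
To make the expanded form concrete, here is a minimal sketch in plain Python that stores the three triples above as tuples (no RDF library; the helper names objects and qualified_value are hypothetical) and shows how a consumer reads the value and the unit back by going through the blank node.

```python
from decimal import Decimal

# The expanded triples, as (subject, predicate, object); "_:weight10245" is the blank node.
triples = [
    ("exproduct:item10245", "exterms:weight", "_:weight10245"),
    ("_:weight10245", "rdf:value", Decimal("2.4")),
    ("_:weight10245", "exterms:units", "exunits:kilograms"),
]


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def qualified_value(graph, subject, predicate):
    """Follow subject -predicate-> node, then read rdf:value and exterms:units off that node."""
    (node,) = objects(graph, subject, predicate)  # expect exactly one structured value
    (value,) = objects(graph, node, "rdf:value")
    (unit,) = objects(graph, node, "exterms:units")
    return value, unit


print(qualified_value(triples, "exproduct:item10245", "exterms:weight"))
# (Decimal('2.4'), 'exunits:kilograms')
```

The extra hop through the node is the cost of the qualified pattern; the benefit is that the unit travels with the value instead of being baked into the predicate name.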

satra commented 5 years ago

an alternative:

:item10245 a :VolumeMeasurement ;
       rdf:value 2.4 ;
       :hasUnits :mm3 .

satra commented 5 years ago

but more generally i think we should have label properties that have units and then use these properties with just values. as examples:

https://github.com/incf-nidash/notebooks/blob/master/ttl_examples/fsterms.ttl#L23

https://github.com/incf-nidash/notebooks/blob/master/ttl_examples/fs_stats.ttl#L264

also for units, in BIDS we are using MIXF (which is all ASCII): https://people.csail.mit.edu/jaffer/MIXF/MIXF-10

@tgbugs knows of other efforts (from NIST) but their state seems to be a bit in flux.

tgbugs commented 5 years ago

I did another check-in on the state of things in Match as a follow-up to our September 2018 exchange on the subject. As far as I can tell, not much has changed.

My units parser is packaged, though I haven't fully managed to decouple it from the rest of my codebase yet. @dbkeator, I am also working on a more useful Python representation of the output format, since at the moment it is just nested tuples.

When I talked to Robert Stevens back in January, I asked him about reasonable ways to represent units. His suggestion was more or less in line with both of the proposed formats. The primary issue, though, is that using IRIs for units composes terribly and prevents people from using any unit that does not already have a URL (getting new IDs is nearly always a stumbling block in these cases). My approach is to provide the units as a string with a specific datatype, so that a parser that knows how to interpret that representation can convert it into whatever internal representation it needs.

"""(param:unit-expr (/ (param:unit 'grams 'milli)
                       (/ (param:unit 'grams 'kilo)
                          (param:unit 'hours))))"""^^TEMP:protc:unit

Basically, I'd rather write a parser than try to mint URIs for all possible unit expressions, but the problem I'm trying to solve may be larger in scope than yours. If you have a well-understood, bounded set of units that you know will be used (rarely extended, and definitely not extended by users), then the URI approach could work.
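
A minimal sketch of the "write a parser" side, assuming the literal's grammar is just parenthesized prefix expressions with quoted atoms (the actual protc grammar is presumably richer). It reads the typed literal shown above into nested tuples and renders a compact symbol string. The symbol tables and function names are hypothetical and only cover this one example.

```python
# Hypothetical symbol tables covering only the example literal.
UNIT_SYMBOLS = {"grams": "g", "hours": "h"}
PREFIX_SYMBOLS = {"milli": "m", "kilo": "k"}


def tokenize(text):
    """Split an s-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens (mutating the list) and build nested tuples; a leading ' is dropped."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return tuple(items)
    return token.lstrip("'")


def render(expr):
    """Render the parsed tree as a compact unit string."""
    head, *args = expr
    if head == "param:unit-expr":
        return render(args[0])
    if head == "param:unit":
        name, *prefix = args
        symbol = UNIT_SYMBOLS[name]
        return PREFIX_SYMBOLS[prefix[0]] + symbol if prefix else symbol
    if head == "/":
        numerator, denominator = (render(a) for a in args)
        if args[1][0] != "param:unit":  # parenthesize a compound denominator
            denominator = f"({denominator})"
        return f"{numerator}/{denominator}"
    raise ValueError(f"unknown operator: {head}")


literal = """(param:unit-expr (/ (param:unit 'grams 'milli)
                       (/ (param:unit 'grams 'kilo)
                          (param:unit 'hours))))"""
tree = parse(tokenize(literal))
print(render(tree))  # mg/(kg/h)
```

The point of the design is that a new composite unit needs no new identifier: any expression the grammar accepts can be interpreted by the consumer.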

With regard to the nidm:FOV_in_mm example, precomposition might not be needed, since you could use nidm:FOV and then give mm in whatever format you decide on. This avoids having to check that the units match the predicate type (nidm:FOV_in_spatial1d might be a way around some of those issues, but you would still have to check that the unit and dimension match). On the other hand, if there is specific information (e.g. documentation) attached to that composition, then it might be useful. Either way, if you do precompose the predicates, I suggest using the dimension rather than the unit.
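
Whichever form is chosen, something has to confirm that a unit's dimension agrees with what the predicate expects. A minimal sketch keyed on dimension rather than unit, following the suggestion above; both lookup tables are hypothetical (ex:RepetitionTime is an illustrative placeholder, not a NIDM term) and a real system would derive them from an ontology.

```python
# Hypothetical tables: expected dimension per predicate, and dimension per unit symbol.
PREDICATE_DIMENSION = {"nidm:FOV": "length", "ex:RepetitionTime": "time"}
UNIT_DIMENSION = {"mm": "length", "cm": "length", "ms": "time", "s": "time"}


def unit_matches(predicate, unit):
    """True when the unit's dimension equals the dimension the predicate expects."""
    expected = PREDICATE_DIMENSION[predicate]
    return UNIT_DIMENSION.get(unit) == expected


print(unit_matches("nidm:FOV", "mm"))  # True
print(unit_matches("nidm:FOV", "ms"))  # False
```

Because the predicate only fixes a dimension, any length unit passes for nidm:FOV, which is the flexibility that precomposing with a specific unit would take away.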

tgbugs commented 5 years ago

Update on this: I have most of a native Python representation for quantities working. It needs far more tests (just playing around for this comment revealed a number of issues), but most of the basic operations on quantities with units are implemented, as is direct export of quantities to an RDF representation. Implementing units.UnitsParser('9-14 weeks').asPython().mixf should be possible from here (note that parsing of MIXF is not implemented yet). Will continue to provide updates.

I've split it into two files for now, but I think I can safely merge them back into the pyr file and still keep rdflib an optional dependency for those who don't need that functionality.

  1. triples conversion and some unit math
  2. core python representation
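
One common way to keep rdflib optional inside a single module is a guarded import, with RDF export failing loudly only when it is actually requested. This is a sketch of that general pattern, not the code in the files above; the Quantity class and the example.org units predicate are placeholders.

```python
try:
    import rdflib  # optional: only needed for RDF export
except ImportError:  # the core representation still works without it
    rdflib = None


class Quantity:
    """Hypothetical minimal quantity; the core part uses only the standard library."""

    def __init__(self, magnitude, unit):
        self.magnitude = magnitude
        self.unit = unit

    def __str__(self):
        return f"{self.magnitude} {self.unit}"

    def to_graph(self):
        """Export to an rdflib Graph; raises a clear error if rdflib is not installed."""
        if rdflib is None:
            raise ImportError("RDF export requires rdflib: pip install rdflib")
        graph = rdflib.Graph()
        node = rdflib.BNode()
        graph.add((node, rdflib.RDF.value, rdflib.Literal(self.magnitude)))
        # Placeholder predicate; a real vocabulary would supply the units property.
        graph.add((node, rdflib.URIRef("http://example.org/units"), rdflib.Literal(self.unit)))
        return graph


print(Quantity(2.4, "kg"))  # 2.4 kg
```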

An example:

print(units.UnitsParser('9-14 weeks').asPython().ttl.decode())

outputs:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix unit: <http://uri.interlex.org/tgbugs/uris/readable/aspect/unit/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

### Axioms

[] a rdfs:Datatype ;
    owl:onDatatype unit:weeks ;
    owl:withRestrictions (
            [ xsd:maxInclusive 14 ]
            [ xsd:minInclusive 9 ] ) .

### Serialized using the pyontutils deterministic serializer v1.2.0
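
The Axioms block above is an anonymous OWL datatype restriction: owl:onDatatype names the base (here the unit) and owl:withRestrictions lists the facets. A minimal sketch that maps a simple "low-high unit" string onto the same shape by string templating; the regex and both function names are hypothetical, prefix declarations are omitted, and this is not how the pyontutils serializer works.

```python
import re

# Hypothetical pattern for strings like "9-14 weeks"; real inputs are far more varied.
RANGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def parse_range(text):
    """Split 'low-high unit' into its three lexical parts."""
    match = RANGE.match(text)
    if match is None:
        raise ValueError(f"not a simple range: {text!r}")
    return match.groups()


def range_datatype_ttl(low, high, unit):
    """Serialize an inclusive range as an anonymous rdfs:Datatype restriction (Turtle)."""
    return (
        "[] a rdfs:Datatype ;\n"
        f"    owl:onDatatype unit:{unit} ;\n"
        "    owl:withRestrictions (\n"
        f"            [ xsd:maxInclusive {high} ]\n"
        f"            [ xsd:minInclusive {low} ] ) .\n"
    )


print(range_datatype_ttl(*parse_range("9-14 weeks")))
```
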

satra commented 5 years ago

@tgbugs - very nice! will play with this as we build our tools.

tgbugs commented 5 years ago

Thanks! In the meantime, I did just enough implementation to understand that I didn't want to write a complete units system myself. The good news is that pint uses essentially the same approach I was embarking on (and does a much better job of it). So the core Python representation for units is now pint plus a few helpers for things like ranges. My parser still sits in front, since it covers the diversity of expressions encountered in the literature better than the pint parser does. I have subclassed pint's quantities and units to support export to RDF, JSON, and SI/NIST text formatting. Currently the Python representation can be populated from my parser's s-expression IR or from a JSON representation. Populating from an RDF representation isn't implemented yet, but I imagine the only hurdle will be settling on a convention for marking subgraphs for conversion (e.g. owl:onDatatype being subClassOf some unit).
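
To illustrate symmetric export and population for a range helper, here is a minimal sketch using only the standard library rather than pint. The QuantityRange class and its JSON layout (a "type" tag plus low, high, unit) are made up for illustration and are not the format described above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QuantityRange:
    """Hypothetical stand-in for a range helper layered on a units library."""

    low: float
    high: float
    unit: str

    def to_json(self):
        """Export with a type tag so a reader knows which class to rebuild."""
        return json.dumps({"type": "range", **asdict(self)}, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Populate from the JSON produced by to_json."""
        data = json.loads(text)
        if data.pop("type") != "range":
            raise ValueError("not a range")
        return cls(**data)


weeks = QuantityRange(9, 14, "weeks")
print(weeks.to_json())  # {"high": 14, "low": 9, "type": "range", "unit": "weeks"}
```

The type tag plays the same role for JSON that a marking convention (such as owl:onDatatype pointing at a unit) would play for RDF subgraphs.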

One question about the representation of the units themselves. I am currently normalizing all quantities to unprefixed units (e.g. kg -> g, ms -> s) to simplify the graph representation and the engineering of downstream search. This approach has a number of drawbacks, around significant figures and loss of precision among others, especially for the use case under discussion in this issue as well as for my protocols use case. The tradeoff seems to be that we would have to either mint IRIs for every prefixed unit or use a more complex graph representation for units. Do you think a more complex graph representation is a viable way to avoid minting tons of IRIs?
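
One possible way to soften the significant-figures concern without minting prefixed-unit IRIs (a suggestion, not something proposed in this thread) is to normalize with decimal arithmetic that shifts only the exponent. A minimal sketch using Python's decimal module; the prefix table is a small hypothetical subset, with "u" as an ASCII stand-in for micro. Note that the xsd:decimal lexical form has no exponent, so in RDF the digit count would still need to be stored some other way.

```python
from decimal import Decimal

# SI prefix exponents (powers of ten); "u" is an ASCII stand-in for micro.
PREFIX_EXPONENT = {"k": 3, "": 0, "m": -3, "u": -6}


def normalize(value, prefix):
    """Move a prefixed magnitude to the unprefixed unit without changing its digits.

    Decimal.scaleb adjusts only the exponent, so the coefficient (and hence the
    count of significant digits) is preserved.
    """
    return Decimal(value).scaleb(PREFIX_EXPONENT[prefix])


def significant_digits(d):
    """Number of digits in the decimal's coefficient."""
    return len(d.as_tuple().digits)


grams = normalize("2.4", "k")  # 2.4 kg -> grams
print(grams, significant_digits(grams))  # 2.4E+3 2
print(significant_digits(Decimal("2400")))  # 4: a plain 2400 overstates the precision
```
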

satra commented 5 years ago

@dbkeator, @adswa - see @tgbugs' example above for something that we could use in the nidm data elements.