modeling units on properties instead of results [SOSA/SSN]

w3c / sdw-sosa-ssn

Repository of the Spatial Data on the Web Working Group for the SOSA/SSN vocabulary

modeling units on properties instead of results [SOSA/SSN] #35

Open mathib opened 3 years ago

mathib commented 3 years ago

Hi all,

This is a question concerning the linking of units to measured properties in SOSA/SSN.

The SOSA/SSN specification (section 7.3) is relatively open about which approach can be used to add units, e.g. via URIs (QUDT and OM) or by appending strings representing units inside RDF literals (CDT/UCUM). In all the examples given for each of the approaches, units are linked per observation result, e.g. example 12 with QUDT:

<observation/1087>  rdf:type sosa:Observation ;
  rdfs:label "observation w3c/sdw#1087"@en ;
  sosa:hasFeatureOfInterest  <tree/124> ;
  sosa:observedProperty  <tree/124#height> ; # note there's a typo in example 12: the slash should be a hash
  sosa:madeBySensor <rangefinder/30> ;
  sosa:hasResult [ 
    qudt-1-1:unit qudt-unit-1-1:Meter ; 
    qudt-1-1:numericalValue "15.3"^^xsd:double ] .
...
<tree/124#height>  rdf:type    sosa:ObservableProperty , ssn:Property ;
  rdfs:label "the height of tree w3c/sdw#124"@en ;
  ssn:isPropertyOf <tree/124> .

When dealing with large datasets (e.g. 1000 sensor measurements from one sensor, always measuring something with a fixed unit), this might result in some kind of redundancy and I wondered if it's also allowed to relate units information directly to a specific sosa:ObservableProperty / ssn:Property instance or subclass. In this case, it's probably only possible using QUDT or OM (there's no RDF literal with a measured value already known on the conceptual level, so CDT/UCUM is not possible). When applied to the same example 12, this would look like this:

<observation/1087>  rdf:type sosa:Observation ;
  rdfs:label "observation w3c/sdw#1087"@en ;
  sosa:hasFeatureOfInterest  <tree/124> ;
  sosa:observedProperty  <tree/124#heightInMeter> ;
  sosa:madeBySensor <rangefinder/30> ;
  sosa:hasResult [ qudt-1-1:numericalValue "15.3"^^xsd:double ] . # or even shorter: sosa:hasSimpleResult "15.3"^^xsd:double
...
<tree/124#heightInMeter>  rdf:type    sosa:ObservableProperty , ssn:Property ;
  rdfs:label "the height of tree w3c/sdw#124 in meter"@en ;
  qudt-1-1:unit qudt-unit-1-1:Meter ;
  ssn:isPropertyOf <tree/124> .

From the QUDT 1.1 side, I think this seems OK, as there is no rdfs:domain defined

dr-shorthair commented 3 years ago

@mathib you can define the observable property any way you like, as long as your community understands what you are doing. Your use-case is common, and it is almost ubiquitous to see units bound to properties in catalogues of observable-properties in just the way you propose.

If you want to use predicates and values from QUDT, however, I would strongly recommend using them from the current (unversioned) namespace - i.e.

schema elements from http://qudt.org/schema/qudt/ such as http://qudt.org/schema/qudt/unit
individuals from http://qudt.org/vocab/unit/ such as http://qudt.org/vocab/unit/M

These all dereference properly.

QUDT is in version 2.1 - version 2 is several years old now - see http://www.qudt.org/

rob-metalinkage commented 3 years ago

re "directly to a specific sosa:ObservableProperty / ssn:Property instance or subclass" and "you can define the observable property any way you like"

there is a bit of a disconnect here with O&M, where to my mind it is ambiguous whether an "ObservableProperty" more naturally relates to an instance of type rdf:Property. That's "a property" in most RDF contexts, and it naturally supports statements about UoM and other "default values", and all the other types of useful constraints, such as the need to observe both temperature and pressure at the same time.

the SOSA examples are limited to the per-feature case, which are basically reified statements of relationships between values and feature instances, and don't support much information about the property actually being observed. It also leads to a lot of redundancy and verbosity, such as the label "the height of tree w3c/sdw#124 in meter"@en. It's hard to imagine any real-world system that would choose to create such labels and lose information about the actual nature of the property, rather than modelling the property and attaching a value for it to a feature (an instance of the property).

i.e. assigning UoM constraints to individual feature instance "properties" is a sub-optimal half-way house in terms of both expressivity and verbosity.

The most obvious pattern would be to define constraints on the modelled Properties of Feature classes which could be used to entail property values of instances

e.g. :Cat a owl:Class . :fluffyness a owl:DatatypeProperty ; rdfs:domain :Cat .

:flufftest a sosa:Observation ; sosa:hasFeatureOfInterest :mog ; sosa:observedProperty :fluffyness ; sosa:hasSimpleResult 3 .

entails... :mog a :Cat ; :fluffyness 3 .

and by defining a constraint on the property :fluffyness that its UoM is the :fluffIndexUnit

entails: :mog_fluffyness a rdf:Statement ;
rdf:subject :mog ; rdf:predicate :fluffyness ; qudt-1-1:unit :fluffIndexUnit .
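The entailment described above can be sketched in plain Python as a rough illustration. This is not a standard RDFS or OWL inference: the rule that turns an observation into a direct property assertion is the custom rule proposed in this comment, and the unit step assumes the property carries a qudt:unit triple. Triples are modelled as tuples of strings, and the helper names `value_of` and `entail` are hypothetical.

```python
RDF_TYPE = "rdf:type"

# The cat example as (subject, predicate, object) tuples.
graph = {
    (":Cat", RDF_TYPE, "owl:Class"),
    (":fluffyness", RDF_TYPE, "owl:DatatypeProperty"),
    (":fluffyness", "rdfs:domain", ":Cat"),
    (":fluffyness", "qudt:unit", ":fluffIndexUnit"),  # assumed unit constraint on the property
    (":flufftest", RDF_TYPE, "sosa:Observation"),
    (":flufftest", "sosa:hasFeatureOfInterest", ":mog"),
    (":flufftest", "sosa:observedProperty", ":fluffyness"),
    (":flufftest", "sosa:hasSimpleResult", 3),
}


def value_of(triples, subject, predicate):
    """Return one object for (subject, predicate), or None if there is none."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def entail(triples):
    """Derive direct statements about features of interest from observations."""
    inferred = set()
    for s, p, o in triples:
        if p != RDF_TYPE or o != "sosa:Observation":
            continue
        foi = value_of(triples, s, "sosa:hasFeatureOfInterest")
        prop = value_of(triples, s, "sosa:observedProperty")
        value = value_of(triples, s, "sosa:hasSimpleResult")
        if foi is None or prop is None or value is None:
            continue
        # Custom rule: the observation result becomes a property value of the feature.
        inferred.add((foi, prop, value))
        # rdfs:domain of the property gives the class of the feature.
        domain = value_of(triples, prop, "rdfs:domain")
        if domain is not None:
            inferred.add((foi, RDF_TYPE, domain))
        # A unit on the property is carried onto a reified statement.
        unit = value_of(triples, prop, "qudt:unit")
        if unit is not None:
            statement = foi + "_" + prop.lstrip(":")
            inferred.add((statement, RDF_TYPE, "rdf:Statement"))
            inferred.add((statement, "rdf:subject", foi))
            inferred.add((statement, "rdf:predicate", prop))
            inferred.add((statement, "qudt:unit", unit))
    return inferred


inferred = entail(graph)
```

Here `inferred` holds the direct statement about :mog, its class membership, and the reified statement carrying the unit.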

IMHO SSN needs work to make the choice of options and implementation patterns clear - or if the intention is truly that there is no choice in this matter then another mechanism needs to be added to allow properties to be modelled in a systematic way.

KathiSchleidt commented 3 years ago

One bit that's been long bothering me in SOSA is that, at least in the examples, the ObservableProperty is somehow munched up with the FoI instance it pertains to. While I get the fact that I can then traverse the associations to discover that sosa:observedProperty <tree/124#height> has something to do with the ssn:Property height, I've never found reference to this more generic height property within the examples. On a practical level, I see more need for this generic height property to allow for comparison of the heights of tree/124 vs tree/125

The way @rob-metalinkage has done the cat fluffyness, it makes a great deal more sense to me, whereby I'd start with a generic fluffyness property, then specialize this to cat_fluffyness (other beasties also be fluffy!), thus allowing me to apply it to my Mitzi and Rob's Mog and do a comparative fluffyness study.

The approach of linking the property with the class of the FoI is also more in line with the approach being taken by the I-ADOPT Group in RDA, where as far as I've understood, the ContextObject indicates the class the FoI is an instance of.

Now getting to the heart of this thread: where to put the UoM. I tried something related during our rework of O&M to V3, proposing to make it possible to indicate the UoM within the Observation itself, especially for dealing with time series. In O&M, we don't need to create an Observation for each value of a time series; one Observation can serve as the head for the entire series as long as the same observational metainformation pertains to all values. This would also ease alignment with the SensorThings API, which has pulled the UoM up to the Datastream. The proposal was voted down, as the rest of the group preferred to keep the UoM closer to the result.

Conceptually, to my view, it should be possible to shift the UoM up and down the chain from Result to Observation to ObsProp as required (these are very conceptual models we're dealing with here!)

Pragmatically, I'd say it depends on the use case, e.g. is there potential for including Observations with nonstandard UoM, and would these be of benefit. Getting data in the wrong units is an eternal problem, but closing the door on potentially valuable data only due to the units it's provided in would be a waste of good data (assuming the units can be converted).

And yes, what this entire conundrum really needs are good stable tested examples!!!

dr-shorthair commented 3 years ago

Maybe we should just bite the bullet and acknowledge the fact that, even if it offends our understanding of the notion of 'quantity' which binds the number and scale together, in pragmatic reality it is common to record the uom at the observation or more often collection or observable property level. We can probably add a property for this (or just use qudt:unit) and add some rules about where it sits and how the value propagates down and maybe even some property-path rules.
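One possible reading of such propagation rules, sketched in plain Python: look for the unit on the most specific node first and fall back to broader ones. The precedence order (result, then observation, then collection, then observable property) is an assumption made for illustration, not something the SOSA/SSN specification defines. The `ex:` names, including the membership predicate `ex:hasMember`, are hypothetical, and triples are simple tuples rather than a real RDF store.

```python
# (subject, predicate, object) tuples; every ex: name is made up for this sketch.
TRIPLES = [
    # unit on the result node, as in the SOSA/SSN examples
    ("ex:obs1", "sosa:hasResult", "ex:result1"),
    ("ex:result1", "qudt:unit", "unit:M"),
    # unit directly on the observation
    ("ex:obs2", "sosa:observedProperty", "ex:PbConcentration"),
    ("ex:obs2", "qudt:unit", "unit:PPM"),
    # unit on a collection that the observation belongs to
    ("ex:series1", "ex:hasMember", "ex:obs3"),
    ("ex:series1", "qudt:unit", "unit:DEG_C"),
    # unit on the observable property
    ("ex:obs4", "sosa:observedProperty", "ex:HeightInMetre"),
    ("ex:HeightInMetre", "qudt:unit", "unit:M"),
    # no unit anywhere
    ("ex:obs5", "sosa:observedProperty", "ex:PbConcentration"),
]


def objects(subject, predicate):
    """All objects of triples matching (subject, predicate)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects of triples matching (predicate, object)."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def resolve_unit(observation):
    """Return the first unit found, from most specific to least specific node."""
    candidates = (
        objects(observation, "sosa:hasResult")
        + [observation]
        + subjects("ex:hasMember", observation)
        + objects(observation, "sosa:observedProperty")
    )
    for node in candidates:
        units = objects(node, "qudt:unit")
        if units:
            return units[0]
    return None  # no unit recorded at any level
```

With this data, `resolve_unit("ex:obs3")` picks up the unit from the collection, while `resolve_unit("ex:obs5")` returns `None` because no level states a unit.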

smrgeoinfo commented 3 years ago

The pattern @dr-shorthair suggests could work like the schema.org variableMeasured description of a Dataset, in which the Dataset is an observation collection, and each variableMeasured/PropertyValue specifies a propertyID, units, measurementTechnique for one of the result types (a variableMeasured) in that collection.

rob-metalinkage commented 3 years ago

The underlying issue is still the semantics of observedProperty - this proposed pattern works, and entailment rules /property chain axioms are possible, only if the semantics is nailed down to explicitly allow for descriptions of properties, not only for reified instances of property values on feature instances. Currently I believe SOSA is, at best, confusing and incomplete.

kjano commented 3 years ago

Hi,

I think I may be misunderstanding where the problem is.

An observation is the act of carrying out an observation procedure in order to estimate or calculate a value of an observable property of a feature of interest or a sample thereof.

Where does the semantic confusion arise in your opinion?

If a particular storm is the feature of interest and I would like to make observations of said storm, I could, for instance, sample some body of air to study the observed properties "barometric pressure" or "wind speed". These properties must be either properties of the FOI, of samples of the FOI, or proxies such that I can draw conclusions about the FOI or ultimate FOI by quantifying them.

This leads to the question of how many such observed properties exist. In my view their number is small and finite. There is a "barometric pressure" and "wind speed" and so on and they should come from a scientific vocabulary or code lists. Others have argued for observation/FOI-specific relations such as "barometric pressure storm1", "barometric pressure storm2", and so on.

These two views are not radically different and one can map between them.

Does this help or am I trying to answer the wrong question?

Jano

rob-metalinkage commented 3 years ago

"These two views are not radically different and one can map between them." is the point - the SOSA spec does not currently make it clear that either of these two views (or even more importantly the pattern as presumed by Simon of modelling a property of a Feature Type Class) are allowed or may be mapped. I would like to propose either an update or a guidance note about this.

dr-shorthair commented 3 years ago

I fear more than one issue is being conflated here. I'd like to focus on the one that started the thread.

My understanding of the motivation for this thread is the observation that, for many or most datasets, stored in the traditional forms of databases and spreadsheets, the unit-of-measure is included in the 'column heading' for a list of values, thus apparently binding the unit-of-measure to the observed-property. OTOH, in O&M and SOSA the unit-of-measure has more often been bound to the quantity-value - i.e. the 'cell' rather than the 'column'. While this is conceptually OK (1" is the same value as 25.4mm) it looks inefficient and does not seem to match normal practice.

In a concrete example: a 'direct serialization' using SOSA seems to require

# Example 1
my:Observation99 a sosa:Observation ;
    sosa:observedProperty <PbConcentration> ;
    sosa:hasSimpleResult "3.2 [ppm]"^^cdt:ucum .

while a more direct representation of a "cell from a typical spreadsheet" would be more like

# Example 2
my:Observation99 a sosa:Observation ;
    sosa:observedProperty <PbConcentrationInPPM> ;
    sosa:hasSimpleResult "3.2"^^xsd:decimal .

where PbConcentrationInPPM essentially implements the heading from the 'lead' column (details of the ObservableProperty formulation not shown).

So the proposal is to 'allow' the unit-of-measure to be found in other places, at least in the serialization. For example in the description of an observed-property as shown in Example 2. Or maybe just associated directly with an observation like this

# Example 3
my:Observation99 a sosa:Observation ;
    sosa:observedProperty <PbConcentration> ;
    sosa:hasSimpleResult "3.2"^^xsd:decimal ; 
    qudt:unit <http://qudt.org/vocab/unit/PPM> .

Both Example 2 and 3 seem to offend some rather extreme/purist interpretation of the SOSA model.

However, it is pretty easy to write rules or axioms to move the key information from wherever you find it into your preferred slot. And while a conceptual purist might insist that the unit-of-measure is only associated with the value of the result, the pragmatist would recognise that there may be temporary efficiencies gained by putting it in another place, and also it would be less surprising to typical scientists and data managers.

My suggestion is to allow all of these forms, but to push back onto the domains and communities the responsibility to document their preferred pattern or practice (and maybe provide code to transform between them).

Of course the issue only arises in the first place if you think of the RDF representation as some kind of persistent artefact, rather than just something that is built on-the-fly from whatever datastore you are accessing. But we've all been trapped there sometimes ;-)

KathiSchleidt commented 3 years ago

I get the feeling that at least part of this dilemma is us protecting our users from themselves. To my view, the classic case is avoiding the conflation of PbConcentration in PPM, as in the examples above, with other measurements expressed in µg/m³. At the same time, most users looking for PbConcentration in PPM will also be happy with measurements expressed in µg/m³ and will do the necessary transformation. If the UoM gets mixed into the ObsProp, one would need to search for all possible combinations.
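A small Python sketch of one way to keep such searches manageable: a catalogue that links each unit-specific property back to a generic property plus a unit, so a query on the generic property finds all variants. The catalogue, the `ex:` property names, and the observation records are all hypothetical, the unit identifiers are assumed to follow QUDT naming, and no unit conversion is attempted here.

```python
# Hypothetical catalogue: unit-specific property -> (generic property, unit).
CATALOGUE = {
    "ex:PbConcentrationInPPM": ("ex:PbConcentration", "unit:PPM"),
    "ex:PbConcentrationInMicrogramPerM3": ("ex:PbConcentration", "unit:MicroGM-PER-M3"),
}

# Hypothetical observations that only name a unit-specific property.
OBSERVATIONS = [
    {"id": "ex:obs1", "observedProperty": "ex:PbConcentrationInPPM", "result": 3.2},
    {"id": "ex:obs2", "observedProperty": "ex:PbConcentrationInMicrogramPerM3", "result": 0.5},
    {"id": "ex:obs3", "observedProperty": "ex:WindSpeedInMetrePerSecond", "result": 12.0},
]


def find_by_generic_property(generic_property):
    """Return (observation id, value, unit) for every unit variant of a property."""
    matches = []
    for obs in OBSERVATIONS:
        entry = CATALOGUE.get(obs["observedProperty"])
        if entry is None:
            continue  # property is not described in the catalogue
        generic, unit = entry
        if generic == generic_property:
            matches.append((obs["id"], obs["result"], unit))
    return matches


lead_observations = find_by_generic_property("ex:PbConcentration")
```

The caller gets both lead observations back with their units and can decide whether and how to convert them.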

I think it also depends on if your Observations only pertain to one single result as common in SOSA (though could also envision a time-value array as a result here - has anybody tried this?), or could also pertain to a timeseries as common in O&M. In most systems I've encountered, when time series are provided, the values tend to have a consistent UoM; to my view this indicates that the UoM can be stored with the Observation, does not need to be repeated for each result value.

My current impression is that it's almost more an encoding issue, various attempts to avoid redundancy:

If I'm providing very atomic SOSA observations, shifting the UoM to the ObsProp saves me having to repeat this each time (while creating other issues, e.g. the multiple ObsProps by UoM)
If I'm providing time series, shifting the UoM to the Observation saves me having to repeat this each time
If I want to assure that one result value taken out of context can still be correctly interpreted, I probably have to pack everything on the result value (but that can also then get pretty silly, where do I stop, I need more than the UoM... ;) )
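The trade-off between the second and third option can be illustrated with a short Python sketch: a time series that states its UoM once at the Observation level, expanded on demand into values that each carry the full context. The dictionary layout and key names are assumptions for illustration only and do not come from O&M, SOSA, or the SensorThings API.

```python
from datetime import datetime, timedelta

# One observation acting as the head of a whole series; the unit is stated once.
series_observation = {
    "observedProperty": "ex:AirTemperature",  # hypothetical property name
    "unit": "unit:DEG_C",
    "start": datetime(2021, 6, 27, 0, 0),
    "step": timedelta(minutes=10),
    "values": [14.2, 14.0, 13.7],
}


def expand(series):
    """Turn a compact series into self-describing values, repeating the unit."""
    return [
        {
            "observedProperty": series["observedProperty"],
            "resultTime": (series["start"] + i * series["step"]).isoformat(),
            "value": value,
            "unit": series["unit"],
        }
        for i, value in enumerate(series["values"])
    ]


expanded = expand(series_observation)
```

Each entry in `expanded` can be interpreted out of context, at the cost of repeating the unit and property for every value.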

rduerr commented 2 years ago

And let's add remote sensing imaging instruments in here. Like MODIS! It takes an image roughly every 5 minutes (OK, the raw data comes down as bits but the only way to get those bits after the fact is in 5 minute chunks). The images are huge arrays and after processing there are about 200 or so different kinds of those images - all sorts of parameters from snow coverage, to atmospheric parameters, to who knows what! Moreover, most of those images contain more than 1 set of parameters (my favorite snow product has, if I remember correctly, 26 data layers). Associating a unit with each layer of each pixel is ridiculous! Associating a unit with the parameter for each layer in an image is also ridiculous given the millions of images taken over the decades MODIS has been operating. Associating a unit with the parameter for each layer in the whole set of images, now that is achievable!

dr-shorthair commented 2 years ago

SOSA/SSN does not have to be implemented as a static encoding. It can be thought of as a conceptual model. It explains (and gives names to) some entities and relationships in data, and some expectations or constraints on those. For some datasets and collections it would make a lot more sense to be implemented virtually - 'on-demand' if you like - rather than as a storage model. I would never imagine creating an om:Observation for each pixel, for example (even though each value could be thought of as the result of a single observation and a single pixel value might be delivered as the result of a single sosa:Observation).