Explain derived results and processing chains

jennet commented 4 years ago

I have a modelling question that I hope you can help me with. I have a set of citizen science data where a person first takes samples at a particular datetime, and these are then analysed in the lab about a week later. Would the results of that analysis still be modelled as observations and results of a particular Sample (that relates in turn to a FeatureOfInterest), or should they be modelled in some other way?

I'd like them to still be modelled as observations of observable properties of the FeatureOfInterest, but want to check that this is the way in which the model was intended to be used.

Would the lab analysis be a Sensor?

Is there a preferred way to differentiate between original samples (e.g. observed counts of an organism) taken by people on the day of a survey, and the results of further analysis? Perhaps this would simply be different ObservedProperties observed by different Sensors/Procedures (e.g. Sensor1 would be a person during a citizen science collection survey looking at ObservedProperty1 using Procedure1, and Sensor2 would be a lab analysis using Procedure2 and looking at ObservedProperty2).

Hope this all makes sense and thanks in advance!

dr-shorthair commented 4 years ago

Yes, and this is why (a) there are different time-stamps for phenomenon-time and result-time, and (b) the sample, its relationship with the ultimate feature of interest, the act of sampling, and the sampling procedure are all modelled explicitly in SSN.
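As a rough illustration of point (a), the sketch below uses plain Python dataclasses (not an RDF library) to record a field count and a later lab analysis against the same sample. The field names mirror SOSA terms (phenomenonTime, resultTime, isSampleOf, madeBySensor, usedProcedure, observedProperty, hasSimpleResult); every identifier, date, and value is invented for the example, and treating the lab result's phenomenon-time as the sampling time is an assumption about this scenario.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sample:
    iri: str
    is_sample_of: str  # sosa:isSampleOf, pointing at the ultimate feature of interest


@dataclass(frozen=True)
class Observation:
    feature_of_interest: Sample  # sosa:hasFeatureOfInterest (here, the sample)
    observed_property: str       # sosa:observedProperty
    made_by_sensor: str          # sosa:madeBySensor
    used_procedure: str          # sosa:usedProcedure
    phenomenon_time: datetime    # sosa:phenomenonTime: when the result applies to the feature
    result_time: datetime        # sosa:resultTime: when the result was produced
    simple_result: float         # sosa:hasSimpleResult

    def result_lag(self) -> timedelta:
        """Time between the sampled moment and the production of the result."""
        return self.result_time - self.phenomenon_time


# All identifiers and values below are hypothetical.
sampled_at = datetime(2020, 6, 1, 10, 30)
sample = Sample(iri="ex:sample/42", is_sample_of="ex:waterbody/river-a")

# Count made by a volunteer on the day of the survey.
field_count = Observation(
    feature_of_interest=sample,
    observed_property="ex:prop/organismCount",
    made_by_sensor="ex:sensor/volunteer-7",
    used_procedure="ex:proc/field-survey",
    phenomenon_time=sampled_at,
    result_time=sampled_at,
    simple_result=12.0,
)

# Lab analysis of the same sample about a week later.
lab_result = Observation(
    feature_of_interest=sample,
    observed_property="ex:prop/labMeasuredProperty",
    made_by_sensor="ex:sensor/lab-analyser",
    used_procedure="ex:proc/lab-analysis",
    phenomenon_time=sampled_at,
    result_time=sampled_at + timedelta(days=7),
    simple_result=3.4,
)

print(field_count.result_lag(), lab_result.result_lag())
```

Both observations share the sample as their feature of interest and differ only in property, sensor, procedure, and result-time, which is one way to tell the field result apart from the lab result.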

Also see some work that I've been doing on some small extensions to SSN covering some related concerns: https://raw.githack.com/w3c/sdw/simon-ssn-ext/proposals/ssn-extensions/index.html. Note that this document is very much a draft, and there is a line above S4.3.4 below which it has not been updated to match the recent vocabulary changes.

rob-metalinkage commented 4 years ago

Another aspect, which seems particularly challenging given the heterogeneity of the Citizen Science domain but is prevalent in any system-of-systems view too, is that the level of abstraction of the ObservableProperty is an implementation choice. At the OGC the Decision Support WG looked at registering sensor models and parameters in the OGC Definitions Server, and we had to push back because there was significant inconsistency in approach here that needed further untangling.

The main options seem to be (crudely stated):
1) the underlying type of measured quantity for the physical phenomenon (e.g. Temperature)
2) the intended meaning of the measurement (e.g. sea surface temperature)
3) the sampling protocol - an exact specification of what is being measured
4) the name of the attribute of the feature of interest (which may be dereferenceable to find its semantics)

There are probably other options.

Each of them is a choice that pushes description responsibility in different ways to the way we handle feature (class) description, observedProperty, usedProcedure, madeBySensor and other parameterisations. SSN is currently agnostic about this choice (AFAICT - and I tracked the development and had some input into its modularity in particular, so I may have missed some perspective here - but I seem to be in good company).

The need to narrow down such implementation choices, and to explain what you have done, is a driving force for some work being done on an ontology to describe profiles of standards (https://www.w3.org/TR/dx-prof/). (You won't find much about SSN there, as this would have taken these debates too deeply into details when we are looking at a very general need.)

For example, we could define a profile of SSN where the observedProperty is a named attribute of the FeatureType of the FeatureOfInterest.

We could then create a profile of that profile where the attribute is dereferenceable to a canonical description model of how the attribute relates to the underlying phenomenon.

This would appear to be the bare minimum to support interoperability to a degree sufficient to identify the underlying phenomenon being measured for this implementation choice (i.e. the competency question for an ontology based on SSN).
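To make the two nested profiles a little more concrete, here is a minimal sketch of what conformance checks might look like. It is not part of SSN or of the profiles ontology; the `FeatureType` class, both function names, and all IRIs are hypothetical, and "dereferenceable" is approximated here by the attribute simply having a description IRI recorded against it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FeatureType:
    iri: str
    # Attribute name -> IRI of a description of how the attribute relates
    # to the underlying phenomenon, or None if no description is published.
    attributes: Dict[str, Optional[str]]


def conforms_to_attribute_profile(observed_property: str, feature_type: FeatureType) -> bool:
    """Profile 1 (hypothetical): observedProperty is a named attribute of the FeatureType."""
    return observed_property in feature_type.attributes


def conforms_to_described_profile(observed_property: str, feature_type: FeatureType) -> bool:
    """Profile 2 (hypothetical): as profile 1, and the attribute points at a description."""
    return feature_type.attributes.get(observed_property) is not None


# Hypothetical feature type for a water body.
water_body = FeatureType(
    iri="ex:type/WaterBody",
    attributes={
        "surfaceTemperature": "ex:def/sea-surface-temperature",
        "localName": None,
    },
)

for prop in ("surfaceTemperature", "localName", "salinity"):
    print(
        prop,
        conforms_to_attribute_profile(prop, water_body),
        conforms_to_described_profile(prop, water_body),
    )
```

Anything that passes the second check also passes the first, which matches the idea of a profile of a profile.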

I'd be interested in the takes on this from other experts in the group, and it would be useful if @jennet and others could contribute examples to drive Use Cases for this.

@dr-shorthair - is this matter appropriate for the SSN extensions (and if so I'll put my hand up to contribute to it) - or is a separate activity required? (I suspect the latter, as it's not really extensions so much as implementation patterns.)

jennet commented 3 years ago

@dr-shorthair I have a "similar but different" question that I thought I'd just append to this issue rather than creating a separate one; apologies if that's not the best way to log this query. However, it's still related to derived results produced in a lab.

What would be your suggestion for modelling derived results that give a result for an observable property of a different feature of interest?

My example would be:

an observation consisting of a number of different samples that show abundance (ObservedProperty) of biological taxa (FeatureOfInterest) found at a site (Platform) on a water body
an analysis done at a later date that uses those sample results to calculate a score for water quality of the water body that the samples were taken at

Data consumers are interested in the raw observation counts relating to taxa, but also interested in the generated scores relating to the water bodies in which the samples were taken. Given that SOSA/SSN is primarily focused around raw observations - would it be an incorrect application of SOSA to also try and model the secondary calculations that change the focus of the feature of interest?

dr-shorthair commented 3 years ago

@jennet that is all good. That scenario was one of the motivating factors for making the observed-property and feature-of-interest explicit. The feature-of-interest associated with an observation should always be the 'proximate' FOI, for which the OP of the observation is a characteristic. In some of the earlier OGC documents we described explicit scenarios where the OP and FOI evolved through a data processing chain, e.g. in remote sensing where the initial FOI is a solid angle based on the sensor, and the OP is light amplitude at various wavelengths, while the final FOI is a tract of the landscape and the OP is vegetation cover or similar. Intermediate stages have 'scene' as the FOI, and NDVI as the OP ...
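The remote sensing chain described above could be sketched roughly as follows, with each stage carrying its own proximate FOI and OP. This is plain Python, not SOSA itself: the `derived_from` field is a hypothetical link between stages (similar in spirit to a provenance relation), the band values and the 0.3 vegetation threshold are invented, and only the NDVI formula, (NIR - Red) / (NIR + Red), is standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Observation:
    feature_of_interest: str  # the proximate FOI at this stage
    observed_property: str    # a characteristic of that FOI
    result: float
    derived_from: Tuple["Observation", ...] = ()  # hypothetical link to input observations


def ndvi(nir: float, red: float) -> float:
    """Normalised difference vegetation index from near-infrared and red band values."""
    return (nir - red) / (nir + red)


def chain(obs: Observation) -> List[Tuple[str, str]]:
    """List (FOI, OP) pairs from the earliest stage to `obs`, following the first input."""
    steps = []
    while True:
        steps.append((obs.feature_of_interest, obs.observed_property))
        if not obs.derived_from:
            break
        obs = obs.derived_from[0]
    return list(reversed(steps))


# Stage 1: the sensor observes light amplitude over a solid angle (values invented).
nir = Observation("ex:foi/solid-angle-1", "ex:prop/lightAmplitudeNIR", 0.5)
red = Observation("ex:foi/solid-angle-1", "ex:prop/lightAmplitudeRed", 0.1)

# Stage 2: the scene is the FOI and NDVI is the OP.
scene_ndvi = Observation(
    "ex:foi/scene-1", "ex:prop/NDVI", ndvi(nir.result, red.result), (nir, red)
)

# Stage 3: a tract of landscape is the FOI and vegetation cover is the OP.
# The 0.3 threshold is an illustrative assumption, not a recommended value.
cover = Observation(
    "ex:foi/landscape-tract-1",
    "ex:prop/vegetationCover",
    1.0 if scene_ndvi.result > 0.3 else 0.0,
    (scene_ndvi,),
)

for foi, op in chain(cover):
    print(foi, op)
```

The same shape would seem to fit the water quality case: taxa abundance observations at the first stage, and a score for the water body as the final stage with the water body as its FOI.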

jennet commented 3 years ago

Brilliant, thanks!

KathiSchleidt commented 1 year ago

@jennet : As it looks like all questions in this thread have been answered, can we close?

In addition, as the question of derived results/processing chains keeps coming up, do we need some sort of position paper on useful approaches? Examples of different viable approaches?

dr-shorthair commented 10 months ago

We can keep this issue open as a prompt to improve the documentation of derived results, processing chains etc.

dr-shorthair commented 1 month ago

Add notes about processing chains and derived results - e.g. in

or just as another topic in

https://w3c.github.io/sdw-sosa-ssn/ssn/#common-modeling-questions

w3c / sdw-sosa-ssn

Explain derived results and processing chains #21