ternaustralia / ontology_tern

TERN Ontology
https://linkeddata.tern.org.au/viewers/tern-ontology
Creative Commons Attribution 4.0 International

Observation missing spatiality indicator requirement #59

Closed nicholascar closed 2 years ago

nicholascar commented 2 years ago

tern:Observation instances must have temporality indicated by both a sosa:phenomenonTime & sosa:resultTime property but seem not to require a spatiality indicator.

They are allowed to have one, via geo:hasGeometry but why is this not a requirement?

They are allowed to be linked to a Site via a SiteVisit but this is not required.

If the spatiality of a tern:Observation instance is either to be given or to be inferred, perhaps by linking to a tern:Site, then one or both should be required, rather than neither.

Gaia and SURROUND expect many Observations to not have explicit SiteVisit or even Site information so it doesn't seem wise to require an Observation → SiteVisit → Site chain, so we suggest that spatiality for an Observation should be mandated but not only via geo:hasGeometry but via any other spatial relations property, e.g. the Observation is geo:sfWithin SomeNamedLocation (i.e. not a Site).

dr-shorthair commented 2 years ago

Hmm. I would push them on this. The notion of Site and SiteVisit is pretty central to the TERN model. It is a better model than current practice, which elides the path to a single relation to a point location.

There has to be a step up to a better model at some time. Can't that time be now?

nicholascar commented 2 years ago

Not for existing data which also has to be loaded into the BDR! How do you propose to require SiteVisit & Site info for 100s of 1,000s of existing Observations that don't have them?

Going forward, sure... but even then, it will take years to get such practice embedded.

Since the BDR will maintain a Site register, we can intersect Observations, if they have spatiality, with Sites to approximate Observation → [ ] → Site.
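The intersection idea could be sketched roughly as below. This is only an illustration: the site register, its bounding-box representation and the function name `sites_containing` are all hypothetical, and a real implementation would presumably use proper geometries and a spatial index rather than boxes.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a site is reduced to a bounding box
# (min_lon, min_lat, max_lon, max_lat) and an observation to a (lon, lat) point.
BBox = Tuple[float, float, float, float]
Point = Tuple[float, float]


def sites_containing(point: Point, sites: Dict[str, BBox]) -> List[str]:
    """Return the ids of all sites whose bounding box contains the point."""
    lon, lat = point
    return sorted(
        site_id
        for site_id, (min_lon, min_lat, max_lon, max_lat) in sites.items()
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    )


# Made-up register entries, for illustration only.
site_register = {
    "site-a": (150.0, -35.0, 151.0, -34.0),
    "site-b": (140.0, -30.0, 141.0, -29.0),
}
matches = sites_containing((150.5, -34.5), site_register)
```

An Observation with no match simply gets no approximate Site link.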

dr-shorthair commented 2 years ago

If the Observation (and Sampling) has a time, and a location, then a Site and Site-visit can be inferred.
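A minimal sketch of that inference, assuming an Observation record carries just an id, a time and a point location. The class and field names here are hypothetical and are not taken from the TERN Ontology.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class Observation:
    id: str
    time: datetime
    location: Tuple[float, float]  # (lon, lat)


@dataclass(frozen=True)
class Site:
    id: str
    location: Tuple[float, float]


@dataclass(frozen=True)
class SiteVisit:
    id: str
    site_id: str
    started: datetime
    ended: datetime


def infer_site_and_visit(obs: Observation) -> Tuple[Site, SiteVisit]:
    """Derive an informal Site and a zero-length SiteVisit from one Observation."""
    site = Site(id=f"inferred-site-for-{obs.id}", location=obs.location)
    visit = SiteVisit(
        id=f"inferred-visit-for-{obs.id}",
        site_id=site.id,
        started=obs.time,
        ended=obs.time,
    )
    return site, visit


obs = Observation("obs-1", datetime(2021, 12, 1, 9, 30), (150.5, -34.5))
site, visit = infer_site_and_visit(obs)
```

Nothing is added that is not already in the Observation record; the inferred objects only restructure it.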

edmondchuc commented 2 years ago

> They are allowed to have one, via geo:hasGeometry but why is this not a requirement?

Observations in some datasets do not have spatial information for their observations.

> They are allowed to be linked to a Site via a SiteVisit but this is not required.

Previously, this was a requirement when the TERN Ontology only supported site-based surveys (which most surveys are). The requirement has since been removed to support opportunistic survey protocols.

Currently, sites may have geo:sfWithin. It may be useful to add geo:sfWithin to Observation and Sampling to support the capture of relations to spatial features for opportunistic protocols.

> not only via geo:hasGeometry but via any other spatial relations property

I think this can be supported.

dr-shorthair commented 2 years ago

An 'opportunistic survey' can be described by inferring a (informal) 'Site' at the location of the observation, and an associated 'Site visit' at the time of the observations. That way we can stick with the TERN model.

Is this too big of a stretch?

nicholascar commented 2 years ago

I think my original point remains: what is missing is a requirement for an Observation to be spatially located. There are clearly multiple ways of achieving this - Observation with a geometry, Observation with a relation to a Feature, Observation linked to a Site - but AT LEAST ONE must be present for every Observation.

This can be expressed as a business rule easily - essentially as per above sentence - but whether a single or set of Shapes can express it well, I leave to you - TERN Ont authors - to figure out!

I think we can add a non-Shape Requirement for this, until/unless you can produce a Shape for it, to the non-technical NDES Specification: https://github.com/surroundaustralia/ndes/blob/master/specification.adoc
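As a plain business rule, the "at least one" requirement above might look something like the following sketch. Triples are modelled as simple string tuples rather than with an RDF library, and `ex:linkedToSite` is a made-up placeholder for whatever property or path actually links an Observation to a Site.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# The three ways of locating an Observation named above.
# "ex:linkedToSite" is a hypothetical placeholder, not a real TERN property.
SPATIAL_PREDICATES = {"geo:hasGeometry", "geo:sfWithin", "ex:linkedToSite"}


def spatial_indicators(subject: str, triples: Iterable[Triple]) -> Set[str]:
    """Return which of the spatial predicates the subject actually uses."""
    return {p for s, p, _ in triples if s == subject and p in SPATIAL_PREDICATES}


def is_spatially_located(subject: str, triples: Iterable[Triple]) -> bool:
    """Business rule: at least one spatial indicator must be present."""
    return bool(spatial_indicators(subject, triples))


data = [
    ("ex:obs1", "geo:sfWithin", "ex:SomeNamedLocation"),
    ("ex:obs2", "sosa:resultTime", "2021-12-01T09:30:00"),
]
```

Here `ex:obs1` passes and `ex:obs2` fails, because it has a time but no spatial indicator of any kind.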

dr-shorthair commented 2 years ago

And I think the core of my response remains.

In the TERN model spatial location is primarily associated with a Site. Observations and Samplings occur in the context of a Site Visit, so the location of an Observation is available from the associated Site.

The Site and Site Visit can be inferred to exist, even if historically they haven't been formally recorded in that way. The information for describing a site and site visit doesn't include anything that isn't already in the data.

But having a single standard model for describing this whole scenario, as well as adjacent scenarios, makes querying easier.

dr-shorthair commented 2 years ago

Note that separation of location from observation via a 'feature of interest' was core to O&M, starting from ca. 2002. One of the requirements then was to have a common model for in-situ, remote, and ex-situ observations, where the location of the sensor and the location of the point of interest in the world may not coincide. This led to the pattern of having the observation associated with a 'feature of interest' which has location, and this has proved to be a strong pattern with utility across a variety of applications. Sticking a location or geometry on the observation itself would undermine that progress and interoperability with related domains.

A second way of looking at it is that while denormalized records have often been convenient for data transfer, they inevitably result in repetition of the same information in multiple records. Recognising that repetition and promoting it to separate objects is the strategy.

We know what those object types are from the O&M and SSN theory, which is quite mature now and is implemented in the TERN ontology.

nicholascar commented 2 years ago

OK, as per above and the NDES chat we just had, perhaps we have to reorient the basic unit of reporting to the BDR from Observation to Sampling. Then we can insist that a Sampling has a location, even if the multiple Observation instances associated with it don't??

I will then need a SamplingCollection but otherwise I think most things can stay the same, s/Observation/Sampling/...

dr-shorthair commented 2 years ago

The boundary between act-of-sampling and act-of-observation can be grey.

For example, the result of an Observation can be an image - this was one of the classic cases. However, the image can be the feature-of-interest of a subsequent observation which does some feature-extraction etc. In which case the initial observation could be understood as an Act of Sampling, which created a 'sample' of the field of view, made by the Sensor acting as a Sampler.

We even have a comment about it somewhere in the spec, but I can't find that now (Edmond found it!)

nicholascar commented 2 years ago

It may be more about what we communicate is the primary thing in the BDR, so the Biodiversity Data Repository is full of records of sampling, and their results, and observations on those sampling results and their, further, results...

It's not going to matter too much to us ontology users but it might make it easier for data contributors to the BDR to start off: I expect to be supplying instances of Sampling, and all the things that can be linked to from there, rather than starting with Observation.

I think the initial focus on Observation may be due to a naive interpretation of the NDES spec early in the project and we've just kept circling around Observation as the entry point ever since. This may have been an error on my part in communicating SOSA...

dr-shorthair commented 2 years ago

Yes. The more I work on this the more important I think the sampling step is. In fact it is probably where the most science innovation happens.

edmondchuc commented 2 years ago

From previous meeting notes with @dr-shorthair.

  • Modelling deployment information of TERN phenology cameras (phenocams) within the TERN Ontology
    • Phenocams are Samplers in our case, not Sensors.
    • Samplers produce samples
    • Sensors produce observations
    • What do we consider an image produced by the phenocam?
    • We think it fits well with the shape of Sampler, Sample and further Observations on Features of Interest.

For TERN, it made sense to model it with act-of-sampling and sample because the result (image) becomes the feature-of-interest of further observations. These observations calculate the chromatic indices of the image to analyse its green-ness and its correlation to photosynthesis.

Most of the Darwin Core occurrence records are also samples resulting from acts of sampling, and these samples become the feature-of-interest of observations for life stage, taxon, type status, etc.

edmondchuc commented 2 years ago

@nicholascar regarding https://github.com/ternaustralia/ontology_tern/issues/59#issuecomment-987418885, I am just confirming that the action on this issue for TERN is to create a single shape or a set of shapes to require Observation and Sampling activities to have spatial information either on the activities themselves or via the Site or an intermediary FeatureOfInterest. Is this correct?

If so, can I ask for some guidance on how you would constrain this rule over multiple target classes in SHACL?

Can I also ask what you require in the model to allow for:

> so we suggest that spatiality for an Observation should be mandated but not only via geo:hasGeometry but via any other spatial relations property

Did you want an optional property in the Observation and Sampling shapes to indicate that using other spatial relationships such as geo:sfWithin is possible?

dr-shorthair commented 2 years ago

> reorient the basic unit of reporting to the BDR from Observation to Sampling

Yes.

miekeGR commented 2 years ago

I don't fully understand what is being proposed here, but I would like to bring up an example that you might want to consider before making spatiality on an observation mandatory (if that's what is happening). If the observation is a taxonomic identification on a Sample, it doesn't really matter where it is being done (it might be far away from the collection Site, back in the lab at a museum for example); the Sampling that created the Sample lets you know the important spatiality of the Sample.

dr-shorthair commented 2 years ago

What is being proposed is that

(a) an act of sampling has (i) a feature-of-interest, and (ii) (optionally) a sampling-location.

The FoI usually has spatiality.

If the act of Sampling also has a more specific sampling location, it must necessarily be contained within the geometry of the FoI.

(b) an act of observation has (i) a feature of interest, and (ii) (usually) a sensor.

The FoI usually has spatiality. The (proximate) FoI of an observation is often a Sample, which was the result of an act of Sampling (see (a)).

The Sensor may also have a location, which is only the same as the FoI for in-situ observations. The Sensor may be in a lab, which would usually be at a different location (and may not be interesting).

So the act-of-observation does not have a direct property related to spatiality. The 'locations' are related to the Sample, the FoI, and the Sensor, which may all be different depending on the observational scenario.

nicholascar commented 2 years ago

> I am just confirming that the action on this issue for TERN is to create a single shape or a set of shapes to require Observation and Sampling activities to have spatial information either on the activities themselves or via the Site or an intermediary FeatureOfInterest

Yes

> how you would constrain this rule over multiple target classes in SHACL?

I think it's just a SHACL sh:or statement, like the sh:or used for the time expression in the TERN ontology's use of time:Instant, where you must have at least one way of communicating the value, and potentially several, but without duplicate use of any one property.

So you could have a Sampling that has properties for indicating that it is both geo:sfWithin the Feature BuckingBongStateForrest (or SiteX or CAPADAreaY) and that it has a geo:hasGeometry, but you must have at least one. You cannot have two geometries though. It could be within multiple features (geo:sfWithin BuckingBongStateForrest & NSW).

The SHACL sh:or just points to sh:property shapes which may point to more property shapes...
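Outside SHACL, the same rule (at least one of geo:hasGeometry or geo:sfWithin, at most one geometry, any number of geo:sfWithin) can be sketched as a plain check. This is only an illustration over string-tuple triples, not the shape itself, and the function name and the `ex:` identifiers are hypothetical.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def check_spatial_rule(subject: str, triples: Iterable[Triple]) -> List[str]:
    """Return a list of problems; an empty list means the subject conforms."""
    geometries = set()
    containers = set()
    for s, p, o in triples:
        if s != subject:
            continue
        if p == "geo:hasGeometry":
            geometries.add(o)
        elif p == "geo:sfWithin":
            containers.add(o)

    problems = []
    if not geometries and not containers:
        problems.append("needs at least one of geo:hasGeometry or geo:sfWithin")
    if len(geometries) > 1:
        problems.append("has more than one geo:hasGeometry")
    return problems


data = [
    # Within two features and with one geometry: allowed.
    ("ex:sampling1", "geo:sfWithin", "ex:BuckingBongStateForrest"),
    ("ex:sampling1", "geo:sfWithin", "ex:NSW"),
    ("ex:sampling1", "geo:hasGeometry", "ex:geomA"),
    # Two geometries: not allowed.
    ("ex:sampling2", "geo:hasGeometry", "ex:geomB"),
    ("ex:sampling2", "geo:hasGeometry", "ex:geomC"),
]
```

A Sampling with no spatial triples at all, such as a hypothetical `ex:sampling3`, would fail the first check.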

> Did you want an optional property in the Observation and Sampling shapes to indicate that using other spatial relationships such as geo:sfWithin is possible?

I think if you just require that a Sampling either has a geometry, or a Feature/Feature relation to another Feature that has a geometry (this could be a path of geo:sfWithin+) then that's fine. No need to deal with other Feature/Feature links than geo:sfWithin (yeah, geo:sfOverlaps etc. could work but let's not boil the ocean here).

Perhaps the only trick is that where a Sampling can acquire a geometry from a chain of any number of geo:sfWithin links to a Feature that has a geometry, you will have to cater for the SOSA Activity/Feature links initially, so your chain will need to start with sosa:hasFeatureOfInterest.
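To illustrate the chain, here is a rough sketch that walks from an activity along sosa:hasFeatureOfInterest and geo:sfWithin links until it finds something with a geo:hasGeometry. It is a simplification: it follows the two link types in any order rather than enforcing that sosa:hasFeatureOfInterest comes first, it uses string tuples instead of an RDF library, and the function name and `ex:` identifiers are made up.

```python
from collections import deque
from typing import Iterable, Optional, Tuple

Triple = Tuple[str, str, str]

# Links that may be followed when looking for a geometry.
LINKS = {"sosa:hasFeatureOfInterest", "geo:sfWithin"}


def find_geometry(start: str, triples: Iterable[Triple]) -> Optional[str]:
    """Breadth-first search for the nearest geometry reachable from start."""
    triples = list(triples)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # A geometry on the current node ends the search.
        for s, p, o in triples:
            if s == node and p == "geo:hasGeometry":
                return o
        # Otherwise follow the allowed links, skipping visited nodes
        # so that cycles of geo:sfWithin cannot loop forever.
        for s, p, o in triples:
            if s == node and p in LINKS and o not in seen:
                seen.add(o)
                queue.append(o)
    return None


data = [
    ("ex:sampling1", "sosa:hasFeatureOfInterest", "ex:plot1"),
    ("ex:plot1", "geo:sfWithin", "ex:forest"),
    ("ex:forest", "geo:hasGeometry", "ex:forestGeom"),
    # A cycle with no geometry anywhere on it.
    ("ex:sampling2", "sosa:hasFeatureOfInterest", "ex:a"),
    ("ex:a", "geo:sfWithin", "ex:b"),
    ("ex:b", "geo:sfWithin", "ex:a"),
]
```

In this data `ex:sampling1` acquires the forest's geometry two links away, while `ex:sampling2` has no reachable geometry and so would fail the requirement.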

edmondchuc commented 2 years ago

Hi @nicholascar and @dr-shorthair, I've created a pull request which adds shapes to ensure there is at least 1 geo:hasGeometry or geo:sfWithin on the activity itself or along its chain of sosa:isSampleOf relationships via its sosa:hasFeatureOfInterest.

Please have a review of https://github.com/ternaustralia/ontology_tern/pull/170 if you have time. I will merge this in by the end of the week and make a minor release. Cheers.

dr-shorthair commented 2 years ago

Sorry I missed this request earlier. Nice.