opengeospatial / sensorthings

The official web site of the OGC SensorThings API standard specification.

Sensor Operating Frequency #175

Open · humaidkidwai opened 5 months ago

humaidkidwai commented 5 months ago

The data model does not specify any mandatory frequency attribute in the Sensor entity.

To find out the frequency at which a sensor records observations, the user currently has to compare the phenomenonTime values of successive Observations. However, if the sensor is configurable and its frequency is changed often, this method of comparing phenomenonTimes also fails.
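
For illustration, a minimal client-side sketch of that workaround (the endpoint and Datastream id are hypothetical, and it assumes instantaneous phenomenonTimes rather than intervals):

```python
from datetime import datetime

import requests

BASE = "https://example.org/FROST-Server/v1.1"  # hypothetical STA endpoint

def estimate_interval_seconds(datastream_id: int, n: int = 100) -> float:
    """Median gap between the n most recent phenomenonTimes, in seconds."""
    url = f"{BASE}/Datastreams({datastream_id})/Observations"
    params = {
        "$select": "phenomenonTime",
        "$orderby": "phenomenonTime desc",
        "$top": str(n),
    }
    obs = requests.get(url, params=params).json()["value"]
    # Assumes instants; STA also allows interval-valued phenomenonTimes.
    times = [datetime.fromisoformat(o["phenomenonTime"].replace("Z", "+00:00"))
             for o in obs]
    gaps = sorted((a - b).total_seconds() for a, b in zip(times, times[1:]))
    return gaps[len(gaps) // 2]  # median, robust to occasional gaps or bursts
```

As the next paragraph notes, any such estimate silently breaks the moment the sensor is reconfigured.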

Users might want to aggregate Sensors from different STA deployments that operate at the same frequency and measure the same ObservedProperty.

If such an attribute is made mandatory in the Sensor entity, it might also be worth considering that every update to the frequency attribute should result in a new Datastream. Since a Datastream is essentially a time series, it is important that all Observations of a Datastream are recorded at the same frequency.
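
As a strawman, such an attribute could already be approximated today via the optional properties object that STA v1.1 adds to Sensor; the samplingFrequency key and the endpoint below are invented, not part of the standard:

```python
import requests

sensor = {
    "name": "DHT22",
    "description": "Temperature/humidity sensor (example)",
    "encodingType": "application/pdf",
    "metadata": "https://example.org/dht22-datasheet.pdf",
    # Hypothetical key; ISO 8601 duration for "one reading every 10 minutes".
    "properties": {"samplingFrequency": "PT10M"},
}
requests.post("https://example.org/FROST-Server/v1.1/Sensors", json=sensor)
```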

hylkevds commented 5 months ago

It can't be mandatory, since there are plenty of use cases with infrequent Observations, like in the Water Quality domain, unless we have an option for an "indeterminate frequency". There are also cases where the frequency is just a minimum, and the sensor sends an Observation when the ObservedProperty changes enough, or when a certain time has passed.

This also ties in with aggregation, where one generates Observations in a new Datastream with a different frequency than the original. For that we'd also need to store the aggregation method (average, minimum, maximum).

humaidkidwai commented 5 months ago

> This also ties in with aggregation, where one generates Observations in a new Datastream with a different frequency than the original. For that we'd also need to store the aggregation method (average, minimum, maximum).

Could you explain this part? You mean if a user needs to store aggregate observations per time period N, they would create a new Datastream?

hylkevds commented 5 months ago

Yes, aggregations go in a different Datastream from the originals. As an example, we have a MultiDatastream that holds the daily aggregate Observations of a source Datastream.

Mixing aggregate Observations with the originals only complicates things, since then you have to figure out how to $filter to get just the aggregates or just the originals. The set of aggregate Observations is a time series in itself, separate from, but related to, the original time series.

The Sensor for the aggregate time series should also be the aggregation algorithm, since there are many ways to create aggregates. We didn't do that in the above example though; that's on my todo list :)
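
For concreteness, a minimal sketch of such an aggregation process (the endpoint, Datastream ids, and key names are invented; the aggregation method is recorded in the Observation's parameters field):

```python
from statistics import mean

import requests

BASE = "https://example.org/FROST-Server/v1.1"  # hypothetical endpoint
RAW_DS = 42    # Datastream holding the raw Observations (made-up id)
AGG_DS = 43    # separate Datastream whose Sensor is the averaging algorithm

def aggregate_day(day: str) -> None:
    """Post one daily-average Observation into the aggregate Datastream."""
    params = {
        "$select": "result",
        "$filter": (f"phenomenonTime ge {day}T00:00:00Z"
                    f" and phenomenonTime le {day}T23:59:59Z"),
        "$top": "10000",
    }
    url = f"{BASE}/Datastreams({RAW_DS})/Observations"
    raw = requests.get(url, params=params).json()["value"]
    obs = {
        # An interval phenomenonTime marks the period the aggregate covers.
        "phenomenonTime": f"{day}T00:00:00Z/{day}T23:59:59Z",
        "result": mean(o["result"] for o in raw),
        # Record the aggregation method, as discussed above.
        "parameters": {"aggregation": "average", "sourceCount": len(raw)},
    }
    requests.post(f"{BASE}/Datastreams({AGG_DS})/Observations", json=obs)

aggregate_day("2024-05-01")
```

Because the aggregates live in their own Datastream, no $filter tricks are needed to separate them from the raw Observations.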

humaidkidwai commented 5 months ago

> The set of aggregate Observations is a time series in itself, separate from, but related to, the original time series.

> The Sensor for the aggregate time series should also be the aggregation algorithm

I see. The aggregate Observations would then generally be computed by the user through a process that runs independently of the Sensor that reports the individual readings, right?

> It can't be mandatory, since there are plenty of use cases with infrequent Observations, like in the Water Quality domain, unless we have an option for an "indeterminate frequency".

An indeterminate frequency is alright; it still conveys some information to the user. The frequency attribute would also help with storage optimizations (e.g., partitioning techniques for cloud-native file formats).
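
As a sketch of the kind of optimization meant here (toy data, with frequency as the hypothetical Sensor attribute; requires pyarrow):

```python
import pandas as pd

# Toy Observation export; "frequency" is the proposed attribute.
df = pd.DataFrame({
    "datastream": [42, 42, 43],
    "frequency": ["PT10M", "PT10M", "P1D"],
    "phenomenonTime": ["2024-05-01T00:00:00Z",
                       "2024-05-01T00:10:00Z",
                       "2024-05-01T00:00:00Z"],
    "result": [21.3, 21.4, 20.9],
})
# Hive-style partitioning: Observations with the same cadence land in the
# same directory, so scans over a known frequency touch fewer files.
df.to_parquet("observations/", partition_cols=["frequency"])
```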

hylkevds commented 5 months ago

There is also the issue of sensors that can change frequency. At any point in time they have a specific measurement frequency, but this frequency can be changed by the user. This is very common for LoRa sensors.

The problem with adding specific system properties (like frequency) is that there are so many of them: https://w3c.github.io/sdw-sosa-ssn/ssn/#overview-and-examples. If we add frequency, we'll next get the question of why we didn't add [any of the others]. I wonder if this may be better handled by something like the data quality extension: #174