opencdms-project / was-wg

🗎 This repository is the main collaborative space for the OpenCDMS Web API Standard Working Group (WAS-WG)
https://www.opencdms.org/approach/groups/working-group/was-wg/

Document conceptual architecture of an observation implications for API design #5

Open isedwards opened 4 years ago

isedwards commented 4 years ago

[Figure: conceptual architecture of an observation (conceptual_architecture_obs)]

The yellow boxes may hold several values for the same observation. For example, the database may keep several values of the same observation, e.g. when a value has been modified after data control. Moreover, the user interface should allow requests of this kind:

Of course, for our API, systems that cannot answer some of the requests may say "I cannot give you this, but I can give you this and this and this ..."
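The "I cannot give you this, but I can give you this" behaviour could be sketched roughly as follows. This is only an illustration: the element names, the `SUPPORTED` set, and the `negotiate` function are hypothetical and do not come from any agreed OpenCDMS specification.

```python
from dataclasses import dataclass, field

# Hypothetical set of elements that one particular system is able to serve.
SUPPORTED = {"air_temperature", "precipitation", "wind_speed"}


@dataclass
class PartialResponse:
    available: list = field(default_factory=list)     # requested and servable
    unavailable: list = field(default_factory=list)   # requested, not servable
    alternatives: list = field(default_factory=list)  # servable, not requested


def negotiate(requested, supported=SUPPORTED):
    """Split a request into what this system can and cannot answer."""
    response = PartialResponse()
    for name in requested:
        if name in supported:
            response.available.append(name)
        else:
            response.unavailable.append(name)
    # Only offer alternatives when part of the request could not be met.
    if response.unavailable:
        response.alternatives = sorted(supported - set(requested))
    return response


response = negotiate(["air_temperature", "soil_moisture"])
print(response)
```

Returning the unavailable items explicitly, instead of failing the whole request, lets a client decide for itself whether the partial answer is good enough.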

Some more information/remarks:

jaggh commented 4 years ago

From a practical point of view, I see some of these requirements as a bit unnecessary. If I want a wind rose for some place, I can know in advance, from the station metadata, the details of its observing conditions and its compliance with WMO regulations. If I am very strict and the site does not comply with all the requirements, the result will be that I do not get any wind rose. I would therefore probably want the wind rose built from whatever data are available at the site, and afterwards I can assess its reliability from the metadata.

isedwards commented 4 years ago

Related comment from @martin_schweitzer

Having had some time to think and also having had to do some work over the last fortnight with the Australian climate data, I would like to make two observations.

  1. I am still in favour of defining an API and think this is the best way forward.
  2. I think that an API is just one part of achieving interoperability. An API exists mostly at the technical level. I will explain what I mean below.

Suppose we are considering daily temperature. We have an API that allows us to query the daily temperature at a given station for a given month.

We also need to consider:

  1. What quality control (QC), if any, was applied?
  2. For our purpose, does the QC meet our needs?
  3. If we say we want a quality flag of 4 or less, does this mean the same thing across applications/data models?
  4. Assuming all temperatures are in °C, was the original temperature recorded in °C?
  5. Are the measurements 09:00 to 09:00 or some other period?
  6. Is the temperature measured or the result of an aggregation (e.g. from 1-minute data)?
  7. If it is an aggregation, what are the rules, e.g. with respect to missing values?
  8. If there is a missing value, can we distinguish whether it is a null or just failing QC?
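To make the list concrete, a record that answered each of these questions explicitly might look something like the sketch below. All field names, the `Missing` categories, and the example values are assumptions made for illustration; they are not taken from any existing data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Missing(Enum):
    NOT_MISSING = "not_missing"
    NOT_OBSERVED = "not_observed"  # no value was ever recorded (a true null)
    FAILED_QC = "failed_qc"        # a value exists but was rejected by QC


@dataclass
class DailyTemperature:
    value_c: Optional[float]        # value in degrees Celsius, None if missing
    original_unit: str              # unit as originally recorded, e.g. "degF"
    qc_procedure: Optional[str]     # name of the QC applied, None if none
    qc_flag: Optional[str]          # flag in the source system's own scheme
    period_end_local: str           # e.g. "09:00"
    period_hours: int               # e.g. 24
    aggregated_from: Optional[str]  # e.g. "1-minute", None if measured directly
    missing: Missing = Missing.NOT_MISSING


def describe(obs):
    """Summarise how a value was obtained, or why it is absent."""
    if obs.missing is Missing.FAILED_QC:
        return "value rejected by QC"
    if obs.missing is Missing.NOT_OBSERVED:
        return "no value recorded"
    if obs.aggregated_from:
        source = f"aggregated from {obs.aggregated_from} data"
    else:
        source = "measured directly"
    return f"{obs.value_c} degC ({source}; recorded in {obs.original_unit})"


obs = DailyTemperature(21.5, "degF", "range_check", "3", "09:00", 24, "1-minute")
print(describe(obs))
```

Even this small record needs eight fields for one number, which gives a feel for how quickly the payload grows.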

One approach would be to provide all this information in the API. But then we may get a 'C' for a quality flag in one application and a '3' in another.

Also, it would probably make the API unnecessarily unwieldy and complex.

The other approach would be to have a common understanding (and set of rules) for the meaning of the data, so that, for example, a quality flag of '3' means the same thing across systems. If this is what people are referring to when they speak about 'Data Model', then I am in agreement that we need to define both an API and a data model.
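One way to picture a shared set of rules is a translation table from each system's native flags to a common vocabulary. The sketch below is purely illustrative: the system names, the flag tables, and the common terms ("good", "suspect", "erroneous") are invented here, and whether a 'C' in one system really corresponds to a '3' in another is exactly what would have to be agreed.

```python
# Hypothetical per-system flag tables; real schemes would come from each CDMS.
FLAG_MAPS = {
    "system_a": {"A": "good", "B": "suspect", "C": "erroneous"},
    "system_b": {"1": "good", "2": "suspect", "3": "erroneous"},
}


def to_common_flag(system, flag):
    """Translate a native quality flag into the shared vocabulary.

    Unknown systems or flags map to "unknown" instead of raising, so a
    client can still see that a flag was present but not understood.
    """
    return FLAG_MAPS.get(system, {}).get(flag, "unknown")


# Under the assumed tables, both native flags carry the same meaning.
print(to_common_flag("system_a", "C"), to_common_flag("system_b", "3"))
```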

and @Steve-Palmer's reply

Martin makes a really useful point. My understanding (which may be wrong) is that some of these uncertainties are being addressed in the WIGOS metadata model work, particularly on consistency of flagging. At some point soon, I need to start looking at this work, because the focus of the WIGOS work is primarily on the near-real-time exchange of data, and our focus is on the long-term climatology, so there may be aspects where we can identify gaps in the WIGOS metadata.

One issue I am already aware of is that WIGOS allows periodic observation elements (rain accumulation, max and min temperature etc) to be defined optionally by the start of the period or by the end of the period as well as the length of the period – I have argued for the last 25 years that for climate purposes, it is better to emphasise the end of the period being the actual time the observation is made, and the length of the period is the time since the previous observation. In this case, the WIGOS definition allows both, but only one method is preferable for practical climatology.
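The end-of-period convention can be sketched as a record that stores the observation time and the period length, and derives the start only when it is needed. The class and field names below are hypothetical and are not taken from the WIGOS metadata model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PeriodObservation:
    element: str
    value: float
    period_end: datetime      # time the reading was actually taken
    period_length: timedelta  # time since the previous observation

    @property
    def period_start(self):
        # The start is derived, never stored, so it cannot drift out of
        # step with the recorded observation time.
        return self.period_end - self.period_length


# A Monday 09:00 rain reading that covers the weekend (72 hours).
rain = PeriodObservation(
    "precipitation", 12.4, datetime(2020, 6, 8, 9, 0), timedelta(hours=72)
)
print(rain.period_start)
```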

This example is based on experience. In the Met Office Climaster database used up to 1997, daily rainfall and max temperature were “cast back” and stored by the start of the period, and at the same time, estimates were inserted for when the period was not a day (especially weekends) (CLICOM usually worked this way too). This meant that any monthly total using observations from a “working days only” station was produced as an estimate. In the Met Office MIDAS database, storing by observation time, and recording the period correctly allowed measured readings except in those months when the end of the month was during a weekend – a small but useful improvement in service delivery.
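The effect on monthly totals can be illustrated with a small sketch. It assumes a 09:00 to 09:00 climatological day and a simple rule: an accumulation counts as measured for a month only if its whole period lies inside that month. The function and its rule are illustrative assumptions, not a description of how MIDAS actually computes totals.

```python
from datetime import datetime, timedelta


def monthly_total(observations, year, month):
    """Sum accumulations that lie wholly inside the month.

    observations: iterable of (period_end, period_length, value).
    Returns (total, complete). complete is False when some accumulation
    straddles a month boundary, so part of the month would need an estimate.
    """
    month_start = datetime(year, month, 1, 9, 0)  # assumed 09:00 day boundary
    month_end = datetime(year + (month == 12), month % 12 + 1, 1, 9, 0)
    total, complete = 0.0, True
    for end, length, value in observations:
        start = end - length
        if end <= month_start or start >= month_end:
            continue  # entirely outside this month
        if start < month_start or end > month_end:
            complete = False  # period crosses the month boundary
        else:
            total += value
    return total, complete


# A working-days-only station: Friday reading, then a 72-hour Monday reading.
# 1 March 2020 was a Sunday, so the Monday reading straddles the month end.
readings = [
    (datetime(2020, 2, 28, 9, 0), timedelta(hours=24), 1.0),
    (datetime(2020, 3, 2, 9, 0), timedelta(hours=72), 6.0),
]
print(monthly_total(readings, 2020, 2))
```

With these readings the February total is flagged as incomplete, which matches the case described above where the month ends during a weekend.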