opencdms-project / project-standards-team

👥🗎 This repository is the main collaborative space for the OpenCDMS Data Model Standard Working Group (DMS-WG)
1 stars 2 forks source link

Date period (duration over which an observation was made) #11

Open isedwards opened 3 years ago

isedwards commented 3 years ago

In Climsoft, the period over which an observation was made can be recorded in the period field of the observationfinal table.

In practice the element code is used to determine both the element/parameter being measured and also the period of the observation (with the observation time occuring at the end of the observation period). For example, element code 2 TMPMAX is defined as the daily maximum temperature.

The observation date for a monthly observation is recorded at the end of the observation period. Therefore is should not be necessarily to store partial dates (e.g. 2020-11). In Climsoft metadata tables, fields like station opening/closing dates accept text input since the full date may not be known.

Default time to nominal time for observation. 0600 UTC (Kenya), 0900 UTC (UK)

Examples for monthly aggregation for 2020-11

Open and closed date ranges.

In CliDE the period is determined by the table that is used to store the observation, e.g. obs_daily.

Related comment from @Steve-Palmer

There is a linked issue about the exact definition of the datetime attached to an observation, especially for values over a period (rainfall, max and min temperature etc). Midas and Climsoft define these as the time of observation, with the period extending backwards from that time. CLICOM used the traditional "cast-back" date for daily rainfall and max temperature, but not min temperature. WIGOS metadata allows for multiple definitions including the start of period and mid-period.

Steve-Palmer commented 3 years ago

The problem with specifying a table with one entry for every expected occurrence of a periodic observation is the difficulty of coping with missing values. The CliDE obs_daily structure seems to follow CLICOM (and earlier Met Office CliMaster and probably CliData) in defining one entity for each occurrence expected in the sequence (i.e. 31 daily values in a month). If some of those values are accumulations over a longer period, as would happen at weekends in a 5-day manual station, then the missing values must be estimates, and the original observations are difficult to use. When extracting (say) a monthly total from a weekdays-only station, using the 31 daily values would result in an estimated total. Using the weekend accumulations would result in a measured total unless one of the weekend periods also spanned a weekend. This reduces the number of estimated values in a year from every month to about 4 months each year (in 2020 just February and March for rain and max temperature; Feb, March, May and June for min temperature).

I have long argued that the Climsoft Element definitions should not include words like "Daily" because this is unnecessary. Use of the Period attribute means that the TMPMAX element definition can be used for daily values, multiple days such as weekends, or for the 12-hourly values in the synoptic code.

jaggh commented 3 years ago
Just to add another piece of
  information, after defining a variable in MCH, e.g., 'watertemp',
  the administrator must click on a button to create the MySQL
  tables for the new variable. These tables are named after the
  variable, prepending two letters to indicate different time
  frequencies as, in our example, dmwatertemp (monthly), ddwatertemp
  (daily) and dtwatertemp (sub-daily). These would be the most
  useful tables from my point of view, although MCH creates also
  another type of daily tables plus tables for weekly and 10-days
  values.
jaggh commented 3 years ago

Both in MCH and the Spanish (AEMET) database, monthly dates are assigned to the first day of the month. In our Noverber 2020 example: 2020-11-01

Steve-Palmer commented 3 years ago

In MIDAS and Climsoft, all periodic observations are assigned to the time of observation. For monthly observations, (such as the traditional monthly raingauge which used a large cylindrical can to collect the rain and was read with a measuring stick), this would be read (in UK practice) at (nominal) 0900 UTC on the first of each month. Therefore the reading in the database for 1 January 0900 UTC would have the monthly total for December. This means that "casting back" the reading to the applicable month has to be done in the extraction query process. This adds complexity to the extraction query, but significantly reduces the chance of errors during the recording and storing processes, especially for paper forms. It also means that the measurement can be recorded even if the date of reading is not exact (for example 1 January is a Public Holiday in the UK, so the actual reading may be done on 2 January), and an estimated value can be added without hiding the original reading.

mhabimana commented 3 years ago

Both in MCH and the Spanish (AEMET) database, monthly dates are assigned to the first day of the month. In our Noverber 2020 example: 2020-11-01

@jaggh - Do you mean monthly dates or monthly data?

jaggh commented 3 years ago
Hi, I meant monthly dates. I was not talking about to which date
  are data assigned; I was explaining that monthly data of, e.g.,
  February 2020, had assigned the date 2020-02-01 in the monthly
  data table. (In a daily data table, it would indicate that the
  data are for February the first, but in the monthly table it just
  indicate data for February).
    Jose
mhabimana commented 3 years ago

@jaggh -Thanks,it is clear now.

DanHollis commented 3 years ago

@Steve-Palmer drew my attention to this repo. I work at the UK Met Office and am a user of MIDAS.

I'm not familiar with previous discussions you may have had, so apologies if some of this is not relevant. Some initial thoughts:

Storing values against the observation time is definitely best from a traceability and accuracy point of view - it reflects what actually happened.

In the daily rainfall table in MIDAS there is a 'day count' column. Accumulations have a day count > 1. Could you generalise this approach to variables with other periods? e.g. an hour count for hourly data or a month count for monthly data. This would give maximum flexibility and a uniform approach.

Careful thought would need to be given to gauges that are read monthly. If the reading was made a day late then you would want this to be accurately reflected in the database e.g. day count = 32 rather than month count = 1.

Storing the data against observation time and combining this with a count column does require the user to understand the intricacies of the reporting process though e.g. in the UK the max and min temps for a given calendar day come from different 24-hour periods. To try and solve this (and reduce mistakes) we implemented a second database schema that applies business rules to the raw observations. It returns the 'best available value' for a given variable and calendar date. Could you take this approach?

Creating monthly totals when there are accumulations is tricky. To some degree you can only work with what you've got i.e. if an accumulation spans the month end then it's impossible to obtain a genuine monthly total. The value should be stored (or returned, if this is happening on-the-fly) with a non-zero missing day count. My feeling is it would be wasted effort to try and come up with business rules to do anything more sophisticated. It should be up to the data owners to reapportion the accumulation into daily values as part of a QC process and then flag the entries accordingly.