metoppv / improver

IMPROVER is a library of algorithms for meteorological post-processing.
http://improver.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Converter to allow spot forecast outputs to be read into SBV #204

Closed: jflowerdew closed this issue 6 years ago

jflowerdew commented 7 years ago

As an IMPROVER scientist, I want to be able to store spot forecast outputs in a Station-Based Verification system (SBV) database, so that I can measure their performance using standard scores.

Related issues: #203

Acceptance criteria:

For now, it is assumed the converter only needs to handle the forecasts, with matching to observations handled in a separate database script as in the operational system.


Update for October sprint: We are now implementing deterministic spot verification only this financial year; ensemble support is postponed to FY18/19.

bayliffe commented 7 years ago

As a verification scientist I would like IMPROVER to produce SpotData in a format that can be readily ingested into a verification database for alignment with observations. This must be done without use of the Oracle database, and there is some uncertainty about the ability of SQLite to handle very large table manipulations.

The tool as envisaged will let spot NetCDF files accumulate until all lead times for a given validity time are available. A cube will then be constructed that contains all lead times for a day's worth of validity times in one numpy array. The structure of this cube will be something like:

Metadata: Date, site, diagnostic

Array:
Validity Time | Lead Time | Diagnostic Value

Such cubes can be constructed for each day. As the cubes are already structured in the way needed by the database, they can be combined with a simple join statement, rather than by manipulating a very large table to achieve the same effect.
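A minimal sketch of this idea, assuming the spot values have already been read out of NetCDF (the `load_spot_values` helper, table and column names, and station identifier are all placeholders, not the IMPROVER implementation):

```python
# Sketch only: accumulate one day's spot forecasts into a
# (validity_time, lead_time) array per site, then write the rows
# into a SQLite table for the SBV trial database.
import sqlite3
import numpy as np

VALIDITY_TIMES = [f"2017-10-01 {hh:02d}:00" for hh in range(0, 24, 3)]
LEAD_TIMES = [1, 2, 3, 4]  # hours ahead, i.e. T+1 .. T+4


def load_spot_values(site_id, validity_time, lead_time):
    """Placeholder for reading one value from a spot NetCDF file."""
    return np.random.uniform(0.0, 20.0)  # dummy temperature value


def build_day_array(site_id):
    """One day's worth of validity times x lead times for a single site."""
    data = np.empty((len(VALIDITY_TIMES), len(LEAD_TIMES)))
    for i, vt in enumerate(VALIDITY_TIMES):
        for j, lt in enumerate(LEAD_TIMES):
            data[i, j] = load_spot_values(site_id, vt, lt)
    return data


conn = sqlite3.connect("sbv_spot.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS spot_forecast "
    "(validity_time TEXT, lead_time INTEGER, site_id TEXT, "
    "diagnostic TEXT, value REAL)"
)

site = "03772"  # hypothetical station identifier
day = build_day_array(site)
rows = [
    (VALIDITY_TIMES[i], LEAD_TIMES[j], site, "temperature", float(day[i, j]))
    for i in range(len(VALIDITY_TIMES))
    for j in range(len(LEAD_TIMES))
]
conn.executemany("INSERT INTO spot_forecast VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
conn.close()
```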

LaurenceBeard commented 7 years ago

To clarify, an SQL table for deterministic data (being two dimensional) currently has the form:

Validity Date | Validity Time | Station ID | Diagnostic | Ob | T+1 | T+2 | T+3 | T+4 ..

While a probabilistic/ensemble table may have the form:

Validity Date | Validity Time | Station ID | Diagnostic | FCR | Ob | M1 | M2 | M3 ..

(In both cases the diagnostic column stands in place of the model code columns, which will need to be mapped, and other columns will require inclusion as per the current form of the tables which VerPy expects.)
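For illustration only, SQLite DDL mirroring the two layouts above might look as follows. The table and column names are assumptions for this sketch, not the actual VerPy schema:

```python
# Illustrative SQLite tables matching the deterministic and
# probabilistic/ensemble column layouts described above.
import sqlite3

conn = sqlite3.connect("sbv_tables.db")

# Deterministic layout: one row per (validity, station, diagnostic),
# with the observation and one column per lead time (T+1, T+2, ...).
conn.execute(
    """CREATE TABLE IF NOT EXISTS det_forecast (
        validity_date TEXT,
        validity_time TEXT,
        station_id    TEXT,
        diagnostic    TEXT,
        ob            REAL,
        t_plus_1      REAL,
        t_plus_2      REAL,
        t_plus_3      REAL,
        t_plus_4      REAL
    )"""
)

# Probabilistic/ensemble layout: forecast reference time (FCR) plus
# one column per ensemble member (M1, M2, ...).
conn.execute(
    """CREATE TABLE IF NOT EXISTS ens_forecast (
        validity_date TEXT,
        validity_time TEXT,
        station_id    TEXT,
        diagnostic    TEXT,
        fcr           TEXT,
        ob            REAL,
        m1            REAL,
        m2            REAL,
        m3            REAL
    )"""
)
conn.commit()
conn.close()
```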

The issue that requires consideration, it seems, is where the separation of concerns lies between IMPROVER output and verification input.

jflowerdew commented 7 years ago

Original text (preserving background information and alternative options developers could argue for if appropriate):

As an IMPROVER scientist, I want to be able to store spot forecast outputs in a Station-Based Verification system (SBV) database, so that I can measure their performance using standard scores.

Related issues: #203

For trials, we anticipate the SBV using a local SQLite database; the later operational system will use the corporate Oracle database. IMPROVER writes spot forecasts in NetCDF, whilst database ingestion requires either Comma-Separated Values or direct writing to tables using a tool such as pandas.

Acceptance criteria:

For now, it is assumed the converter only needs to handle the forecasts, with matching to observations handled in a separate database script as in the operational system.
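As a sketch of the "direct writing to tables using a tool such as pandas" route mentioned above, assuming the spot NetCDF content has already been read into a DataFrame (the column names and values below are illustrative only):

```python
# Sketch: write spot forecast rows into a SQLite table with pandas.
# The DataFrame stands in for data read from IMPROVER spot NetCDF output.
import sqlite3
import pandas as pd

spot = pd.DataFrame(
    {
        "validity_time": ["2017-10-01 00:00", "2017-10-01 03:00"],
        "lead_time": [1, 1],
        "station_id": ["03772", "03772"],
        "diagnostic": ["temperature", "temperature"],
        "value": [10.2, 9.8],
    }
)

with sqlite3.connect("sbv_spot.db") as conn:
    # Append so successive days and lead times accumulate in one table.
    spot.to_sql("spot_forecast", conn, if_exists="append", index=False)
```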

fionaRust commented 6 years ago

PR: #311

bayliffe commented 6 years ago

Completed.