metoppv / improver

IMPROVER is a library of algorithms for meteorological post-processing.
http://improver.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Converter to allow spot forecast outputs to be read into SBV #204

Closed: jflowerdew closed this issue 6 years ago

jflowerdew commented 7 years ago

As an IMPROVER scientist, I want to be able to store spot forecast outputs in a Station-Based Verification system (SBV) database, so that I can measure their performance using standard scores.

Related issues: #203

Acceptance criteria:

For now, it is assumed the converter only needs to handle the forecasts, with matching to observations handled in a separate database script as in the operational system.


Update for October sprint: We are now implementing deterministic spot verification only this financial year; ensemble support is postponed to FY18/19.

bayliffe commented 7 years ago

As a verification scientist I would like IMPROVER to produce SpotData in a format that can be readily ingested into a verification database for alignment with observations. This must be done without use of the Oracle database, and there is some uncertainty about the ability of SQLite to handle very large table manipulations.

The tool as envisaged will let spot NetCDF files accumulate until all lead times for a given validity time are available. A cube will then be constructed that contains all lead times for a day's worth of validity times in one numpy array. The structure of this cube will be something like:

Metadata: Date, site, diagnostic

Array:
Validity Time | Lead Time | Diagnostic Value

Such cubes can be constructed for each day. As the cubes are already structured in the way needed by the database, they can be combined with a simple join statement, rather than by manipulating a very large table to achieve the same effect.
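A minimal sketch of this idea, assuming the spot values have already been read out of NetCDF (the `load_spot_values` helper, table and column names, and station identifier are all placeholders, not the IMPROVER implementation):

```python
# Sketch only: accumulate one day's spot forecasts into a
# (validity_time, lead_time) array per site, then write the rows
# into a SQLite table for the SBV trial database.
import sqlite3
import numpy as np

VALIDITY_TIMES = [f"2017-10-01 {hh:02d}:00" for hh in range(0, 24, 3)]
LEAD_TIMES = [1, 2, 3, 4]  # hours ahead, i.e. T+1 .. T+4


def load_spot_values(site_id, validity_time, lead_time):
    """Placeholder for reading one value from a spot NetCDF file."""
    return np.random.uniform(0.0, 20.0)  # dummy temperature value


def build_day_array(site_id):
    """One day's worth of validity times x lead times for a single site."""
    data = np.empty((len(VALIDITY_TIMES), len(LEAD_TIMES)))
    for i, vt in enumerate(VALIDITY_TIMES):
        for j, lt in enumerate(LEAD_TIMES):
            data[i, j] = load_spot_values(site_id, vt, lt)
    return data


conn = sqlite3.connect("sbv_spot.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS spot_forecast "
    "(validity_time TEXT, lead_time INTEGER, site_id TEXT, "
    "diagnostic TEXT, value REAL)"
)

site = "03772"  # hypothetical station identifier
day = build_day_array(site)
rows = [
    (VALIDITY_TIMES[i], LEAD_TIMES[j], site, "temperature", float(day[i, j]))
    for i in range(len(VALIDITY_TIMES))
    for j in range(len(LEAD_TIMES))
]
conn.executemany("INSERT INTO spot_forecast VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
conn.close()
```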

LaurenceBeard commented 7 years ago

To clarify, an SQL table for deterministic data (being two dimensional) currently has the form:

Validity Date | Validity Time | Station ID | Diagnostic | Ob | T+1 | T+2 | T+3 | T+4 ..

While a probabilistic/ensemble table may have the form:

Validity Date | Validity Time | Station ID | Diagnostic | FCR | Ob | M1 | M2 | M3 ..

(In both cases the diagnostic column stands in place of the model code columns, which will need to be mapped, and other columns will require inclusion as per the current form of the tables which VerPy expects.)
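For illustration only, SQLite DDL mirroring the two layouts above might look as follows. The table and column names are assumptions for this sketch, not the actual VerPy schema:

```python
# Illustrative SQLite tables matching the deterministic and
# probabilistic/ensemble column layouts described above.
import sqlite3

conn = sqlite3.connect("sbv_tables.db")

# Deterministic layout: one row per (validity, station, diagnostic),
# with the observation and one column per lead time (T+1, T+2, ...).
conn.execute(
    """CREATE TABLE IF NOT EXISTS det_forecast (
        validity_date TEXT,
        validity_time TEXT,
        station_id    TEXT,
        diagnostic    TEXT,
        ob            REAL,
        t_plus_1      REAL,
        t_plus_2      REAL,
        t_plus_3      REAL,
        t_plus_4      REAL
    )"""
)

# Probabilistic/ensemble layout: forecast reference time (FCR) plus
# one column per ensemble member (M1, M2, ...).
conn.execute(
    """CREATE TABLE IF NOT EXISTS ens_forecast (
        validity_date TEXT,
        validity_time TEXT,
        station_id    TEXT,
        diagnostic    TEXT,
        fcr           TEXT,
        ob            REAL,
        m1            REAL,
        m2            REAL,
        m3            REAL
    )"""
)
conn.commit()
conn.close()
```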

The issue that requires consideration, it seems, is where the separation of concerns lies between IMPROVER output and verification input.

jflowerdew commented 7 years ago

Original text (preserving background information and alternative options developers could argue for if appropriate):

As an IMPROVER scientist, I want to be able to store spot forecast outputs in a Station-Based Verification system (SBV) database, so that I can measure their performance using standard scores.

Related issues: #203

For trials, we anticipate the SBV using a local SQLite database; the later operational system will use the corporate Oracle database. IMPROVER writes spot forecasts in NetCDF, whilst database ingestion requires either Comma-Separated Values or direct writing to tables using a tool such as pandas.

Acceptance criteria:

For now, it is assumed the converter only needs to handle the forecasts, with matching to observations handled in a separate database script as in the operational system.
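As a sketch of the "direct writing to tables using a tool such as pandas" route mentioned above, assuming the spot NetCDF content has already been read into a DataFrame (the column names and values below are illustrative only):

```python
# Sketch: write spot forecast rows into a SQLite table with pandas.
# The DataFrame stands in for data read from IMPROVER spot NetCDF output.
import sqlite3
import pandas as pd

spot = pd.DataFrame(
    {
        "validity_time": ["2017-10-01 00:00", "2017-10-01 03:00"],
        "lead_time": [1, 1],
        "station_id": ["03772", "03772"],
        "diagnostic": ["temperature", "temperature"],
        "value": [10.2, 9.8],
    }
)

with sqlite3.connect("sbv_spot.db") as conn:
    # Append so successive days and lead times accumulate in one table.
    spot.to_sql("spot_forecast", conn, if_exists="append", index=False)
```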

fionaRust commented 6 years ago

PR: #311

bayliffe commented 6 years ago

Completed.