wmo-im / GRIB2

New Product Definition Templates to generalize the percentiles (100-quantiles) forecast Templates to any q-quantiles #53

Closed sebvi closed 3 years ago

sebvi commented 3 years ago

Branch

https://github.com/wmo-im/GRIB2/tree/issue-53

Summary and purpose

This document proposes new templates that generalize percentile forecasts to partitions of any size, called “quantiles”; percentiles are the special case of 100-quantiles.

Action proposed

The team is requested to approve the content of this proposal for inclusion with the next update of the WMO Manual on Codes.

Discussions

At ECMWF, a new experimental global post-processed product called “ecPoint-rainfall” was introduced in April 2019 into the suite of real-time forecast products available to forecasters worldwide. The methods and products have attracted a great deal of interest, with many national met services and commercial customers requesting access to the GRIB data. ecPoint has also been the subject of many presentations (e.g. see https://youtu.be/QfZW34we2u8) and short articles in print and online (e.g. https://www.harry-otten-prize.org/news.html, scroll to “21 September 2019”). A more comprehensive paper (“A new low-cost technique improves weather forecasts across the world”) was submitted to Nature and is currently in the second review round (preprint available here: https://arxiv.org/abs/2003.14397).

Meanwhile, further work is underway to develop related products from extended-range forecasts and from the ERA5 re-analysis, post-processing 2m temperature (ecPoint-temperature) as well as rainfall; we expect a broad user base to develop for these. Furthermore, ecPoint products can be generated from any global model or ensemble. We would thus like to archive the new ecPoint data type, but because of its “unusual” format the current GRIB definitions do not permit this.

A particular limitation is that the largest number of quantiles one can store for a probabilistic product is 101 (i.e. percentiles, using the percentile forecast templates). We would like to extend this to embrace the “permille” concept, whereby one can store 1001 quantiles (i.e. 0, 0.1, 0.2, 0.3, …, 99.8, 99.9, 100%, stored as quantile values 0, 1, 2, 3, …, 998, 999, 1000). Extending in this way gives the user much more information on the distribution tails, which is where much of the value of ecPoint output lies, particularly for anticipating extreme events such as extreme localised rainfall that can lead to devastating flash floods.
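To illustrate the mapping described above, here is a minimal sketch of how a stored quantile value relates to a probability level. The function name `quantile_level` is hypothetical and not part of the proposal or of any library; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def quantile_level(value: int, total: int) -> Fraction:
    """Probability level (between 0 and 1) of quantile `value` in a `total`-quantile partition."""
    if total <= 0:
        raise ValueError("total number of quantiles must be positive")
    if not 0 <= value <= total:
        raise ValueError("quantile value must lie between 0 and total")
    return Fraction(value, total)


# Percentiles (q = 100) give 101 levels; permilles (q = 1000) give 1001 levels.
percentile_levels = [quantile_level(k, 100) for k in range(101)]
permille_levels = [quantile_level(k, 1000) for k in range(1001)]

# Quantile value 999 of 1000 corresponds to 99.9 %, i.e. the fraction 999/10 percent.
level_in_percent = quantile_level(999, 1000) * 100
```

The median is the same level in both partitions: `quantile_level(50, 100) == quantile_level(500, 1000)`.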

Detailed proposal

To generalize the concept of percentile to a partitioning of any size, called “quantile”, we propose to introduce 2 new templates based on the existing percentile templates 4.6 and 4.10. Please note that a quantile is now encoded on 2 octets to allow the encoding of “permille” (1000-quantile).
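The need for 2 octets can be checked with a few lines of arithmetic. This is only a sketch: `octets_needed` is a hypothetical helper, and it assumes plain unsigned integers without any reserved values.

```python
def octets_needed(max_value: int) -> int:
    """Smallest number of whole octets that can hold `max_value` as an unsigned integer."""
    if max_value < 0:
        raise ValueError("max_value must be non-negative")
    # bit_length() is 0 for the value 0, so force at least one octet.
    return max(1, (max_value.bit_length() + 7) // 8)


# A percentile (0..100) fits in one octet, a permille (0..1000) does not.
octets_for_percentile = octets_needed(100)
octets_for_permille = octets_needed(1000)
```

One octet tops out at 255, so any partition with more than 255 quantiles needs the wider field.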

Product definition template 4.86 - Quantile forecasts at a horizontal level or in a horizontal layer at a point in time.

| Octet No. | Contents |
|---|---|
| 10 | Parameter category (see Code table 4.1) |
| 11 | Parameter number (see Code table 4.2) |
| 12 | Type of generating process (see Code table 4.3) |
| 13 | Background generating process identifier (defined by originating centre) |
| 14 | Forecast generating process identifier (defined by originating centre) |
| 15–16 | Hours after reference time of data cut-off (see Note) |
| 17 | Minutes after reference time of data cut-off |
| 18 | Indicator of unit of time range (see Code table 4.4) |
| 19–22 | Forecast time in units defined by octet 18 |
| 23 | Type of first fixed surface (see Code table 4.5) |
| 24 | Scale factor of first fixed surface |
| 25–28 | Scaled value of first fixed surface |
| 29 | Type of second fixed surface (see Code table 4.5) |
| 30 | Scale factor of second fixed surface |
| 31–34 | Scaled value of second fixed surface |
| 35–36 | Total number of quantiles q |
| 37–38 | Quantile value (between 0 and q) |

Note: Hours greater than 65534 will be coded as 65534.
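As a rough sketch of how octets 35–36 and 37–38 of template 4.86 could be written and read, the example below uses Python's `struct` module. It assumes the usual GRIB2 conventions (big-endian unsigned integers, octet numbers counted from 1 at the start of Section 4); the function names are hypothetical, and the buffer is a dummy rather than a real GRIB message.

```python
import struct

# Octet numbers in the template are 1-based; Python indices are 0-based.
TOTAL_QUANTILES_OCTET = 35
MAX_CUTOFF_HOURS = 65534  # per the Note: larger values are coded as 65534


def pack_quantile(total: int, value: int) -> bytes:
    """Encode 'total number of quantiles' and 'quantile value' as two 2-octet integers."""
    if not 1 <= total <= 0xFFFF:
        raise ValueError("total must fit in 2 octets")
    if not 0 <= value <= total:
        raise ValueError("quantile value must lie between 0 and total")
    return struct.pack(">HH", total, value)


def unpack_quantile(section4: bytes) -> tuple:
    """Read (total, value) from octets 35-38 of a Section 4 buffer."""
    return struct.unpack_from(">HH", section4, TOTAL_QUANTILES_OCTET - 1)


def clamp_cutoff_hours(hours: int) -> int:
    """Apply the Note on the data cut-off hours field (octets 15-16)."""
    return min(hours, MAX_CUTOFF_HOURS)


# Dummy 38-octet Section 4 with only the quantile fields filled in.
section4 = bytearray(38)
section4[34:38] = pack_quantile(1000, 999)
decoded = unpack_quantile(section4)
```

In practice a decoder such as ecCodes handles this layout; the sketch only shows where the two new fields sit.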

Product definition template 4.87 – Quantile forecasts at a horizontal level or in a horizontal layer in a continuous or non-continuous time interval

| Octet No. | Contents |
|---|---|
| 10 | Parameter category (see Code table 4.1) |
| 11 | Parameter number (see Code table 4.2) |
| 12 | Type of generating process (see Code table 4.3) |
| 13 | Background generating process identifier (defined by originating centre) |
| 14 | Forecast generating process identifier (defined by originating centre) |
| 15–16 | Hours after reference time of data cut-off (see Note 1) |
| 17 | Minutes after reference time of data cut-off |
| 18 | Indicator of unit of time range (see Code table 4.4) |
| 19–22 | Forecast time in units defined by octet 18 |
| 23 | Type of first fixed surface (see Code table 4.5) |
| 24 | Scale factor of first fixed surface |
| 25–28 | Scaled value of first fixed surface |
| 29 | Type of second fixed surface (see Code table 4.5) |
| 30 | Scale factor of second fixed surface |
| 31–34 | Scaled value of second fixed surface |
| 35–36 | Total number of quantiles q |
| 37–38 | Quantile value (between 0 and q) |
| 39–40 | Year of end of overall time interval |
| 41 | Month of end of overall time interval |
| 42 | Day of end of overall time interval |
| 43 | Hour of end of overall time interval |
| 44 | Minute of end of overall time interval |
| 45 | Second of end of overall time interval |
| 46 | n – number of time range specifications describing the time intervals used to calculate the statistically processed field |
| 47–50 | Total number of data values missing in the statistical process |
| 51–62 | Specification of the outermost (or only) time range over which statistical processing is done |
| 51 | Statistical process used to calculate the processed field from the field at each time increment during the time range (see Code table 4.10) |
| 52 | Type of time increment between successive fields used in the statistical processing (see Code table 4.11) |
| 53 | Indicator of unit of time for time range over which statistical processing is done (see Code table 4.4) |
| 54–57 | Length of the time range over which statistical processing is done, in units defined by the previous octet |
| 58 | Indicator of unit of time for the increment between the successive fields used (see Code table 4.4) |
| 59–62 | Time increment between successive fields, in units defined by the previous octet (see Note 3) |
| 63–nn | These octets are included only if n > 1, where nn = 50 + 12 x n |
| 63–74 | As octets 51–62, next innermost step of processing |
| 75–nn | Additional time range specifications, included in accordance with the value of n. Contents as octets 51 to 62, repeated as necessary. |

Notes:
(1) Hours greater than 65534 will be coded as 65534.
(2) The reference time in section 1 and the forecast time together define the beginning of the overall time interval.
(3) An increment of zero means that the statistical processing is the result of a continuous (or near-continuous) process, not the processing of a number of discrete samples. Examples of such continuous processes are the temperatures measured by analogue maximum and minimum thermometers or thermographs, and the rainfall measured by raingauge.
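The variable-length tail of template 4.87 follows from the rule nn = 50 + 12 x n. The sketch below computes the octet range of each 12-octet time range specification for a given n; the function names are hypothetical and serve only to illustrate the arithmetic.

```python
FIRST_SPEC_OCTET = 51   # first octet of the outermost time range specification
SPEC_LENGTH = 12        # each specification occupies 12 octets (e.g. 51-62)


def last_octet(n: int) -> int:
    """Last octet nn of the template for n time range specifications."""
    if n < 1:
        raise ValueError("at least one time range specification is required")
    return 50 + SPEC_LENGTH * n


def time_range_spec_octets(n: int) -> list:
    """(first, last) octet numbers of each specification, outermost first."""
    if n < 1:
        raise ValueError("at least one time range specification is required")
    return [
        (FIRST_SPEC_OCTET + SPEC_LENGTH * i,
         FIRST_SPEC_OCTET + SPEC_LENGTH * i + SPEC_LENGTH - 1)
        for i in range(n)
    ]


single = time_range_spec_octets(1)   # only the outermost range
nested = time_range_spec_octets(3)   # outermost plus two inner steps
```

For n = 1 the template ends at octet 62; each further specification adds another 12 octets.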

These new templates should be properly referenced in Code table 4.0:

| Code figure | Meaning |
|---|---|
| 86 | Quantile forecasts at a horizontal level or in a horizontal layer at a point in time |
| 87 | Quantile forecasts at a horizontal level or in a horizontal layer in a continuous or non-continuous time interval |

sebvi commented 3 years ago

added a tentative commit for this branch, please check.

On a side note, I noticed that template 4.10 is still tagged "experimental" in the csv file. Does anyone know why?

jitsukoh commented 3 years ago

@sebvi could you provide sample data using this template and a decode output to complete the validation process? I understand that, for the moment, products encoded using this new template will only be used by ecCodes users, and since no other decoders are available there is no reason not to approve it. Please comment if there are any objections.

Cf. 7.3 Testing with relevant applications:

"For changes that have an impact on automated processing systems, the extent of the testing required before validation should be decided by the designated committee on a case-by-case basis, depending on the nature of the change. Changes involving a relatively high risk and/or impact on the systems should be tested by the use of at least two independently developed tool sets and two independent centres. In that case, results should be made available to the designated committee with a view to verifying the technical specifications."

> On a side note, I noticed that template 4.10 is still tagged "experimental" in the csv file. Does anyone know why?

When GRIB2 was introduced, not all templates were validated because there were no users. At a later stage, these templates were flagged as “experimental”: there is a note under each template saying “Preliminary note: This template was not validated at the time of publication and should be used with caution. Please report any use to the WMO Secretariat (Observing and Information Systems Department) to assist for validation.”, and this is indicated as "experimental" in the status column of the computer-readable tables. PDT 4.10 is one of these templates.

sebvi commented 3 years ago

Dear Jitsuko,

I will upload a sample tomorrow. I apologize for the delay, I have been quite busy.

Sebastien

sebvi commented 3 years ago

Trying to upload the samples using email attachment

sebvi commented 3 years ago

seems to work now

I have uploaded a sample file for each new template (zip). The bitmap and data sections have been removed, otherwise the files would have been too big: PDT86_ecmwf.zip, PDT87_ecmwf.zip

Here is the dump in txt files using ecCodes: PDT86_ecmwf.txt, PDT87_ecmwf.txt

amilan17 commented 3 years ago

@sebvi - The CSV needs some improvement. Also, given the discussion today, should we keep the status as Experimental? It looks like you put this here because the product is experimental.

https://github.com/wmo-im/GRIB2/blob/c5193b1ae99acaf474575f113571aff68b8155e9/GRIB2_Template_4_87_ProductDefinitionTemplate_en.csv#L17

sebvi commented 3 years ago

@amilan17 you are correct, I forgot the few empty columns at the end. I will update that.

Status: My understanding is that, using the current workflow (that we may drop soon), the status should change to operational once validated.

amilan17 commented 3 years ago

@sebvi thanks for the fix. That's not a practice that I want to support, so I'm going to change it to Operational now, so that I don't have to edit it later and to minimize confusion when cleaning up the status columns.