team-sx / iwxxm-sx

Describing predicted probabilistic values in aerodrome forecasts #20

Open ilkkarinne opened 3 months ago

ilkkarinne commented 3 months ago

Some discussion on this topic is written down in the meeting [notes for the 2024-Feb-28 meeting](https://github.com/team-sx/iwxxm-sx/wiki/Aerodrome-MET-Observation-and-Forecast-Discussion-2024-Feb-28).

Aviation weather forecasters try to estimate the probabilities of occurrence of particular weather phenomena affecting aviation operations at particular places and times. Sometimes it is much easier to estimate whether a particular phenomenon will happen than the exact time and location where it will happen: the speed and/or direction of movement of a weather front or storm may change unexpectedly, or different probabilistic forecast models may predict slightly different scenarios for its development and movement.

We need to decide which aspects of the weather phenomena occurrence we want to nail down in the aerodrome forecast model in order to come up with the right kind of elements for expressing probability and forecast uncertainty. As we are describing weather in the vicinity of an aerodrome, it would feel natural to fix at least the location: this would mean that we would not foresee a need for expressing uncertainty about the location of the predicted phenomena. This would leave three aspects open:

  1. the existence of particular aviation-affecting weather phenomena at the chosen location,
  2. the time of occurrence of these phenomena at the chosen location, and
  3. the descriptive properties of these phenomena at the chosen location.

Providing uncertainties for any of these would potentially seem useful in an aviation weather forecast: The pilots and air traffic controllers may be interested to know

  1. what is the probability of strong turbulence or wind shear happening at all at the airport at a particular time,
  2. what is the probability of a known weather front with thunderstorm with heavy precipitation of hail reaching the airport between 13:00 and 13:30 on a particular day, or
  3. what is the probability of wind gusts of more than 15 m/s occurring at the airport at a given time.

The probabilities also seem to be closely related to the variance in the properties of particular predicted phenomena: the forecasters may, for example, be fairly certain that the cloudiness at the airport will vary from 50% sky coverage to 80% sky coverage for short periods of time during the forecast period (expressed with TEMPO groups in TAF). To me this carries more information than saying that the probability of a cloud cover of 80% for the entire forecast time period is 20%, or that the probability of a cloud cover of 50% for the entire forecast time period is 90%.

ilkkarinne commented 2 months ago

Initial versions of probabilistic aerodrome forecast values have now been drafted:

These are made possible by slight modifications of the TSML 1.3 schemas (addition of a RangeTVP element) and by the new elements iwxxm:QuantityProbabilityDistributionPercentile, iwxxm:QuantityValueBetweenProbability and QuantityValueExceedenceProbability, which extend the SWE Common Quantity and QuantityRange types/elements.
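As a rough illustration of the kinds of statistics these element names suggest, the sketch below derives an exceedance probability, a "between" probability and a percentile from an ensemble of forecast values. The ensemble values, the function names and the nearest-rank percentile rule are all assumptions made for this example; they are not taken from the IWXXM or TSML schemas.

```python
import math

def exceedance_probability(members, threshold):
    """Fraction of ensemble members strictly above the threshold."""
    return sum(1 for m in members if m > threshold) / len(members)

def between_probability(members, lower, upper):
    """Fraction of ensemble members within the closed range [lower, upper]."""
    return sum(1 for m in members if lower <= m <= upper) / len(members)

def percentile(members, p):
    """Nearest-rank percentile for an integer p in 1..100 (an assumed rule)."""
    ordered = sorted(members)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical wind gust forecasts (m/s) from ten ensemble members
gusts = [9.0, 11.5, 12.0, 13.0, 14.5, 15.5, 16.0, 17.5, 18.0, 21.0]

p_exceed = exceedance_probability(gusts, 15.0)   # gusts of more than 15 m/s
p_between = between_probability(gusts, 10.0, 15.0)
p90 = percentile(gusts, 90)
```

Each statistic fixes the location and the time and describes only the uncertainty of the property value, which is the third aspect listed in the opening comment.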

blchoy commented 1 month ago

Thank you. I believe we are thinking along the same lines, but let me rephrase to confirm.

The WxObject model describes a phenomenon with:

  1. a spatial coverage (point, line, area, volume)
  2. a temporal coverage (point, interval)
  3. properties which can have one or more attributes

So far, we have been using a WxObject to describe phenomena with uniform properties within the space-time coverage. For example, a QVA WxObject has a spatial volume and a time point, plus a property attribute describing the concentration within that space-time coverage. A series of WxObjects with different spatial volumes and time points is given to indicate the space-time variation of the phenomenon. (We discussed whether we need an attribute to connect WxObjects of different time points to show spatial-temporal changes of an object, but WG-MOG said no.)

To incorporate a time series, we now extend the temporal coverage to an interval and describe the temporal variation in the property section of the WxObject. The use of tsml:interpolationType ensures that, even though we provide a value only at a particular time point, the space-time coverage can still be regarded as a continuum, since the value(s) effectively represent the properties from the last time point to the current time point (this also means that tsml:interpolationType has to be a mandatory element if a temporal continuum within the time interval is a necessary quality). Personally I think this is essential, since "empty space" should be represented as a hole in the space-time coverage.
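A minimal sketch of that "preceding interval" reading, assuming each point's value is valid from just after the previous point's time up to and including its own time, and that times outside the series are holes. The helper name `value_at` and the tuple layout are hypothetical; the two wind speed values are borrowed from the record example later in this thread.

```python
from bisect import bisect_left
from datetime import datetime, timezone

def value_at(points, t):
    """Return the value valid at time t, or None if t falls in a hole.

    Assumes `points` is sorted by time and that each value describes
    the interval (previous point time, this point time].
    """
    times = [time for time, _ in points]
    i = bisect_left(times, t)          # first point with time >= t
    if i == len(times):
        return None                    # after the last point: no coverage
    if i == 0 and t < times[0]:
        return None                    # before the first point: no coverage
    return points[i][1]

utc = timezone.utc
points = [
    (datetime(2001, 1, 31, 0, 0, tzinfo=utc), 1.0),
    (datetime(2001, 1, 31, 0, 10, tzinfo=utc), 2.2),
]

inside = value_at(points, datetime(2001, 1, 31, 0, 5, tzinfo=utc))
at_first = value_at(points, datetime(2001, 1, 31, 0, 0, tzinfo=utc))
after = value_at(points, datetime(2001, 1, 31, 0, 15, tzinfo=utc))
```

Under this assumption a query at 00:05 resolves to the 00:10 point, while a query at 00:15 returns `None`, which is how a hole in the space-time coverage would show up.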

On the other hand, we are not too sure if we want to use tsml:aggregationDuration to indicate how the data is being aggregated. In fact, in the non-time-series case (i.e. when the temporal coverage is a point), we may still need to provide this description, so for consistency we may want to use the same description inside and outside of TSML. Can we allow external descriptions of aggregation within TSML itself, like OPM?

The use of TSML within the temporal coverage of a WxObject also suggests the possibility of describing the distribution of properties within the spatial coverage of a WxObject, and maybe even a combination of the two. I am open to this kind of representation, but would like to keep a WxObject as simple as possible to facilitate end-of-chain processing.

blchoy commented 1 month ago

Regarding the examples, there are a number of data points but just one set of percentiles and thresholds. Is it possible to have percentiles or thresholds for individual data points?

blchoy commented 1 month ago

Regarding uncertainty, I concur that we need to fix the spatial-temporal coverage before we can provide such descriptions for the properties of the phenomenon. However, when the spatial-temporal coverage is not a point, we should be careful about whether the uncertainties describe possible values of properties in a homogeneous environment, or the distribution of values within an inhomogeneous environment.

For example, the wind forecast in a TAF is supposed to represent the whole aerodrome. If we provide uncertainty information on this phenomenon, it should not be regarded as spatial uncertainty across the aerodrome. Having said that, in future aerodrome forecasts, uncertainties covering spatial variation may be useful, especially for phenomena that can move, like thunderstorms and wind shear. In fact, similar information is already being provided in area forecasts:

ilkkarinne commented 1 month ago

> To incorporate a time series, we now extend the temporal coverage to an interval and describe the temporal variation in the property section of the WxObject. The use of tsml:interpolationType ensures that, even though we provide a value only at a particular time point, the space-time coverage can still be regarded as a continuum, since the value(s) effectively represent the properties from the last time point to the current time point (this also means that tsml:interpolationType has to be a mandatory element if a temporal continuum within the time interval is a necessary quality). Personally I think this is essential, since "empty space" should be represented as a hole in the space-time coverage.

In the TSML schema, interpolationType is defined as:

> Defines the nature of the relationship between the time instant and the recorded value. For example, the value may represent an average across the time period since the last point (average in preceding interval). This value should be taken from the InterpolationCode list. The interpolation type is defined per point within the time series as it is possible for this to change mid series. Within the XML encoding it is possible to set a default interpolation for the series.

And indeed, this attribute is part of the point metadata, so it can be provided separately for each point in the time series. As the variation of the interpolation mid-series is not very common, default values may be provided for any of the TSML point metadata attributes.
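The default-plus-override idea can be sketched as follows. Plain dictionaries stand in for the TSML point metadata here, and the helper `effective_metadata` is hypothetical; only the two InterpolationCode URLs and the `PT10M` duration are taken from the example in this thread.

```python
AVERAGE_PREC = "http://www.opengis.net/def/timeseries/InterpolationCode/AveragePrec"
MAX_PREC = "http://www.opengis.net/def/timeseries/InterpolationCode/MaxPrec"

def effective_metadata(default, point_override=None):
    """Merge per-point metadata over the series defaults.

    Assumption for this sketch: any attribute given on a point
    replaces the default, and everything else is inherited.
    """
    merged = dict(default)
    merged.update(point_override or {})
    return merged

series_default = {
    "interpolationType": AVERAGE_PREC,
    "aggregationDuration": "PT10M",
}

# A point without its own metadata inherits the defaults unchanged.
plain_point = effective_metadata(series_default)

# A point that changes only the interpolation type mid-series.
changed_point = effective_metadata(series_default, {"interpolationType": MAX_PREC})
```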

> On the other hand, we are not too sure if we want to use tsml:aggregationDuration to indicate how the data is being aggregated. In fact, in the non-time-series case (i.e. when the temporal coverage is a point), we may still need to provide this description, so for consistency we may want to use the same description inside and outside of TSML. Can we allow external descriptions of aggregation within TSML itself?

To me the interpolation (avg/min/max function) and the aggregation period (the time over which the statistical function has been applied) are tightly coupled. Both of them can of course be provided as integral parts of the observed property (quantity) definition, and thus could in principle also be provided outside the time series.
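A small sketch of why the two belong together: the same samples give different reported values depending on both the function and the window length. The one-minute sample values and the helper `aggregate_preceding` are invented for illustration, and the window is assumed to be open at its start and closed at its end.

```python
from statistics import mean

def aggregate_preceding(samples, end_minute, duration_minutes, func):
    """Apply func to samples with end - duration < minute <= end.

    Returns None when no sample falls inside the window.
    """
    start = end_minute - duration_minutes
    window = [value for minute, value in samples if start < minute <= end_minute]
    return func(window) if window else None

# Hypothetical wind speed samples (m/s), one per minute for minutes 1..10
speeds = [1.0, 2.0, 3.0, 2.0, 1.0, 4.0, 2.0, 3.0, 1.0, 1.0]
samples = list(enumerate(speeds, start=1))

# "Average in preceding interval" over 10 minutes (compare AveragePrec, PT10M)
avg_10 = aggregate_preceding(samples, 10, 10, mean)

# "Maximum in preceding interval" over 10 minutes (compare MaxPrec, PT10M)
max_10 = aggregate_preceding(samples, 10, 10, max)

# Same function, shorter aggregation period: a different value
avg_5 = aggregate_preceding(samples, 10, 5, mean)
```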

ilkkarinne commented 1 month ago

> Regarding the examples, there are a number of data points but just one set of percentiles and thresholds. Is it possible to have percentiles or thresholds for individual data points?

Not entirely sure I understood your point completely. In the percentile example, the 95th percentile min/max ranges for all three wind properties are provided for each time series point. I've also just now added a new example with both 90th and 95th percentile ranges for the same properties, see AerodromeWeatherForecast-probability-percentiles-90-95-Example.xml

blchoy commented 1 month ago

> Not entirely sure I understood your point completely. In the percentile example, the 95th percentile min/max ranges for all three wind properties are provided for each time series point. I've also just now added a new example with both 90th and 95th percentile ranges for the same properties, see AerodromeWeatherForecast-probability-percentiles-90-95-Example.xml

I see it now. So there is positional correspondence across different tags, viz:

    <tsml:defaultPointMetadata>
        <tsml:FieldSpecificPointMetadata/>    <!-- This is first series -->
        <tsml:FieldSpecificPointMetadata/>    <!-- This is second series -->
        ...
    </tsml:defaultPointMetadata>
    <tsml:point/>    <!-- This is first series -->
    <tsml:point/>    <!-- This is second series -->
    ...
    <tsml:pointRecord/>    <!-- This is first series -->
    <tsml:pointRecord/>    <!-- This is second series -->
    ...

Am I correct? There are also names (e.g. windGust-p.90) so do they need to be in the same order too?

ilkkarinne commented 1 month ago

Yes, there was a tsml:pointRecord element defining the structure of the values in each tsml:point, with only one pointRecord per time series, defining the fields within each tsml:point in the order presented. However, your comment prompted me to improve this a bit by combining the definition of the time series point value record structure with the other point metadata (such as the interpolationType) within the tsml:defaultPointMetadata element. The tsml:qualifier element was already there, and can be used for providing the Quantity information. Thus we can remove the pointRecord structure entirely, as presented in the meeting on 8th May.

An example with non-probabilistic data, for simplicity; see AerodromeWeatherForecast-record-Example.xml:

                    <tsml:defaultPointMetadata>
                        <tsml:FieldSpecificPointMetadata field="windSpeed" position="1">
                            <tsml:interpolationType xlink:href="http://www.opengis.net/def/timeseries/InterpolationCode/AveragePrec"/>
                            <tsml:aggregationDuration>PT10M</tsml:aggregationDuration>
                            <tsml:qualifier>
                                <swe:Quantity definition="http://preferred.observable-property.registry/windSpeed">
                                    <swe:label>Wind speed</swe:label>
                                    <swe:uom xlink:href="https://unitsofmeasure.org/ucum#m_s" code="m/s"/>
                                </swe:Quantity>
                            </tsml:qualifier>

                        </tsml:FieldSpecificPointMetadata>

                        <tsml:FieldSpecificPointMetadata field="windDirection" position="2">
                            <tsml:interpolationType xlink:href="http://www.opengis.net/def/timeseries/InterpolationCode/AveragePrec"/>
                            <tsml:aggregationDuration>PT10M</tsml:aggregationDuration>
                            <tsml:qualifier>
                                <swe:Quantity definition="http://preferred.observable-property.registry/windDirection">
                                    <swe:label>Wind direction</swe:label>
                                    <swe:uom xlink:href="https://unitsofmeasure.org/ucum#deg" code="deg"/>
                                </swe:Quantity>
                            </tsml:qualifier>
                        </tsml:FieldSpecificPointMetadata>

                        <tsml:FieldSpecificPointMetadata field="windGust" position="3">
                            <tsml:interpolationType xlink:href="http://www.opengis.net/def/timeseries/InterpolationCode/MaxPrec"/>
                            <tsml:aggregationDuration>PT10M</tsml:aggregationDuration>
                            <tsml:qualifier>
                                <swe:Quantity definition="http://preferred.observable-property.registry/windGust">
                                    <swe:label>Wind gust</swe:label>
                                    <swe:uom xlink:href="https://unitsofmeasure.org/ucum#m_s" code="m/s"/>
                                </swe:Quantity>
                            </tsml:qualifier>
                        </tsml:FieldSpecificPointMetadata>

                    </tsml:defaultPointMetadata>

                    <tsml:point>
                        <tsml:RecordTVP>
                            <tsml:time>2001-01-31T00:00:00Z</tsml:time>
                            <tsml:value>1.0 110 1.5</tsml:value>
                        </tsml:RecordTVP>
                    </tsml:point>
                    <tsml:point>
                        <tsml:RecordTVP>
                            <tsml:time>2001-01-31T00:10:00Z</tsml:time>
                            <tsml:value>2.2 105 3.5</tsml:value>
                        </tsml:RecordTVP>
                    </tsml:point>

The same pattern can be used for the percentile and threshold-crossing types of probabilistic forecasts too:

ilkkarinne commented 1 month ago

The optional position attribute makes explicit the position of the given field in the record structure. Thus we do not need to force the order of the tsml:FieldSpecificPointMetadata elements to match the order of the values in the tsml:RecordTVP/tsml:value elements if the position attribute is given.
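A minimal sketch of how a consumer might use the position attribute, assuming the values in tsml:value are whitespace-separated as in the example above. The namespace URI and the wrapping `series` root element are placeholders invented for this sketch (not the real TSML ones), and falling back to document order when position is absent is also an assumption.

```python
import xml.etree.ElementTree as ET

TSML = "urn:example:tsml"  # placeholder namespace URI, NOT the real TSML one

# Metadata elements are deliberately listed out of value order.
doc = f"""
<series xmlns:tsml="{TSML}">
  <tsml:defaultPointMetadata>
    <tsml:FieldSpecificPointMetadata field="windGust" position="3"/>
    <tsml:FieldSpecificPointMetadata field="windSpeed" position="1"/>
    <tsml:FieldSpecificPointMetadata field="windDirection" position="2"/>
  </tsml:defaultPointMetadata>
  <tsml:point>
    <tsml:RecordTVP>
      <tsml:time>2001-01-31T00:00:00Z</tsml:time>
      <tsml:value>1.0 110 1.5</tsml:value>
    </tsml:RecordTVP>
  </tsml:point>
</series>
""".strip()

ns = {"tsml": TSML}
root = ET.fromstring(doc)

# Map 1-based value position -> field name.
fields = {}
metadata = root.findall(".//tsml:FieldSpecificPointMetadata", ns)
for index, meta in enumerate(metadata, start=1):
    # Assumed fallback: use document order when position is not given.
    position = int(meta.get("position", index))
    fields[position] = meta.get("field")

# Split each record value and label the parts by position.
records = []
for tvp in root.findall(".//tsml:RecordTVP", ns):
    time = tvp.find("tsml:time", ns).text
    values = tvp.find("tsml:value", ns).text.split()
    record = {fields[i]: float(v) for i, v in enumerate(values, start=1)}
    records.append((time, record))
```

Because the lookup is keyed on position rather than on element order, reordering the metadata elements leaves the decoded records unchanged.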