nci / scores

Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.
https://scores.readthedocs.io/
Apache License 2.0

Fine grained controls on FSS for advanced users #353

Open nikeethr opened 4 months ago

nikeethr commented 4 months ago

Follow-up from: https://github.com/nci/scores/issues/188#issuecomment-2097490317

The FSS implementation in #266 provides the user with the flexibility to calculate skill scores from single 2-D fields. However, accumulating these scores temporally or spatially across other dimensions is not a trivial problem, and may have implications for skill interpretation. See reference [1].

The current implementation (#266) aggregates the decomposed scores, which is one approach shown in the reference, and also the default/implied approach in Roberts and Lean (2008):

Although not explicitly stated in Roberts and Lean (2008), Eq. (3) was always the intended way of using the score for multiple forecasts

While this achieves consistency in some situations, it may have trade-offs compared to simply averaging the scores in other situations. Furthermore, zero-value cases were discussed in https://github.com/nci/scores/pull/266#discussion_r1595199353; similar questions/points are also considered in the reference.

The implementation in #266 already offers a single-field variant and, in specialised methods, the ability to return decomposed scores. Hence, the user can already address any follow-up research using those features. However, for convenience it may be useful to provide a choice between aggregation (the current implementation) and averaging over other dimensions when accumulating fields.

Further, to address cases where F = O = 0, there are a couple of propositions given in the referenced paper.

Proposed fine-grained controls:

  1. Option to choose averaging over aggregation.
  2. Option to choose edge-case methods when computing the FSS for certain cases, e.g. forecast & obs are effectively 0.
  3. FSS curve (see replies).
  4. %-based thresholding (see replies).
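To make options 1 and 2 concrete, here is a minimal numpy sketch (not the `scores` public API, and the helper names here are hypothetical) contrasting aggregating decomposed FSS components across fields with averaging per-field FSS values, using NaN as one possible convention for the F = O = 0 edge case:

```python
# Hypothetical sketch, not the scores API: aggregation vs averaging of
# decomposed FSS over multiple 2-D fields. Uses a sliding-window mean to
# compute neighbourhood fractions ("valid" windows only, for simplicity).
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def fss_components(fcst, obs, threshold, window):
    """Return (numerator, denominator) of the decomposed FSS for one 2-D field."""
    pf = sliding_window_view(fcst >= threshold, (window, window)).mean(axis=(-2, -1))
    po = sliding_window_view(obs >= threshold, (window, window)).mean(axis=(-2, -1))
    return ((pf - po) ** 2).sum(), (pf ** 2 + po ** 2).sum()

def fss_from_components(num, denom):
    # Edge case F = O = 0: fractions all zero, score undefined -> NaN here.
    return np.nan if denom == 0 else 1.0 - num / denom

def fss_aggregate(fields, threshold, window):
    """Option in #266: sum components across fields, then compute one score."""
    parts = [fss_components(f, o, threshold, window) for f, o in fields]
    return fss_from_components(sum(p[0] for p in parts), sum(p[1] for p in parts))

def fss_average(fields, threshold, window):
    """Alternative: score each field separately, then average defined scores."""
    per_field = [fss_from_components(*fss_components(f, o, threshold, window))
                 for f, o in fields]
    return np.nanmean(per_field)
```

Note the behavioural difference for an all-zero field pair: aggregation contributes nothing to the sums (the pair is effectively absorbed), while the averaging variant above drops it via `nanmean` — other conventions from the reference could be slotted into `fss_from_components`.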

Reference

  1. https://journals.ametsoc.org/view/journals/mwre/149/10/MWR-D-18-0106.1.xml
nikeethr commented 4 months ago

FSS "curve"

Often in papers the FSS is computed as a curve, where:

  - x-axis = scale (i.e. window size)
  - y-axis = FSS value

Further, curves for exponentially increasing thresholds, e.g. 0.5, 1, 2, 4, 8, 16, 32, are commonly superimposed on the plot (at least in the case of rainfall). Alternatively, one may wish to overlay lead times. Creating complex datasets with multiple windows/thresholds is currently shown in the FSS.ipynb tutorial notebook; however, if this is a common enough use case it should be considered for the public API.

Note: this is where #269 and #270 can provide benefits due to the speed-up of computations as well as reduced memory usage, compared to pure Python/NumPy calls. This would allow more data points to be used in the curve without much computational overhead.
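As a rough illustration of the curve data (plain numpy, not the `scores` API; the FSS computation here is a simplified single-field re-implementation), one curve per threshold over a set of window sizes:

```python
# Hypothetical sketch: build FSS-curve data (FSS vs window size, one curve
# per threshold) for a single field pair. Not the scores public API.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def fss_single_field(fcst, obs, threshold, window):
    pf = sliding_window_view(fcst >= threshold, (window, window)).mean(axis=(-2, -1))
    po = sliding_window_view(obs >= threshold, (window, window)).mean(axis=(-2, -1))
    denom = (pf ** 2).mean() + (po ** 2).mean()
    return np.nan if denom == 0 else 1.0 - ((pf - po) ** 2).mean() / denom

def fss_curves(fcst, obs, thresholds, windows):
    """Map each threshold to a list of FSS values, one per window size."""
    return {t: [fss_single_field(fcst, obs, t, w) for w in windows]
            for t in thresholds}

rng = np.random.default_rng(42)
obs = rng.gamma(0.5, 2.0, size=(64, 64))   # synthetic rainfall-like field
fcst = np.roll(obs, 2, axis=0)             # spatially displaced "forecast"
curves = fss_curves(fcst, obs, thresholds=[0.5, 1, 2, 4], windows=[1, 5, 9, 17])
```

Plotted with window size on the x-axis, each list is one superimposed threshold curve; for a displaced forecast the score generally rises with window size, which is exactly the scale dependence the curve is meant to show.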

nikeethr commented 3 months ago

Extra usability comments from Pete

An extra comment: It is common to use FSS on percentages as well as/instead of thresholds. Is it too late to ask for this to be added (or should it be a new request?)

Reason I ask is that FSS with percentages can better reflect user interpretations - especially where forecasters really look at patterns and generally do a bit of bias correction for peak amounts. This is especially true for rainfall, which is the main use of FSS.
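A minimal sketch of the percentage idea (plain numpy, names hypothetical): binarise each field at its own q-th percentile rather than a fixed absolute threshold, so the comparison sees spatial patterns rather than (possibly biased) peak amounts.

```python
# Hypothetical sketch of percentile-based thresholding: events are the
# values above each field's own q-th percentile, which makes the event
# frequency equal by construction and discounts amplitude bias.
import numpy as np

def binarise_by_percentile(field, q):
    """Events are the values strictly above the field's q-th percentile."""
    return field > np.nanpercentile(field, q)

rng = np.random.default_rng(7)
obs = rng.gamma(0.5, 2.0, size=(100, 100))
fcst = 1.5 * obs                       # forecast with a multiplicative bias
obs_events = binarise_by_percentile(obs, 95)
fcst_events = binarise_by_percentile(fcst, 95)
```

With a purely multiplicative bias, the percentile events select essentially the same points in both fields, whereas a fixed absolute threshold would select different event populations.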

Another common part of FSS is to impose a lower threshold on frequency - i.e. to get a result, both obs & model domain-wide frequencies need to be greater than a certain percentage (0.5%, or 1%, or some user-specified value). When comparing occurrences at just a few points across a domain, the FSS doesn't really make much sense.
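The frequency floor could be sketched like this (plain numpy on pre-binarised fields; the function name and default are hypothetical, not the `scores` API):

```python
# Hypothetical sketch of a base-rate floor: return no score when either
# domain-wide event frequency is below a minimum, since an FSS computed
# from occurrences at a handful of points is not meaningful.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def fss_with_min_freq(fcst_events, obs_events, window, min_freq=0.005):
    """FSS on pre-binarised fields, or NaN if either base rate < min_freq."""
    if fcst_events.mean() < min_freq or obs_events.mean() < min_freq:
        return np.nan
    pf = sliding_window_view(fcst_events, (window, window)).mean(axis=(-2, -1))
    po = sliding_window_view(obs_events, (window, window)).mean(axis=(-2, -1))
    denom = (pf ** 2).sum() + (po ** 2).sum()
    return np.nan if denom == 0 else 1.0 - ((pf - po) ** 2).sum() / denom
```

The default floor of 0.5% matches the lower end of the values suggested above; exposing it as a user parameter would cover the "user specified value" case.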