Add climatology as 'reference'

judithberner commented 3 years ago

Transferred from https://github.com/judithberner/climpred_CESM1_S2S/issues/8

Please add a 'deterministic' (no ensemble member dimension) climatology reference for RPS (the same approach would also work for persistence of the ensemble mean), so that we can compute the RPSS (SS: skill score with respect to a reference) from RPS_experiment and RPS_reference via the formula: RPSS = 1 - RPS_experiment/RPS_reference
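The formula above can be sketched in a few lines. This is only an illustration with plain NumPy; the function name `rpss` and the example RPS values are made up here and are not part of climpred or xskillscore.

```python
import numpy as np

def rpss(rps_experiment, rps_reference):
    """Skill score relative to a reference: 1 - RPS_experiment / RPS_reference."""
    rps_experiment = np.asarray(rps_experiment, dtype=float)
    rps_reference = np.asarray(rps_reference, dtype=float)
    return 1.0 - rps_experiment / rps_reference

# Hypothetical RPS values, for illustration only
skill = rpss(0.30, 0.40)
```

A positive value means the experiment has a lower (better) RPS than the reference, zero means no improvement, and negative values mean the reference is better.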

I guess there are two climatologies: one from the model and one from the verification data. There are also different ways to compute climatologies (references to follow). This is one reason to pull the computation of climatology out of any climpred routines.

The easiest way is to take, e.g., nn years of verification data (S2S) and compute the mean climatology for day x, {x=1:365}, by averaging over all nn years. There are more sophisticated ways that use averaging windows, but in essence the climatology is deterministic.
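A minimal sketch of that simplest approach, assuming the verification data has already been arranged as an array of shape (nn years, 365 days) with leap days dropped. The synthetic data and the name `daily_climatology` are illustrative assumptions, not taken from the notebook.

```python
import numpy as np

def daily_climatology(verif):
    """Mean over the year axis; verif has shape (nn_years, 365)."""
    verif = np.asarray(verif, dtype=float)
    return verif.mean(axis=0)

# Synthetic verification data: a seasonal cycle plus noise (illustration only)
rng = np.random.default_rng(0)
nn = 20
seasonal = 10.0 * np.sin(2 * np.pi * np.arange(365) / 365)
verif = seasonal + rng.normal(scale=1.0, size=(nn, 365))

clim = daily_climatology(verif)  # one value per day of year, no member dimension
```

The result has no ensemble or year dimension, which is what makes the climatology reference deterministic.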

See an example of how to compute the climatology of the model here: https://github.com/judithberner/climpred_CESM1_S2S/blob/main/0.02_generate_climatology_S2S.ipynb Comment: These are used to remove the lead-time-dependent bias (forecast minus the model climatology at lead time t).
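The bias removal described above (forecast minus model climatology at lead time t) might look roughly like this. It is a simplified sketch, not the notebook's method: it assumes a hindcast array of shape (initializations, leads) and uses a single climatology per lead, ignoring any dependence on the initialization date. All names and data are hypothetical.

```python
import numpy as np

def remove_lead_dependent_bias(hindcast):
    """hindcast: shape (n_inits, n_leads). Returns anomalies of the same shape."""
    hindcast = np.asarray(hindcast, dtype=float)
    # Model climatology at each lead: average over all initializations
    model_clim = hindcast.mean(axis=0, keepdims=True)
    return hindcast - model_clim

# Synthetic hindcast with a drift that grows with lead time (illustration only)
rng = np.random.default_rng(1)
drift = 0.5 * np.arange(10)
hindcast = 15.0 + drift + rng.normal(size=(30, 10))

anomalies = remove_lead_dependent_bias(hindcast)
```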

judithberner commented 3 years ago

@aaronspring

judithberner commented 3 years ago

Also, we don't really care about persistence, but the problem is computationally the same, assuming persistence is computed from the ensemble mean, not the members.

aaronspring commented 3 years ago

I think I understand climatology for probabilistic metrics now. Just build a one-member forecast that is nominally probabilistic but actually deterministic, and score it. Here with xskillscore.rps. I will work on a climpred PR later this week.

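from xskillscore import rps as xs_rps  # assumed origin of the xs_rps alias
# Pick the weekly climatology matching each verification time, then add a singleton member dimension
climatology_forecast = climatology_week.sel(weekofyear=obsds.time.dt.weekofyear).expand_dims('member')
xs_rps(obsds, climatology_forecast, category_edges=category_edges, dim='time')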

aaronspring commented 3 years ago

> Also - we don't really care about persistence, but the problem is computationally the same, assuming persistence is computed from the ensemble mean, not the members.

OK, the priority is on climatology. But for my understanding: what would a persistence forecast look like for daily or monthly forecasts? I thought persistence takes the value at initialization and keeps it for all leads.

Is it basically a climatology forecast with the initial anomaly relative to climatology prescribed, e.g. a climatology with an offset, or would a subannual persistence forecast just ignore climatology? In my mind, it wouldn't make sense to do a persistence forecast for monthly leads in the extratropics, where seasonality is strong.
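To make the two readings of persistence in the question above concrete, here is a small sketch. Both function names, the 0-based day-of-year indexing, and the toy climatology are assumptions for illustration; this does not describe what climpred implements.

```python
import numpy as np

def persistence_raw(initial_value, n_leads):
    """Hold the value at initialization for all leads, ignoring climatology."""
    return np.full(n_leads, float(initial_value))

def persistence_anomaly(initial_value, clim, init_day, n_leads):
    """Climatology at each valid day plus the anomaly observed at initialization.

    clim: daily climatology, indexed by 0-based day of year.
    init_day: 0-based day of year of the initialization.
    """
    clim = np.asarray(clim, dtype=float)
    valid_days = (init_day + np.arange(1, n_leads + 1)) % clim.size
    initial_anomaly = initial_value - clim[init_day]
    return clim[valid_days] + initial_anomaly

# Toy climatology that increases by one unit per day (illustration only)
toy_clim = np.arange(365, dtype=float)
raw = persistence_raw(12.0, 3)
offset = persistence_anomaly(12.0, toy_clim, init_day=10, n_leads=3)
```

With a strong seasonal cycle the two variants diverge quickly: the raw version stays flat, while the anomaly version follows the climatology shifted by the initial offset.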

judithberner commented 3 years ago

Wrt persistence: I agree. I would expect a persistence forecast based on daily data to lose skill after 5-10 days (the forecast horizon).
Riley computed a heatmap comparing persistence against initialized forecasts, and they seem similarly unskillful :-) : https://github.com/judithberner/climpred_CESM1_S2S/blob/main/0.05_climpred_verification.ipynb However, the initialized runs are more skillful than persistence in winter, but not in summer.

judithberner commented 3 years ago

As for climatology: I think in the end we only need the climatological percentiles to compute RPS_clim: RPS_clim = sum_cat (p_climatology - p_obs)^2, e.g. for terciles p_climatology will always be [0., 0.33, 0.66].
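A sketch of the RPS_clim sum for equiprobable categories. It assumes that p in the formula denotes cumulative category probabilities (the usual RPS convention), so for terciles the climatological cumulative probabilities are [1/3, 2/3, 1]; the values 0, 0.33, 0.66 are read here as the lower quantile bounds of the three categories. That reading, the function name, and the 0-based category index are assumptions, and no normalization by the number of categories is applied.

```python
import numpy as np

def rps_climatology(obs_category, n_categories=3):
    """RPS of an equiprobable climatological forecast for one observation.

    obs_category: 0-based index of the category the observation fell into.
    """
    # Cumulative climatological probabilities, e.g. [1/3, 2/3, 1] for terciles
    clim_cdf = np.arange(1, n_categories + 1) / n_categories
    # Cumulative observed probabilities: 0 below the observed category, 1 from it onward
    obs_cdf = (np.arange(n_categories) >= obs_category).astype(float)
    return float(np.sum((clim_cdf - obs_cdf) ** 2))

rps_lower = rps_climatology(0)
rps_middle = rps_climatology(1)
rps_upper = rps_climatology(2)
```

Under these assumptions the climatological RPS depends only on which category the observation falls into, so no ensemble is needed to compute the reference.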

aaronspring commented 3 years ago

> As for climatology - I think in the end we only need the climatological percentiles to compute the RPS_clim: RPS_clim = sum_cat (p_climatology - p_obs)^2 e.g. for terciles p_climatology will be always [0.,0.33,0.66].

Sounds reasonable. I will also try to generalize this for all probabilistic metrics.

pangeo-data / climpred

Add climatology as 'reference' #565