judithberner / climpred_CESM1_S2S

MIT License
3 stars 1 forks source link

weekly averages in both observations and forecasts #11

Closed aaronspring closed 3 years ago

aaronspring commented 3 years ago

maybe this is related to your weekly zarr issue @judithberner @abjaye

When initializing HindcastEnsemble, hindcast.init should have the same frequency as verification.time. if I understand your examples correctly, you take weekly leads and but have daily verification. if hindcast.lead.units is weeks, then climpred cannot find the matching week in verification if verification is of frequency days.

if you start HindcastEnsemble from daily forecasts and daily verification, HindcastEnsemble.smooth(lead=7) can convert lead days to lead weeks and at the same time do weekly averaging in verification (comes into effect after verification). This behaviour isnt robustly tested yet for seasonal time-scales. works for annual.

Quick way to see forecasts and verification: HindcastEnsemble.plot() image Here is doesnt match: daily verification but weekly forecasts

aaronspring commented 3 years ago

a good check by hand:


# time from forecast
(hindcast2_smean.get_initialized().sel(lead=5).rename({'init':'time'}).time+pd.Timedelta(weeks=5)).sel(time='2000').to_index()
# time from verification
hindcast2_smean.get_observations().time.sel(time='2000').to_index()

# both should have overlap
aaronspring commented 3 years ago

will be easier with https://github.com/pangeo-data/climpred/issues/575

abjaye commented 3 years ago

I think this is the issue. Last night I had created weekly averages of the verification to match the init data/lead weeks, but climpred was getting an error where it was reducing down to an empty array. I will try this once the computers are back up.

aaronspring commented 3 years ago

xarray can guess the frequency of an index. xr.infer_freq(initialized.init) can be something like 7D. however, when we create initialized.init as ProlepticGregorian from NoLeap xr.infer_freq yields None. climpred doesnt rely on infer_freq, so this is not a requirement, but maybe we find a better solution than creating ProlepticGregorian from NoLeap. some other xarray time operations use infer_freq I think

aaronspring commented 3 years ago

I think we are coming here back to the problem of calendars #9.

I see 2 options:

judithberner commented 3 years ago

I am not sure I can follow all. Here some statements:

aaronspring commented 3 years ago

we want to verify weekly averages not weekly average daily forecasts.

what I mean: 1. aggregate daily to weekly forecast: 7-day mean of 7 daily forecasts, 2. verify against weekly verification

I do not mean: 1. verify daily forecast against daily verification, 2. aggregate daily to weekly skill

abjaye commented 3 years ago

Hi! I'm back on the computers and I need to have a better understanding of what needs to be done. I'm on a time crunch.

  1. We need to fix the calendars so they are consistent.
  2. Can we still make the zarrs to have weekly leads or do we need to run HindcastEnsemble.smooth(lead=7) over the daily lead data? Weekly leads makes everything run so much faster. My book all_scores_newweek.ipynb reads in weekly lead data and then makes weekly averages of verification data. That's where I end up with this error:

    ValueError: zero-size array to reduction operation minimum which has no identity

Let me know if none of this makes sense.

aaronspring commented 3 years ago

HindcastEnsemble.smooth requires lead.attrs['units'] to be of the same timedelta as xr.infer_freq(verification.time). smooth isnt very transparent to the outside. I suggest you produce an example first HindcastEnsemble(weekly_averaged_initialized_with_lead_attrs_weeks).add_observations(weekly_averaged_verification) (this should work) and then as a cross-check see whether smooth yields a similar skill.

smooth(lead=7) will create lead 1-7, 2-8, 3-9, ...

aaronspring commented 3 years ago

probably for climpred it would be great to implement the resample function, which does resample on verification time and aggregates leads accordingly and changes lead.attrs['units'] at the same time.

judithberner commented 3 years ago

Using climpred.smoothing.temporal_smoothing has a significant impact on our skill scores we are trying to understand this function better, but have some questions even after reading https://climpred.readthedocs.io/en/v2.1.1/smoothing.html There seem to be two interpretations:

The two will lead to different results. Thanks

judithberner commented 3 years ago

Can we use resample for biweekly averages of the verifying data?

aaronspring commented 3 years ago

the bug conceptual difference:

I do not yet understand the different between your two outlined interpretations.

aaronspring commented 3 years ago

Can we use resample for biweekly averages of the verifying data?

Something like this should work:


hindcast=hindcast.assign_coords(lead=xr.cftime_range(start='2000',freq='1D',periods=hindcast.lead.size)).resample(lead='7D').assign_coords(lead=range(size))
# or some other easier aggregation function from daily to weekly
hindcast=hindcast.sel(lead=slice(0,7).mean('lead').assign_coords(lead=[1])
hindcast['lead'].attrs['units']='weeks'

obs=obs.resample(time='7D')

HindcastEnsemble(hindcast).add_observations(obs).verify()```
aaronspring commented 3 years ago

https://gist.github.com/aaronspring/4e89c224ecceb2bb7a1c3bc0c816d538