aaronspring closed this issue 3 years ago.
a good check by hand:

```python
# time from forecast
(hindcast2_smean.get_initialized().sel(lead=5).rename({'init': 'time'}).time
 + pd.Timedelta(weeks=5)).sel(time='2000').to_index()

# time from verification
hindcast2_smean.get_observations().time.sel(time='2000').to_index()

# the two indexes should overlap
```
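For readers without the dataset at hand, the same check can be sketched with synthetic dates (all names below are hypothetical stand-ins, not climpred objects):

```python
import pandas as pd

# hypothetical weekly initialization dates
init = pd.date_range("2000-01-01", periods=10, freq="7D")

# valid time of the lead-5 forecast: init + 5 weeks
valid_time = init + pd.Timedelta(weeks=5)

# hypothetical weekly verification times on the same weekly grid
verif_time = pd.date_range("2000-01-01", periods=30, freq="7D")

# the two indexes should overlap, otherwise there is nothing to verify
overlap = valid_time.intersection(verif_time)
print(len(overlap))  # 10 -> every forecast has a matching verification time
```

If the verification were on a different grid (e.g. daily times shifted off the weekly valid times), the intersection would be empty and verification would fail.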
This will be easier with https://github.com/pangeo-data/climpred/issues/575
I think this is the issue. Last night I created weekly averages of the verification to match the init data/lead weeks, but climpred hit an error where it reduced down to an empty array. I will try this once the computers are back up.
xarray can guess the frequency of an index: `xr.infer_freq(initialized.init)` can return something like `7D`. However, when we create `initialized.init` as ProlepticGregorian from NoLeap, `xr.infer_freq` yields `None`. climpred doesn't rely on `infer_freq`, so this is not a requirement, but maybe we can find a better solution than creating ProlepticGregorian from NoLeap. I think some other xarray time operations use `infer_freq`.
I think we are coming back here to the problem of calendars (#9).
I see 2 options:
I am not sure I can follow all of this. Here are some statements:
We want to verify weekly averages, not weekly-average daily forecast skill.
What I mean: 1. aggregate the daily forecast to weekly (a 7-day mean of 7 daily forecasts), then 2. verify against the weekly verification.
What I do not mean: 1. verify the daily forecast against the daily verification, then 2. aggregate the daily skill to weekly.
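The order of operations matters. A toy numpy illustration of the two non-equivalent routes, with made-up data and correlation as a stand-in skill score:

```python
import numpy as np

rng = np.random.default_rng(0)
n_weeks, days_per_week = 10, 7
fcst = rng.standard_normal((n_weeks, days_per_week))
obs = fcst + rng.standard_normal((n_weeks, days_per_week))  # noisy "truth"

def corr(a, b):
    """Pearson correlation between two flattened arrays."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# meant: 1. aggregate daily to weekly means, 2. verify the weekly means
skill_of_weekly_means = corr(fcst.mean(axis=1), obs.mean(axis=1))

# not meant: 1. verify daily within each week, 2. average the daily skill
daily_skill = [corr(fcst[w], obs[w]) for w in range(n_weeks)]
weekly_mean_of_daily_skill = np.mean(daily_skill)

print(skill_of_weekly_means, weekly_mean_of_daily_skill)
```

The first route averages out daily noise before scoring, so it generally yields a different (often higher) skill than averaging daily scores.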
Hi! I'm back on the computers and I need to have a better understanding of what needs to be done. I'm on a time crunch.
Can we still make the zarrs have weekly leads, or do we need to run `HindcastEnsemble.smooth(lead=7)` over the daily-lead data? Weekly leads make everything run much faster. My notebook all_scores_newweek.ipynb reads in weekly-lead data and then makes weekly averages of the verification data. That's where I end up with this error:

```
ValueError: zero-size array to reduction operation minimum which has no identity
```
Let me know if none of this makes sense.
`HindcastEnsemble.smooth` requires `lead.attrs['units']` to be the same timedelta as `xr.infer_freq(verification.time)`. `smooth` isn't very transparent from the outside. I suggest you first produce an example `HindcastEnsemble(weekly_averaged_initialized_with_lead_attrs_weeks).add_observations(weekly_averaged_verification)` (this should work) and then, as a cross-check, see whether `smooth` yields a similar skill.

`smooth(lead=7)` will create leads 1-7, 2-8, 3-9, ...
It would probably be great for climpred to implement a resample function that resamples the verification time, aggregates the leads accordingly, and changes `lead.attrs['units']` at the same time.
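A rough numpy sketch of what such a resample could do for the lead dimension: average non-overlapping blocks of leads and update the units in one step. The helper name and signature are hypothetical; nothing like this exists in climpred yet.

```python
import numpy as np

def resample_leads(values, block, new_units):
    """Hypothetical helper: average non-overlapping blocks of `block` leads
    (e.g. 7 daily leads -> 1 weekly lead) and return the new units string."""
    n = (values.shape[-1] // block) * block  # drop any incomplete trailing block
    blocked = values[..., :n].reshape(*values.shape[:-1], -1, block)
    return blocked.mean(axis=-1), new_units

daily = np.arange(28, dtype=float)  # 28 daily leads with toy values
weekly, units = resample_leads(daily, 7, "weeks")
print(weekly, units)  # [ 3. 10. 17. 24.] weeks
```

The real feature would also have to resample `verification.time` with the same block size so the two stay aligned.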
Using `climpred.smoothing.temporal_smoothing` has a significant impact on our skill scores. We are trying to understand this function better, but still have some questions after reading https://climpred.readthedocs.io/en/v2.1.1/smoothing.html There seem to be two interpretations:

1. Smooth the timeseries using e.g. a running mean (or a more sophisticated smoother), then verify => less noise, more skill. That would be my first interpretation of "recommended to take multi-year averages [Goddard2013]", and it would be consistent with "Apply temporal smoothing by creating rolling smooth-timestep means" (https://climpred.readthedocs.io/en/stable/_modules/climpred/smoothing.html).
2. Verify the multi-year/multi-week anomalies: compute weekly/annual anomalies and then verify the temporally averaged anomalies. Maybe this is what you call "time-aggregated skill timeseries", and the labels lead = (1-3) in your example are consistent with that. This also leads to less noise and more skill.

The two will lead to different results. Thanks!
Can we use resample for biweekly averages of the verifying data?
The big conceptual difference: `climpred.smoothing.temporal_smoothing` creates overlapping timesteps 1-3, 2-4, etc., while `resample('3M')` creates non-overlapping timesteps 1-3, 4-6, etc. (I don't know whether `HindcastEnsemble.resample` will work; better do it beforehand, i.e. `HindcastEnsemble(hindcast.resample(...))`.)
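The contrast in one toy numpy example (the values are arbitrary; only the window layout matters):

```python
import numpy as np

x = np.arange(6, dtype=float)  # six leads with toy values 0..5

# temporal_smoothing-style: overlapping rolling windows 1-3, 2-4, 3-5, 4-6
rolling = np.array([x[i:i + 3].mean() for i in range(len(x) - 2)])

# resample-style: non-overlapping blocks 1-3, 4-6
blocks = x.reshape(-1, 3).mean(axis=1)

print(rolling)  # [1. 2. 3. 4.] -> four overlapping windows
print(blocks)   # [1. 4.]       -> two disjoint blocks
```

Note the output lengths differ: rolling keeps nearly one value per input lead, while block resampling divides the number of leads by the window size.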
I do not yet understand the difference between your two outlined interpretations.
> Can we use resample for biweekly averages of the verifying data?
Something like this should work:

```python
# give lead a datetime axis so that resample works,
# then aggregate daily leads to weekly means
hindcast = hindcast.assign_coords(
    lead=xr.cftime_range(start='2000', freq='1D', periods=hindcast.lead.size)
)
hindcast = hindcast.resample(lead='7D').mean()
hindcast = hindcast.assign_coords(lead=range(hindcast.lead.size))
# or some other, easier aggregation function from daily to weekly, e.g.:
# hindcast = hindcast.sel(lead=slice(0, 7)).mean('lead').assign_coords(lead=[1])
hindcast['lead'].attrs['units'] = 'weeks'
obs = obs.resample(time='7D').mean()
HindcastEnsemble(hindcast).add_observations(obs).verify()
```
Maybe this is related to your weekly zarr issue @judithberner @abjaye
When initializing `HindcastEnsemble`, `hindcast.init` should have the same frequency as `verification.time`. If I understand your examples correctly, you take weekly leads but have daily verification. If `hindcast.lead.units` is weeks, then climpred cannot find the matching week in the verification when the verification has daily frequency.

If you start `HindcastEnsemble` from daily forecasts and daily verification, `HindcastEnsemble.smooth(lead=7)` can convert lead days to lead weeks and at the same time do weekly averaging of the verification (this comes into effect after verification). This behaviour isn't robustly tested yet for seasonal time-scales; it works for annual.

A quick way to see forecasts and verification: `HindcastEnsemble.plot()`
Here it doesn't match: daily verification but weekly forecasts.