hdrake opened 4 years ago
I have a ton of bootstrap implementations between climpred and work with students in my lab so I can pretty quickly put this together.
I'm hoping to clean up the notebooks throughout the day and swap in ERA5 as our ground-truth dataset (rather than NCEP). Once I do that, I'll make a pull request and point out where I think bootstrapping would be appropriate.
I'm presenting some of these results at an informal departmental seminar on Friday, so I'm going to work on this project full time until then.
Sounds good, I'll look out for it. I'm headed to the mountains tomorrow afternoon but may be working a little bit during that time.
In climpred we bootstrap with replacement over the member dimension in some cases: you reconstruct the N-member ensemble by drawing N members with replacement, compute the statistic you're interested in over i iterations, and then develop confidence intervals from the resulting distribution. The other code I have for group members uses block bootstrapping to shuffle a given time series while approximating its internal variability. Both are pretty straightforward with xarray; it just depends on what we need here. I imagine the former.
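For reference, the block-bootstrap variant mentioned above can be sketched in a few lines of numpy. This is a toy sketch, not the climpred implementation; the function name, block length, and overlapping-block choice are all illustrative:

```python
import numpy as np


def block_bootstrap(series, block_length, rng=None):
    """Resample a 1-D series by drawing contiguous blocks with replacement.

    Drawing whole blocks (rather than individual points) preserves
    short-range autocorrelation within each block, which a plain
    pointwise bootstrap would destroy.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(series)
    n_blocks = int(np.ceil(n / block_length))
    # Random (possibly overlapping) starting indices, one per block
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    blocks = [series[s:s + block_length] for s in starts]
    # Concatenate and trim back to the original length
    return np.concatenate(blocks)[:n]
```

Repeating this i times and computing your statistic on each shuffled series gives the same kind of bootstrap distribution as the member-resampling approach.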
Which, in case I am non-responsive in the mountains, looks like this:
```python
import numpy as np


def _resample(xobj, resample_dim):
    """Resample with replacement in dimension ``resample_dim``.

    Args:
        xobj (xr.object): input xr.object to be resampled.
        resample_dim (str): dimension to resample along.

    Returns:
        xr.object: resampled along ``resample_dim``.
    """
    to_be_resampled = xobj[resample_dim].values
    smp = np.random.choice(to_be_resampled, len(to_be_resampled))
    smp_xobj = xobj.sel({resample_dim: smp})
    # Restore the original coordinate labels so iterations align on concat
    smp_xobj[resample_dim] = xobj[resample_dim].values
    return smp_xobj
```
You could then wrap that with something like:

```python
import xarray as xr


def bootstrap(xobj, niteration, resample_dim='member'):
    return xr.concat(
        [_resample(xobj, resample_dim) for _ in range(niteration)],
        'bootstrap',
    )
```
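Once you have the bootstrap replicates, confidence intervals come straight from quantiles of the bootstrap distribution. Here is a minimal numpy sketch of that last step with made-up data (in xarray it would just be `.quantile(...)` over the `'bootstrap'` dimension):

```python
import numpy as np

rng = np.random.default_rng(42)
members = rng.normal(loc=1.0, scale=0.5, size=20)  # toy 20-member ensemble

# 1000 bootstrap replicates of the ensemble-mean statistic:
# resample the 20 members with replacement and take the mean each time
boot_means = np.array([
    rng.choice(members, size=members.size, replace=True).mean()
    for _ in range(1000)
])

# 95% confidence interval from the bootstrap distribution
lo, hi = np.quantile(boot_means, [0.025, 0.975])
```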
I'm working on a faster way to do this without the list comprehension + concatenation, but this should be fine for what you need.
It would be nice to quantify how likely it is that the models are actually improving in skill, rather than the apparent improvement being due to the increasing size of ensembles over time. It should be fairly straightforward to do some bootstrapping simulations.
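One way to frame that test (a sketch with a toy correlation-skill metric and synthetic data; all names and sizes here are illustrative): repeatedly subsample the newer, larger ensemble down to the older ensemble's size, recompute skill each time, and see where the older generation's skill falls in that distribution. If the old ensemble's skill sits well inside the equal-size distribution, the apparent improvement could just be an ensemble-size effect.

```python
import numpy as np

rng = np.random.default_rng(0)


def ensemble_skill(members, truth):
    """Toy skill metric: correlation of the ensemble mean with 'truth'."""
    return np.corrcoef(members.mean(axis=0), truth)[0, 1]


truth = rng.normal(size=50)                         # synthetic ground-truth series
old = truth + rng.normal(scale=1.0, size=(10, 50))  # 10-member "older" ensemble
new = truth + rng.normal(scale=1.0, size=(40, 50))  # 40-member "newer" ensemble

old_skill = ensemble_skill(old, truth)

# Subsample the new ensemble to the old ensemble's size, many times
sub_skills = np.array([
    ensemble_skill(
        new[rng.choice(new.shape[0], size=old.shape[0], replace=False)],
        truth,
    )
    for _ in range(500)
])

# Fraction of equal-size subsamples of the new ensemble that beat the old one
p_better = (sub_skills > old_skill).mean()
```

A `p_better` near 0.5 would suggest no skill difference once ensemble size is controlled for; a value near 1 would suggest a genuine improvement.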