hdrake opened 4 years ago
I have a ton of bootstrap implementations between climpred and work with students in my lab so I can pretty quickly put this together.
I'm hoping to clean up the notebooks throughout the day and swap in ERA5 as our ground-truth dataset (rather than NCEP). Once I do that, I'll make a pull request and point out where I think bootstrapping would be appropriate.
I'm presenting some of these results at an informal departmental seminar on Friday, so I'm going to work on this project full time until then.
Sounds good, I'll look out for it. I'm headed to the mountains tomorrow afternoon but may be working a little bit during that time.
In climpred we bootstrap with replacement over the member dimension in some cases: you reconstruct the N-member ensemble by drawing N members with replacement, compute the statistic you're interested in over i iterations, and then develop confidence intervals from the resulting distribution. The other code I have for group members uses block bootstrapping to shuffle a given time series while approximating its internal variability. Both are pretty straightforward with xarray; it just depends on what we need here. I imagine the former.
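For reference, the block-bootstrap variant mentioned above can be sketched in a few lines of numpy. This is a toy sketch, not the climpred implementation; the function name, block length, and overlapping-block choice are all illustrative:

```python
import numpy as np


def block_bootstrap(series, block_length, rng=None):
    """Resample a 1-D series by drawing contiguous blocks with replacement.

    Drawing whole blocks (rather than individual points) preserves
    short-range autocorrelation within each block, which a plain
    pointwise bootstrap would destroy.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(series)
    n_blocks = int(np.ceil(n / block_length))
    # Random (possibly overlapping) starting indices, one per block
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    blocks = [series[s:s + block_length] for s in starts]
    # Concatenate and trim back to the original length
    return np.concatenate(blocks)[:n]
```

Repeating this i times and computing your statistic on each shuffled series gives the same kind of bootstrap distribution as the member-resampling approach.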
Which, in case I am non-responsive in the mountains, looks like this:
```python
import numpy as np


def _resample(xobj, resample_dim):
    """Resample with replacement in dimension ``resample_dim``.

    Args:
        xobj (xr.object): input xr.object to be resampled.
        resample_dim (str): dimension to resample along.

    Returns:
        xr.object: resampled along ``resample_dim``.
    """
    to_be_resampled = xobj[resample_dim].values
    smp = np.random.choice(to_be_resampled, len(to_be_resampled))
    smp_xobj = xobj.sel({resample_dim: smp})
    # Restore the original coordinate labels so iterations align on concat
    smp_xobj[resample_dim] = xobj[resample_dim].values
    return smp_xobj
```
You could then wrap that with something like:

```python
import xarray as xr


def bootstrap(xobj, niteration, resample_dim='member'):
    return xr.concat(
        [_resample(xobj, resample_dim) for _ in range(niteration)],
        'bootstrap',
    )
```
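Once you have the bootstrap replicates, confidence intervals come straight from quantiles of the bootstrap distribution. Here is a minimal numpy sketch of that last step with made-up data (in xarray it would just be `.quantile(...)` over the `'bootstrap'` dimension):

```python
import numpy as np

rng = np.random.default_rng(42)
members = rng.normal(loc=1.0, scale=0.5, size=20)  # toy 20-member ensemble

# 1000 bootstrap replicates of the ensemble-mean statistic:
# resample the 20 members with replacement and take the mean each time
boot_means = np.array([
    rng.choice(members, size=members.size, replace=True).mean()
    for _ in range(1000)
])

# 95% confidence interval from the bootstrap distribution
lo, hi = np.quantile(boot_means, [0.025, 0.975])
```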
I'm working on a faster way to do this without the list comprehension + concatenation, but this should be fine for what you need.
It would be nice to quantify how likely it is that the models are actually improving in skill, rather than the apparent improvement being due to the increasing size of ensembles over time. It should be fairly straightforward to do some bootstrapping simulations.
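One way to frame that test (a sketch with a toy correlation-skill metric and synthetic data; all names and sizes here are illustrative): repeatedly subsample the newer, larger ensemble down to the older ensemble's size, recompute skill each time, and see where the older generation's skill falls in that distribution. If the old ensemble's skill sits well inside the equal-size distribution, the apparent improvement could just be an ensemble-size effect.

```python
import numpy as np

rng = np.random.default_rng(0)


def ensemble_skill(members, truth):
    """Toy skill metric: correlation of the ensemble mean with 'truth'."""
    return np.corrcoef(members.mean(axis=0), truth)[0, 1]


truth = rng.normal(size=50)                         # synthetic ground-truth series
old = truth + rng.normal(scale=1.0, size=(10, 50))  # 10-member "older" ensemble
new = truth + rng.normal(scale=1.0, size=(40, 50))  # 40-member "newer" ensemble

old_skill = ensemble_skill(old, truth)

# Subsample the new ensemble to the old ensemble's size, many times
sub_skills = np.array([
    ensemble_skill(
        new[rng.choice(new.shape[0], size=old.shape[0], replace=False)],
        truth,
    )
    for _ in range(500)
])

# Fraction of equal-size subsamples of the new ensemble that beat the old one
p_better = (sub_skills > old_skill).mean()
```

A `p_better` near 0.5 would suggest no skill difference once ensemble size is controlled for; a value near 1 would suggest a genuine improvement.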