pangeo-gallery / cmip6

Examples of CMIP6 cloud data analysis with Pangeo
MIT License

Use default-binder repository #1

TomAugspurger closed 4 years ago

TomAugspurger commented 4 years ago

AFAICT, the only differences are moving the base Dockerfile and pangeo-notebook images forward to 2020.08.31.

pangeo-notebook=2020.04.28 -> pangeo-notebook=2020.08.31

Using the default-binder image will make testing deployments of the binderhub a bit easier.

TomAugspurger commented 4 years ago

cc @rabernat

rabernat commented 4 years ago

Thanks so much for this. LGTM provided the build succeeds.

Would be awesome if a bot could make these PRs automatically. 😆

TomAugspurger commented 4 years ago

Possible data issue

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-7df12fa49c69> in <module>
      6     return xr.DataArray(ecs)
      7 
----> 8 ds_abrupt['ecs'] = ds_abrupt.groupby('source_id').apply(calc_ecs)
      9 ds_abrupt
...
      2 
      3 def calc_ecs(ds):
----> 4     a, b = np.polyfit(ds.tas, ds.imbalance, 1)
      5     ecs = -0.5 * (b/a)
      6     return xr.DataArray(ecs)

<__array_function__ internals> in polyfit(*args, **kwargs)

/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/lib/polynomial.py in polyfit(x, y, deg, rcond, full, w, cov)
    627     scale = NX.sqrt((lhs*lhs).sum(axis=0))
    628     lhs /= scale
--> 629     c, resids, rank, s = lstsq(lhs, rhs, rcond)
    630     c = (c.T/scale).T  # broadcast scale coefficients
    631 

<__array_function__ internals> in lstsq(*args, **kwargs)

/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/linalg/linalg.py in lstsq(a, b, rcond)
   2304         # lapack can't handle n_rhs = 0 - so allocate the array one larger in that axis
   2305         b = zeros(b.shape[:-2] + (m, n_rhs + 1), dtype=b.dtype)
-> 2306     x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
   2307     if m == 0:
   2308         x[...] = 0

ValueError: On entry to DLASCL parameter number 4 had an illegal value

source_id="NorCPM1" is the (first) one with an issue

>>> np.isnan(ds_abrupt.sel(source_id="NorCPM1").tas).data.sum()
70

It looks like NorCPM1 just doesn't have output for the later years; its year coordinate stops at 79:

>>> dsets_aligned_["NorCPM1"].year
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
       14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27.,
       28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41.,
       42., 43., 44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54., 55.,
       56., 57., 58., 59., 60., 61., 62., 63., 64., 65., 66., 67., 68., 69.,
       70., 71., 72., 73., 74., 75., 76., 77., 78., 79.])
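The same check can be run across every model to see which ones are affected. This is a minimal sketch on synthetic data, not the notebook's code: `count_missing` and the model names `model_a` and `model_b` are made up for illustration, and the inputs are plain NumPy arrays rather than the xarray objects used above.

```python
import numpy as np

def count_missing(series_by_model):
    """Return {model name: number of NaN entries} for a dict of 1-D series."""
    return {
        name: int(np.isnan(np.asarray(values, dtype=float)).sum())
        for name, values in series_by_model.items()
    }

# Synthetic stand-ins: one model with all 150 years, one that stops after year 79.
full = np.linspace(0.0, 5.0, 150)
short = full.copy()
short[80:] = np.nan

missing = count_missing({"model_a": full, "model_b": short})
affected = [name for name, n in missing.items() if n > 0]
print(missing, affected)
```

A model with 80 valid years out of 150 shows up with 70 missing values, which matches the NaN count reported for NorCPM1.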

So we have a few options:

  1. Exclude NorCPM1.
  2. Select just the first 80 years (years 0 through 79), instead of the first 150.
  3. Drop missing values in the computations.

I'll probably just drop missing values...
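For reference, dropping missing values before the fit could look roughly like the sketch below. It is not the change that was made in the notebook: `calc_ecs_dropna` is a hypothetical NumPy-only variant of `calc_ecs`, the data are synthetic, and the two-point minimum is an assumption chosen so a linear fit is at least defined.

```python
import numpy as np

def calc_ecs_dropna(tas, imbalance):
    """Linear fit of imbalance against tas, ignoring years where either is NaN."""
    tas = np.asarray(tas, dtype=float)
    imbalance = np.asarray(imbalance, dtype=float)
    # Keep only the years where both series have a finite value.
    valid = np.isfinite(tas) & np.isfinite(imbalance)
    if valid.sum() < 2:
        return np.nan  # assumption: too few points to fit a line
    a, b = np.polyfit(tas[valid], imbalance[valid], 1)
    # Same formula as calc_ecs in the traceback above.
    return -0.5 * (b / a)

# Synthetic example: imbalance = 7 - tas, so a = -1, b = 7.
tas = np.linspace(1.0, 6.0, 150)
imbalance = 7.0 - tas

# Mimic a model whose output stops after year 79.
tas_gappy = tas.copy()
tas_gappy[80:] = np.nan

ecs = calc_ecs_dropna(tas_gappy, imbalance)
print(ecs)
```

Masking inside the function keeps the `groupby` call unchanged, at the cost of fitting different models over different numbers of years, so option 2 may be preferable if the results need to be directly comparable.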

rabernat commented 4 years ago

Now we are doing science via CI! 🚀

cc @hdrake, who was the original author of this notebook.

TomAugspurger commented 4 years ago

FYI, this notebook is failing with a different error on the production binder (the Dask cluster isn't starting), I think because it was using its own image rather than default-binder.

https://staging.binder.pangeo.io/v2/gh/pangeo-gallery/default-binder/staging/?urlpath=git-pull?repo=https://github.com/pangeo-gallery/cmip6%26amp%3Burlpath=lab/tree/cmip6/ECS_Gregory_method.ipynb%3Fautodecode gets us to the point where the missing values in NorCPM1 cause issues.

rabernat commented 4 years ago

I just find it so awesome that you are able to debug both the deep infrastructure and the actual science code. 🥇

Thinking about how to scale this and imagining that you would be a bot instead of a person.

It would be nice if a bot made a PR to update the binderbot config, just as you have done. (This is kind of like how conda-forge handles new releases.) If the CI succeeds, then the owner just has to click merge. But if it fails, it would be nice for the bot to post the binder link, like you did, allowing the gallery owner to debug the failing notebook.

The icing on the cake would be to have some way to pass git credentials around so that the user could actually push a fixed notebook from binder directly back to the PR branch! 🚀