Expand checks in compare_ts_and_hist notebooks

As of d604c92 (in #29), I no longer run da.identical() to compare the data; instead, I only verify that a time series file exists for every variable in the CESM history files. This check covers all five streams: pop.h, pop.h.nday1, pop.h.nyear1, cice.h, and cice.h1.
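A minimal sketch of that existence check follows. It assumes the time series files for one variable can be listed by a callable; in the notebook this would be something like `lambda var: case.get_timeseries_files(year, stream, var)`. The helpers `time_varying_vars` and `missing_timeseries` are hypothetical and do not come from the repository. The variable filter checks `.dims` rather than `.coords`, which is a small deliberate change.

```python
import os
from typing import Callable, Container, Iterable, List

import numpy as np
import xarray as xr


def time_varying_vars(ds: xr.Dataset, skip: Container[str] = ("time_bound",)) -> List[str]:
    """Names of data variables that have a time dimension, minus any in `skip`."""
    return [var for var in ds.data_vars if "time" in ds[var].dims and var not in skip]


def missing_timeseries(
    variables: Iterable[str],
    get_timeseries_files: Callable[[str], List[str]],
) -> List[str]:
    """Variables whose time series file list is empty or names a file not on disk."""
    missing = []
    for var in variables:
        filenames = get_timeseries_files(var)
        if not filenames or not all(os.path.isfile(f) for f in filenames):
            missing.append(var)
    return missing


if __name__ == "__main__":
    # Toy dataset standing in for a history file (hypothetical variables and sizes)
    ds = xr.Dataset(
        {
            "TEMP": (("time", "z_t"), np.zeros((2, 3))),
            "time_bound": (("time", "d2"), np.zeros((2, 2))),
            "TAREA": (("nlat",), np.ones(4)),
        }
    )
    variables = time_varying_vars(ds)
    print(variables)  # ['TEMP']
    # This fake lookup returns no files, so TEMP is reported as missing
    print(missing_timeseries(variables, lambda var: []))  # ['TEMP']
```

Returning the list of missing variables, rather than stopping at the first failure, means one pass over a stream reports every gap.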

I tried running the following code:

import xarray as xr

# case (the CESM case object), year, and stream are set earlier in the notebook
open_mfdataset_kwargs = dict(data_vars="minimal", compat="override", coords="minimal", parallel=True)

history_filenames = case.get_history_files(year, stream)
ds_hist = xr.open_mfdataset(history_filenames, **open_mfdataset_kwargs)
# Eventually: every time-varying variable except the time bounds
# vars_to_check = [var for var in ds_hist.data_vars if "time" in ds_hist[var].coords and var != "time_bound"]
vars_to_check = ["TEMP"]
for var in vars_to_check:
    timeseries_filenames = case.get_timeseries_files(year, stream, var)
    ds_ts = xr.open_mfdataset(timeseries_filenames, **open_mfdataset_kwargs)
    # Limiting the comparison to a single level works fine:
    # da_hist = ds_hist[var].isel(z_t=0)
    # da_ts = ds_ts[var].isel(z_t=0)
    # Comparing the full 3D field exhausts memory, even with dask (cluster.scale(12))
    da_hist = ds_hist[var]
    da_ts = ds_ts[var]
    if da_hist.identical(da_ts):
        print(f"{var} is the same in history and time series")
    else:
        print(f"{var} is DIFFERENT in history and time series")

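As the inline comments indicate, comparing the full 3D TEMP field exhausted memory even with cluster.scale(12), while comparing a single level worked fine in both serial and parallel. Running in parallel even gave a modest speedup, cutting wall time from 25.1 s to 16.4 s (about 1.5x). Note that the CPU times for the parallel run count only the notebook process, not the dask workers: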

Timings with isel(z_t=0)
----
Parallel, cluster.scale(n=8):
CPU times: user 4.28 s, sys: 92.3 ms, total: 4.38 s
Wall time: 16.4 s

Serial:
CPU times: user 19.7 s, sys: 3.17 s, total: 22.9 s
Wall time: 25.1 s
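Since a single level compares without trouble, one possible workaround is to run identical() one z_t level at a time. With this approach, each dask computation only has to materialize one level. The sketch below rests on that assumption and has not been tested on the full-resolution output. `identical_by_level` is a hypothetical helper.

```python
import numpy as np
import xarray as xr


def identical_by_level(da_a: xr.DataArray, da_b: xr.DataArray, dim: str = "z_t") -> bool:
    """DataArray.identical(), evaluated one slice of `dim` at a time.

    For dask-backed arrays, each isel() slice is computed separately, so only
    one level has to be materialized per comparison (assumption: a single
    level fits comfortably in memory).
    """
    if dim not in da_a.dims:
        # Fields without `dim` (e.g. surface-only variables) are compared in one shot
        return da_a.identical(da_b)
    if dict(da_a.sizes) != dict(da_b.sizes):
        # Different shapes, or `dim` missing from da_b: cannot be identical
        return False
    for i in range(da_a.sizes[dim]):
        # identical() also checks name, attrs, and coords (incl. the scalar z_t)
        if not da_a.isel({dim: i}).identical(da_b.isel({dim: i})):
            return False
    return True


if __name__ == "__main__":
    temp = xr.DataArray(
        np.arange(24.0).reshape(2, 3, 4),
        dims=("time", "z_t", "nlon"),
        coords={"z_t": [500.0, 1500.0, 2500.0]},
        name="TEMP",
    )
    changed = temp.copy(deep=True)
    changed[0, 2, 1] = -1.0
    print(identical_by_level(temp, temp.copy(deep=True)))  # True
    print(identical_by_level(temp, changed))  # False
```

In the loop above, `da_hist.identical(da_ts)` would become `identical_by_level(da_hist, da_ts)`. The trade-off is that levels are evaluated sequentially, so the workers only parallelize within a level rather than across the whole field.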

marbl-ecosys / HiRes-CESM-analysis

Expand checks in compare_ts_and_hist notebooks #33