jbusecke / xMIP

Analysis-ready CMIP6 data in Python the easy way with Pangeo tools.
https://cmip6-preprocessing.readthedocs.io/en/latest/?badge=latest
Apache License 2.0
198 stars 44 forks

Adding limited support for unstructured grids #116

Open koldunovn opened 3 years ago

koldunovn commented 3 years ago

I suggest adding some limited support for unstructured grids, since @jbusecke expressed his interest. Among the CMIP6 models this concerns AWI-CM (FESOM1.4) and MPAS-O (E3SM). All metrics that are based on multiplying scalar data by areacello and the depth of the vertical levels should be possible. The only big problem I see is that areacello sometimes needs to be 3D.

In the end, it would be nice to see unstructured models in analyses like this one: https://cmip6-preprocessing.readthedocs.io/en/latest/parsing_metrics.html
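A minimal sketch (not xMIP code; the names `tos`/`areacello` merely mirror CMIP conventions) of the metric logic described above: on an unstructured mesh the horizontal dimensions collapse into a single node dimension, so an area-weighted mean is one weighted reduction over that axis.

```python
import numpy as np
import xarray as xr

nnodes = 10
rng = np.random.default_rng(0)

# Hypothetical surface field and 2D cell areas, both on the node dimension
sst = xr.DataArray(rng.random(nnodes), dims=["node"], name="tos")
areacello = xr.DataArray(rng.random(nnodes) + 1.0, dims=["node"], name="areacello")

# Area-weighted mean: sum(field * area) / sum(area)
weighted_mean = (sst * areacello).sum("node") / areacello.sum("node")

# xarray's built-in weighted reductions give the same result
assert np.isclose(weighted_mean, sst.weighted(areacello).mean("node"))
```

The same reduction works unchanged for any other scalar variable on the node dimension, which is what makes the metric approach attractive here.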

Some very basic information on FESOM1.4 mesh is available here: https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/

@trackow , @helgegoessling and @hegish you know the AWI-CM1 grids much better than I do, can you share the file with node weights for the LR and MR meshes? Also, do you see any potential problems?

@mark-petersen Maybe your group will be interested as well?

@jbusecke What do you think should be our first steps apart from providing the file with weights? :)

jbusecke commented 3 years ago

Are these files already part of the ESGF archive? We need some way to get them into the cloud. Technically I see no problem with them just working, provided the metadata is set up similarly to the other models; the matching function is purely based on dataset attributes! So am I correct that instead of a 3/4-dimensional array (x, y, z, ?t), the metric/tracer data would be in some form like (node, z, ?time)?
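A toy illustration of the two points above, with a hypothetical `match_by_attrs` helper: matching can rely purely on dataset attributes, regardless of whether the data carries (x, y, z, time) or (node, z, time) dimensions.

```python
import numpy as np
import xarray as xr

def match_by_attrs(data_ds, metric_ds, attrs=("source_id", "grid_label")):
    """Return True if the two datasets agree on the given attributes.

    Hypothetical sketch of attribute-based matching; it never inspects
    dimension names, so unstructured (node-based) data is no different
    from structured (x/y-based) data.
    """
    return all(data_ds.attrs.get(a) == metric_ds.attrs.get(a) for a in attrs)

# Unstructured model output: thetao with dims (time, lev, node)
thetao_unstructured = xr.Dataset(
    {"thetao": (("time", "lev", "node"), np.zeros((2, 3, 10)))},
    attrs={"source_id": "AWI-ESM-1-1-LR", "grid_label": "gn"},
)
# The matching metric, on the same node dimension
area = xr.Dataset(
    {"areacello": (("node",), np.ones(10))},
    attrs={"source_id": "AWI-ESM-1-1-LR", "grid_label": "gn"},
)
assert match_by_attrs(thetao_unstructured, area)
```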

jbusecke commented 3 years ago

> The only big problem I see is that sometimes areacello should be 3D.

Not generally a problem in my view. We can deal with time variable thicknesses which are 4D just fine for the other models.

jbusecke commented 3 years ago

I just tried to access the data from the pangeo Google Cloud deployment, but the area values are missing. I can ask if these can be added in the future, but in the meantime, does anyone have some example data that I could experiment with?

trackow commented 3 years ago

We will compute all necessary (3D and 2D) weights and area values over the weekend with @dsidoren and will come back to you early next week! All experiment data (for LR and MR resolutions) is summarized here: https://marine-data.de/?site=data&q=AWI&qf=organisations.provider.abbreviation%2FDKRZ

In particular, there are the CMIP6 DECK experiments for the MR grid: https://cera-www.dkrz.de/WDCC/ui/cerasearch/cmip6?input=CMIP6.CMIP.AWI.AWI-CM-1-1-MR

and for the LR grid e.g.: https://cera-www.dkrz.de/WDCC/ui/cerasearch/cmip6?input=CMIP6.CMIP.AWI.AWI-ESM-1-1-LR

trackow commented 3 years ago

Hello @jbusecke , we have the weights ready, but I want to confirm that they do the right thing before sending them to you.

jbusecke commented 3 years ago

Cool, let me know when you think I could dive in. Also no particular rush! I have a good bunch of things on my plate to keep me busy.

jbusecke commented 3 years ago

Quick update: #165 did not quite solve this here. I will try to finish that one off in another PR. Just as a reminder, this is what I did:

from cmip6_preprocessing.utils import google_cmip_col
col = google_cmip_col()

from cmip6_preprocessing.preprocessing import combined_preprocessing, rename_cmip6

cat = col.search(source_id='AWI-ESM-1-1-LR', variable_id=['thetao'], grid_label='gn')
ddict_awi = cat.to_dataset_dict(aggregate=False, preprocess=combined_preprocessing)

and I got an error in the parse_lon_lat_bounds:

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/coding/times.py:517: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
/srv/conda/envs/notebook/lib/python3.8/site-packages/numpy/core/_asarray.py:102: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
  return array(a, dtype, copy=False, order=order)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/merge_util.py in _open_asset(path, data_format, zarr_kwargs, cdf_kwargs, preprocess, varname, requested_variables)
    314         try:
--> 315             ds = preprocess(ds)
    316         except Exception as exc:

/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/preprocessing.py in combined_preprocessing(ds)
    450     # rename the `bounds` according to their style (bound or vertex)
--> 451     ds = parse_lon_lat_bounds(ds)
    452     # sort verticies in a consistent manner

/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/preprocessing.py in parse_lon_lat_bounds(ds)
    282         if "x" not in ds.lat_bounds.dims:
--> 283             ds.coords["lat_bounds"] = ds.coords["lat_bounds"] * xr.ones_like(ds.x)
    284 

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/common.py in __getattr__(self, name)
    238                     return source[name]
--> 239         raise AttributeError(
    240             "{!r} object has no attribute {!r}".format(type(self).__name__, name)

AttributeError: 'Dataset' object has no attribute 'x'

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-14-4486757f4be9> in <module>
      2 
      3 cat = col.search(source_id='AWI-ESM-1-1-LR', variable_id=['thetao'], grid_label='gn')
----> 4 ddict_awi = cat.to_dataset_dict(aggregate=False, preprocess=combined_preprocessing)

/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/core.py in to_dataset_dict(self, zarr_kwargs, cdf_kwargs, preprocess, storage_options, progressbar, aggregate)
    928             ]
    929             for i, task in enumerate(concurrent.futures.as_completed(future_tasks)):
--> 930                 key, ds = task.result()
    931                 self._datasets[key] = ds
    932                 if self.progressbar:

/srv/conda/envs/notebook/lib/python3.8/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433 
    434             self._condition.wait(timeout)

/srv/conda/envs/notebook/lib/python3.8/concurrent/futures/_base.py in __get_result(self)
    386     def __get_result(self):
    387         if self._exception:
--> 388             raise self._exception
    389         else:
    390             return self._result

/srv/conda/envs/notebook/lib/python3.8/concurrent/futures/thread.py in run(self)
     55 
     56         try:
---> 57             result = self.fn(*self.args, **self.kwargs)
     58         except BaseException as exc:
     59             self.future.set_exception(exc)

/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/core.py in _load_source(key, source)
    914 
    915         def _load_source(key, source):
--> 916             return key, source.to_dask()
    917 
    918         sources = {key: source(**source_kwargs) for key, source in self.items()}

/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/source.py in to_dask(self)
     87     def to_dask(self):
     88         """Return xarray object (which will have chunks)"""
---> 89         self._load_metadata()
     90         return self._ds
     91 

/srv/conda/envs/notebook/lib/python3.8/site-packages/intake/source/base.py in _load_metadata(self)
    234         """load metadata only if needed"""
    235         if self._schema is None:
--> 236             self._schema = self._get_schema()
    237             self.dtype = self._schema.dtype
    238             self.shape = self._schema.shape

/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/source.py in _get_schema(self)
     55 
     56         if self._ds is None:
---> 57             self._open_dataset()
     58 
     59             metadata = {

/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/source.py in _open_dataset(self)
     73     def _open_dataset(self):
     74         mapper = _path_to_mapper(self.row[self.path_column], self.storage_options, self.data_format)
---> 75         ds = _open_asset(
     76             mapper,
     77             data_format=self.data_format,

/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/merge_util.py in _open_asset(path, data_format, zarr_kwargs, cdf_kwargs, preprocess, varname, requested_variables)
    315             ds = preprocess(ds)
    316         except Exception as exc:
--> 317             raise RuntimeError(
    318                 f'Failed to apply pre-processing function: {preprocess.__name__}'
    319             ) from exc

RuntimeError: Failed to apply pre-processing function: combined_preprocessing

Solving this should be as simple as testing with an appropriate array and adding some extra if statements to the code.
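A sketch of the kind of extra if statement meant here (hypothetical; the real fix would go into `parse_lon_lat_bounds` in `cmip6_preprocessing/preprocessing.py`): only broadcast the bounds against `x` when that dimension actually exists, so node-based datasets pass through untouched instead of raising the `AttributeError` above.

```python
import numpy as np
import xarray as xr

def parse_lon_lat_bounds_sketch(ds):
    """Guarded version of the failing broadcast, as a sketch.

    The unguarded original did `ds.coords["lat_bounds"] * xr.ones_like(ds.x)`
    whenever "x" was missing from the bounds dims; checking for "x" in
    ds.dims first skips that step on unstructured grids.
    """
    if "lat_bounds" in ds.coords and "x" in ds.dims:
        if "x" not in ds.lat_bounds.dims:
            ds.coords["lat_bounds"] = ds.coords["lat_bounds"] * xr.ones_like(ds.x)
    return ds

# An unstructured dataset without an "x" dimension is now left alone
ds = xr.Dataset(coords={"lat_bounds": (("node", "vertex"), np.zeros((4, 3)))})
assert parse_lon_lat_bounds_sketch(ds) is ds
```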

trackow commented 3 years ago

Hello @jbusecke , thank you for your continued efforts to include AWI-CM and AWI-ESM as well, which we appreciate very much! For the configurations with the LR, MR, and HR ocean grids, we have now uploaded the cluster areas and volumes to the DKRZ cloud.

"LR", "MR", and "HR" stand for the respective ocean grids; "vol2D" is the cluster area for the 2D nodes at the surface (e.g. relevant for SST, MLD, SSS etc). "vol3D" is for computing 3D-means for example (thetao etc). For means on a respective level in the 3D ocean, say at 1000m or so, we cannot use "vol2D" because the cluster areas will look different at depth due to bathymetry. This is what the vol23D files are for: they define the cluster areas for the respective ocean levels.

> I just tried to access the data from the pangeo Google Cloud deployment, but the area values are missing. I can ask if these can be added in the future, but in the meantime, does anyone have some example data that I could experiment with?

Let us know whether the files above are useful. In principle, the lon_lat_bounds info should be available directly in all the CMIP6 files, e.g. in

wget http://esgf3.dkrz.de/thredds/fileServer/cmip6/CMIP/AWI/AWI-CM-1-1-MR/piControl/r1i1p1f1/Omon/thetao/gn/v20181218/thetao_Omon_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn_240101-241012.nc

you can find "lon", "lat", "lon_bnds", and "lat_bnds". These files are also understood by the Climate Data Operators (CDO), so one can write things like

cdo griddes thetao_Omon_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn_240101-241012.nc > griddesMR.txt

Even remapping (to 1°x1°, for example) works out of the box (see https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/):

cdo remapycon,global_1 thetao_Omon_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn_240101-241012.nc out.nc

which gives me hope that we can also solve these remaining issues relatively easily.

jbusecke commented 3 years ago

Thank you very much @trackow. I am currently very busy with revisions for another project, but I will come back to this for sure. I appreciate the patience.