koldunovn opened this issue 3 years ago
Are these files already part of the ESGF archive? We need some way to get them into the cloud. Technically I see no problem with them just working, if the metadata is set up similarly to the other models; the matching function is purely based on dataset attributes! So am I correct that instead of a 3/4-dimensional array (x, y, z, ?t), the metric/tracer data would be in some form like (node, z, ?time)?
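For concreteness, here is a minimal sketch of the layout I have in mind, with made-up sizes and with the dimension name `node` taken from the question above; the actual AWI output may use different names:

```python
import numpy as np
import xarray as xr

# Hypothetical sizes; real FESOM meshes have on the order of 100k surface nodes.
nnode, nlev, ntime = 10, 4, 2

ds = xr.Dataset(
    {
        # tracer data on (time, lev, node) instead of (time, lev, y, x)
        "thetao": (("time", "lev", "node"), np.random.rand(ntime, nlev, nnode)),
    },
    coords={
        # one lon/lat value per node instead of 2D lon/lat arrays
        "lon": ("node", np.linspace(0.0, 359.0, nnode)),
        "lat": ("node", np.linspace(-80.0, 80.0, nnode)),
    },
)
```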
> The only big problem I see is that sometimes areacello should be 3D.

Not generally a problem in my view. We can deal with time-variable thicknesses, which are 4D, just fine for the other models.
I just tried to access the data from the pangeo Google Cloud deployment, but the area values are missing. I can ask if these can be added in the future, but in the meantime, does anyone have some example data that I could experiment with?
We will compute all necessary (3D and 2D) weights and area values over the weekend with @dsidoren and will come back to you early next week! All experiment data (for LR and MR resolutions) is summarized here: https://marine-data.de/?site=data&q=AWI&qf=organisations.provider.abbreviation%2FDKRZ
In particular, there is CMIP6 Deck for the MR grid: https://cera-www.dkrz.de/WDCC/ui/cerasearch/cmip6?input=CMIP6.CMIP.AWI.AWI-CM-1-1-MR
and for the LR grid e.g.: https://cera-www.dkrz.de/WDCC/ui/cerasearch/cmip6?input=CMIP6.CMIP.AWI.AWI-ESM-1-1-LR
Hello @jbusecke , we have the weights ready - but I want to confirm these do the right thing before sending them to you.
Cool, let me know when you think I could dive in. Also no particular rush! I have a good bunch of things on my plate to keep me busy.
Quick update: #165 did not quite solve this here. I will try to finish that one off in another PR. Just as a reminder, this is what I did:
```python
from cmip6_preprocessing.utils import google_cmip_col
from cmip6_preprocessing.preprocessing import combined_preprocessing

# search the cloud catalog for AWI native-grid temperature data
col = google_cmip_col()
cat = col.search(source_id='AWI-ESM-1-1-LR', variable_id=['thetao'], grid_label='gn')
ddict_awi = cat.to_dataset_dict(aggregate=False, preprocess=combined_preprocessing)
```
and I got an error in parse_lon_lat_bounds:
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/coding/times.py:517: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
/srv/conda/envs/notebook/lib/python3.8/site-packages/numpy/core/_asarray.py:102: SerializationWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range
return array(a, dtype, copy=False, order=order)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/merge_util.py in _open_asset(path, data_format, zarr_kwargs, cdf_kwargs, preprocess, varname, requested_variables)
314 try:
--> 315 ds = preprocess(ds)
316 except Exception as exc:
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/preprocessing.py in combined_preprocessing(ds)
450 # rename the `bounds` according to their style (bound or vertex)
--> 451 ds = parse_lon_lat_bounds(ds)
452 # sort verticies in a consistent manner
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/preprocessing.py in parse_lon_lat_bounds(ds)
282 if "x" not in ds.lat_bounds.dims:
--> 283 ds.coords["lat_bounds"] = ds.coords["lat_bounds"] * xr.ones_like(ds.x)
284
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/common.py in __getattr__(self, name)
238 return source[name]
--> 239 raise AttributeError(
240 "{!r} object has no attribute {!r}".format(type(self).__name__, name)
AttributeError: 'Dataset' object has no attribute 'x'
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-14-4486757f4be9> in <module>
2
3 cat = col.search(source_id='AWI-ESM-1-1-LR', variable_id=['thetao'], grid_label='gn')
----> 4 ddict_awi = cat.to_dataset_dict(aggregate=False, preprocess=combined_preprocessing)
/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/core.py in to_dataset_dict(self, zarr_kwargs, cdf_kwargs, preprocess, storage_options, progressbar, aggregate)
928 ]
929 for i, task in enumerate(concurrent.futures.as_completed(future_tasks)):
--> 930 key, ds = task.result()
931 self._datasets[key] = ds
932 if self.progressbar:
/srv/conda/envs/notebook/lib/python3.8/concurrent/futures/_base.py in result(self, timeout)
430 raise CancelledError()
431 elif self._state == FINISHED:
--> 432 return self.__get_result()
433
434 self._condition.wait(timeout)
/srv/conda/envs/notebook/lib/python3.8/concurrent/futures/_base.py in __get_result(self)
386 def __get_result(self):
387 if self._exception:
--> 388 raise self._exception
389 else:
390 return self._result
/srv/conda/envs/notebook/lib/python3.8/concurrent/futures/thread.py in run(self)
55
56 try:
---> 57 result = self.fn(*self.args, **self.kwargs)
58 except BaseException as exc:
59 self.future.set_exception(exc)
/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/core.py in _load_source(key, source)
914
915 def _load_source(key, source):
--> 916 return key, source.to_dask()
917
918 sources = {key: source(**source_kwargs) for key, source in self.items()}
/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/source.py in to_dask(self)
87 def to_dask(self):
88 """Return xarray object (which will have chunks)"""
---> 89 self._load_metadata()
90 return self._ds
91
/srv/conda/envs/notebook/lib/python3.8/site-packages/intake/source/base.py in _load_metadata(self)
234 """load metadata only if needed"""
235 if self._schema is None:
--> 236 self._schema = self._get_schema()
237 self.dtype = self._schema.dtype
238 self.shape = self._schema.shape
/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/source.py in _get_schema(self)
55
56 if self._ds is None:
---> 57 self._open_dataset()
58
59 metadata = {
/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/source.py in _open_dataset(self)
73 def _open_dataset(self):
74 mapper = _path_to_mapper(self.row[self.path_column], self.storage_options, self.data_format)
---> 75 ds = _open_asset(
76 mapper,
77 data_format=self.data_format,
/srv/conda/envs/notebook/lib/python3.8/site-packages/intake_esm/merge_util.py in _open_asset(path, data_format, zarr_kwargs, cdf_kwargs, preprocess, varname, requested_variables)
315 ds = preprocess(ds)
316 except Exception as exc:
--> 317 raise RuntimeError(
318 f'Failed to apply pre-processing function: {preprocess.__name__}'
319 ) from exc
RuntimeError: Failed to apply pre-processing function: combined_preprocessing
Solving this should be as simple as testing with an appropriate array and putting some extra if statements in the code.
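To illustrate, a minimal sketch of the kind of guard I have in mind for the failing broadcast; this is not the actual code in preprocessing.py, just the shape of the fix:

```python
import xarray as xr

def parse_lon_lat_bounds_sketch(ds):
    # Broadcast 1D bounds along x only if the dataset actually has an x
    # coordinate; unstructured (node-based) grids do not, which is what
    # triggered the AttributeError above.
    if "lat_bounds" in ds.coords and "x" in ds.coords:
        if "x" not in ds.lat_bounds.dims:
            ds.coords["lat_bounds"] = ds.coords["lat_bounds"] * xr.ones_like(ds.x)
    return ds
```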
Hello @jbusecke, thank you for your continued efforts to include AWI-CM and AWI-ESM as well, which we appreciate very much! For the configurations with the LR, MR, and HR ocean grids, we have now uploaded the cluster areas and volumes to the DKRZ cloud.
"LR", "MR", and "HR" stand for the respective ocean grids; "vol2D" is the cluster area for the 2D nodes at the surface (e.g. relevant for SST, MLD, SSS etc). "vol3D" is for computing 3D-means for example (thetao etc). For means on a respective level in the 3D ocean, say at 1000m or so, we cannot use "vol2D" because the cluster areas will look different at depth due to bathymetry. This is what the vol23D files are for: they define the cluster areas for the respective ocean levels.
Let us know whether the files above are useful. The lon_lat_bounds info should in principle be available directly in all the CMIP6 files; e.g. in files like the one used in the cdo examples below you can find "lon", "lat", "lon_bnds", and "lat_bnds". These files are also understood by the Climate Data Operators (cdo), so one can write things like
cdo griddes thetao_Omon_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn_240101-241012.nc > griddesMR.txt
Even remapping (to 1°x1°, for example) works out of the box (see https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/):

cdo remapycon,global_1 thetao_Omon_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn_240101-241012.nc out.nc

This gives me hope that we can also solve the remaining issues relatively easily.
Thank you very much @trackow. I am currently very busy with revisions for another project, but I will come back to this for sure. I appreciate the patience.
I suggest adding some limited support for unstructured grids, since @jbusecke expressed his interest. Among the CMIP6 models this would cover AWI-CM (FESOM1.4) and MPAS-O (E3SM). All metrics that are based on multiplying scalar data by areacello and the depth of the vertical levels should be possible. The only big problem I see is that sometimes areacello should be 3D. In the end it would be nice to see unstructured models in analyses like this one: https://cmip6-preprocessing.readthedocs.io/en/latest/parsing_metrics.html
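As a minimal sketch of the metric arithmetic meant here, with illustrative names and sizes, assuming a 2D areacello and a constant layer thickness:

```python
import numpy as np
import xarray as xr

# Made-up sizes, not actual model output.
nnode, nlev = 10, 4
ds = xr.Dataset(
    {
        "thetao": (("lev", "node"), np.random.rand(nlev, nnode)),
        "areacello": ("node", np.random.rand(nnode)),  # 2D case: one area per node
    }
)
dz = xr.DataArray(np.full(nlev, 10.0), dims="lev")  # layer thickness [m]

# scalar data * areacello * level thickness -> volume-weighted mean;
# a 3D areacello would replace the simple broadcast below.
volume = ds["areacello"] * dz
theta_mean = (ds["thetao"] * volume).sum() / volume.sum()
```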
Some very basic information on FESOM1.4 mesh is available here: https://fesom.de/cmip6/work-with-awi-cm-unstructured-data/
@trackow, @helgegoessling, and @hegish: you know the AWI-CM1 grids much better than I do. Can you share the files with node weights for the LR and MR meshes? Also, do you see any potential problems?
@mark-petersen Maybe your group will be interested as well?
@jbusecke What do you think should be our first steps apart from providing the file with weights? :)