willirath / xorca


Application of XORCA #26

Open sryan288 opened 5 years ago

sryan288 commented 5 years ago

Hi,

we only have a prepared subset of the output files, which I could bring over on an external drive, i.e. we have neither the full output nor any aux_files, which are required as input for load_xorca_dataset. Is it still possible to use XORCA? If needed, I can supply the README we got with the data, which contains more details.

I also have a general question that maybe you can help with: if I read, for example, the temperature files into an xarray Dataset, the array has the logical coordinates x and y as dimensions and the physical coordinates nav_lat and nav_lon as 2D auxiliary coordinates, which I assume is due to the irregular grid. This makes indexing, e.g. cutting out a lat/lon region, very difficult, at least for me as a Python beginner who used Matlab's find function a lot. Sorry, this is not xorca-specific, but maybe you have some tips.

Thanks a lot for your help, Willi! Svenja

jk-rieck commented 5 years ago

Hi Svenja,

as far as I know you need the aux_files for now. Depending on what coordinate information you have along with the data, you could build your own aux_files. Maybe this could help with that: https://github.com/willirath/xorca_autodetect_pocs/blob/master/Autodetect_horizontal_grid.ipynb

Regarding your second question: I spent quite some time on this issue and wrote some modules to deal with it, but they are stored here on the internal server... Basically, if you don't have to consider special cases like extracting across the dateline or the grid's east-west boundary, you can do this with DataArray.where() without any problems. Something like:

```python
region = data.where(
    (data.lon > lon_min) & (data.lon < lon_max)
    & (data.lat > lat_min) & (data.lat < lat_max),
    drop=True,
)
```
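Applied to NEMO output like yours, where the physical coordinates are the 2D fields nav_lon and nav_lat, the same pattern could look like this (an untested sketch; the file name and region bounds are placeholders, and votemper is NEMO's usual name for potential temperature):

```python
import xarray as xr

ds = xr.open_dataset("ORCA025_grid_T.nc")  # placeholder file name

# Example bounds; the region must not cross the dateline or the
# grid's east-west boundary for this simple approach to work.
lon_min, lon_max = -75.0, -60.0
lat_min, lat_max = 35.0, 45.0

region = ds.votemper.where(
    (ds.nav_lon > lon_min) & (ds.nav_lon < lon_max)
    & (ds.nav_lat > lat_min) & (ds.nav_lat < lat_max),
    drop=True,
)
```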

Cheers, Jan

sryan288 commented 5 years ago

Hi Jan,

thanks a lot for your reply! Unfortunately we don't have any mesh files, but I will talk to Caroline about this. I assume it should be possible to get them.

For now the regional extraction should work for me, thanks for the code example! Maybe at some point I (or someone else in our working group) will run into the special cases; if it's OK, I will just contact you again ;-)

Do you guys somehow re-sort the data to go from -180 to 180, or do you keep the folding line where it is and work around it?

Cheers from Cape Cod! Svenja

jk-rieck commented 5 years ago

I usually leave the folding line where it is unless I want to investigate a region that spans across it... If you need to shift the data, you could try xarray's data.roll(). This works nicely if you extract regions; for global data, however, it becomes tricky, because the grid bends around in the north, so you would not simply end up with data ranging from -180 to 180. Cheers, Jan
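To make the roll() idea concrete, a minimal sketch (untested; the shift nroll is a placeholder that depends on where the east-west boundary sits in your grid, and roll_coords=True shifts nav_lon/nav_lat along with the data):

```python
import xarray as xr

ds = xr.open_dataset("ORCA025_grid_T.nc")  # placeholder file name

# Rotate the periodic x axis so that the region of interest no longer
# straddles the east-west boundary of the grid.
nroll = 100  # placeholder shift, depends on your grid
shifted = ds.roll(x=nroll, roll_coords=True)
```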

willirath commented 5 years ago

> thanks a lot for your reply! Unfortunately we don't have any mesh files, but I will talk to Caroline about this. I assume it should be possible to get them.

I recommend getting the mesh and mask files: without the info in the mesh files, you probably have no definite information on the grid constants etc. While you can often derive the grid constants from the lat and lon info provided with the physical output fields, you risk inaccuracies in the presence of partial cells (not fully filled bottom grid cells that better follow the topography) or if the horizontal grid has been modified to, e.g., allow for more volume flow through narrow straits that span only a few grid boxes. For large-scale aggregations (basin-wide means or integrals) you are probably fine, though, because the inaccuracies will remain small.
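To illustrate what deriving grid constants from the lat/lon fields would look like (a rough sketch only, not what xorca actually does; it ignores exactly the partial-cell and strait modifications mentioned above, and the longitude jump at the dateline is not handled):

```python
import numpy as np
import xarray as xr

ds = xr.open_dataset("ORCA025_grid_T.nc")  # placeholder file name
R_EARTH = 6.371e6  # mean Earth radius in metres

# Approximate the zonal scale factor e1t from neighbouring grid points;
# on the curved ORCA grid both lon and lat vary along x.
dlon = np.deg2rad(ds.nav_lon.diff("x"))
dlat = np.deg2rad(ds.nav_lat.diff("x"))
coslat = np.cos(np.deg2rad(ds.nav_lat)).isel(x=slice(1, None))
e1t_approx = R_EARTH * np.hypot(dlon * coslat, dlat)
```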

If this is okay with the people providing the data, feel free to just attach the README you got with the data and the output of either

ncdump -h <datafile>

or of

print(xarray.open_dataset(<datafile>))

in this issue. I'd be glad to help sort out the best way to work with the data.

willirath commented 5 years ago

> If this is okay with the people providing the data, feel free to just attach the README you got with the data and the output of either [...]

If it's not possible to post this publicly, you can also send the info by e-mail.

sryan288 commented 5 years ago

Hey,

sorry for the delayed response. The output is:

ncdump -h <datafile>

```
netcdf ORCA025.*_1m_20160101_20161231_STupper700m_grid_T {
dimensions:
    deptht = 23 ;
    axis_nbounds = 2 ;
    y = 1021 ;
    x = 1442 ;
    time_counter = UNLIMITED ; // (12 currently)
variables:
    float deptht(deptht) ;
        deptht:name = "deptht" ;
        deptht:long_name = "Vertical T levels" ;
        deptht:units = "m" ;
        deptht:axis = "Z" ;
        deptht:positive = "down" ;
        deptht:bounds = "deptht_bounds" ;
    float deptht_bounds(deptht, axis_nbounds) ;
    float nav_lat(y, x) ;
        nav_lat:standard_name = "latitude" ;
        nav_lat:long_name = "Latitude" ;
        nav_lat:units = "degrees_north" ;
    float nav_lon(y, x) ;
        nav_lon:standard_name = "longitude" ;
        nav_lon:long_name = "Longitude" ;
        nav_lon:units = "degrees_east" ;
    double time_centered(time_counter) ;
        time_centered:standard_name = "time" ;
        time_centered:long_name = "Time axis" ;
        time_centered:calendar = "gregorian" ;
        time_centered:units = "seconds since 1900-01-01 00:00:00" ;
        time_centered:time_origin = "1900-01-01 00:00:00" ;
        time_centered:bounds = "time_centered_bounds" ;
    double time_centered_bounds(time_counter, axis_nbounds) ;
    double time_counter(time_counter) ;
        time_counter:axis = "T" ;
        time_counter:standard_name = "time" ;
        time_counter:long_name = "Time axis" ;
        time_counter:calendar = "gregorian" ;
        time_counter:units = "seconds since 1900-01-01 00:00:00" ;
        time_counter:time_origin = "1900-01-01 00:00:00" ;
        time_counter:bounds = "time_counter_bounds" ;
    double time_counter_bounds(time_counter, axis_nbounds) ;
    float vosaline(time_counter, deptht, y, x) ;
        vosaline:standard_name = "sea_water_salinity" ;
        vosaline:long_name = "Sea Water Salinity" ;
        vosaline:units = "0.001" ;
        vosaline:online_operation = "average" ;
        vosaline:interval_operation = "1350 s" ;
        vosaline:interval_write = "1 month" ;
        vosaline:cell_methods = "time: mean (interval: 1350 s)" ;
        vosaline:_FillValue = 1.e+20f ;
        vosaline:missing_value = 1.e+20f ;
        vosaline:coordinates = "time_centered deptht nav_lat nav_lon" ;
    float votemper(time_counter, deptht, y, x) ;
        votemper:standard_name = "sea_water_potential_temperature" ;
        votemper:long_name = "Sea Water Potential Temperature" ;
        votemper:units = "degree_C" ;
        votemper:online_operation = "average" ;
        votemper:interval_operation = "1350 s" ;
        votemper:interval_write = "1 month" ;
        votemper:cell_methods = "time: mean (interval: 1350 s)" ;
        votemper:_FillValue = 1.e+20f ;
        votemper:missing_value = 1.e+20f ;
        votemper:coordinates = "time_centered deptht nav_lat nav_lon" ;

// global attributes:
        :name = "_OUT/ORCA025.*_1m_20160101_20161231_grid_T" ;
        :description = "ocean T grid variables" ;
        :title = "ocean T grid variables" ;
        :Conventions = "CF-1.6" ;
        :timeStamp = "2018-May-12 12:16:19 GMT" ;
        :uuid = "354166b8-ed81-4230-a2ec-726baed61c33" ;
        :history = "Wed Apr 24 10:02:37 2019: ncks -o ORCA025.*_1m_20160101_20161231_STupper700m_grid_T.nc -v votemper,vosaline -d deptht,1,23 ./ORCA025.L46/ORCA025*/*1m_20160101_20161231_grid_T.nc" ;
        :NCO = "netCDF Operators version 4.7.5 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
}
```

A few things to mention:

I tried to load only the T and S files (the ones shown in the ncdump above), but I get the following error message in Python:

```python
%%time
%%memit
ds_xorca = load_xorca_dataset(data_files=data_files, aux_files=aux_files)
```

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<timed eval> in <module>

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2356             with self.builtin_trap:
   2357                 args = (magic_arg_s, cell)
-> 2358                 result = fn(*args, **kwargs)
   2359             return result
   2360 

</home/sryan/miniconda3/envs/xorca_env/lib/python3.7/site-packages/decorator.py:decorator-gen-127> in memit(self, line, cell)

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/memory_profiler.py in memit(self, line, cell)
   1013                                timeout=timeout, interval=interval,
   1014                                max_usage=True,
-> 1015                                include_children=include_children)
   1016             mem_usage.append(tmp[0])
   1017 

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/memory_profiler.py in memory_usage(proc, interval, timeout, timestamps, include_children, multiprocess, max_usage, retval, stream, backend)
    334             # Therefore, the whole process hangs indefinitely. Here, we are ensuring that the process gets killed!
    335             try:
--> 336                 returned = f(*args, **kw)
    337                 parent_conn.send(0)  # finish timing
    338                 ret = parent_conn.recv()

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/memory_profiler.py in _func_exec(stmt, ns)
    786     # helper for magic_memit, just a function proxy for the exec
    787     # statement
--> 788     exec(stmt, ns)
    789 
    790 

<string> in <module>

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xorca/lib.py in load_xorca_dataset(data_files, aux_files, decode_cf, **kwargs)
    389         aux_ds.update(
    390             rename_dims(xr.open_dataset(af, decode_cf=False,
--> 391                                         chunks=ac)))
    392     # Again, we first have to open all data sets to filter the input chunks.
    393     _data_files_chunks = map(

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/dataset.py in update(self, other, inplace)
   2511             dataset.
   2512         """
-> 2513         variables, coord_names, dims = dataset_update_method(self, other)
   2514 
   2515         return self._replace_vars_and_dims(variables, coord_names, dims,

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/merge.py in dataset_update_method(dataset, other)
    583 
    584     return merge_core([dataset, other], priority_arg=1,
--> 585                       indexes=dataset.indexes)

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/merge.py in merge_core(objs, compat, join, priority_arg, explicit_coords, indexes)
    437 
    438     coerced = coerce_pandas_values(objs)
--> 439     aligned = deep_align(coerced, join=join, copy=False, indexes=indexes)
    440     expanded = expand_variable_dicts(aligned)
    441 

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/alignment.py in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid)
    214 
    215     aligned = align(*targets, join=join, copy=copy, indexes=indexes,
--> 216                     exclude=exclude)
    217 
    218     for position, key, aligned_obj in zip(positions, keys, aligned):

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/alignment.py in align(*objects, **kwargs)
    152                     'arguments without labels along dimension %r cannot be '
    153                     'aligned because they have different dimension sizes: %r'
--> 154                     % (dim, sizes))
    155 
    156     result = []

ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {1, 1442}
```

Thanks!!!!

willirath commented 5 years ago

In the error message (the interesting part is at the bottom), xarray complains about the dimension 'x' having different lengths for different inputs. The line at which xorca fails is this one: https://github.com/willirath/xorca/blob/bbc574e51128fc2237d04abeb4ec4c356e272e25/xorca/lib.py#L389_L391 where it adds data from the aux files to the final dataset.

In

ds_xorca = load_xorca_dataset(data_files=data_files, aux_files=aux_files)

you have a list aux_files containing the mesh_....nc and mask_....nc files.

My guess is that there's a file among the aux_files that only has a singleton x dimension. Can you check this?
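Something along these lines should show it (a sketch, assuming aux_files is the same list you pass to load_xorca_dataset):

```python
import xarray as xr

# Print the dimension sizes of every aux file to spot the odd one out.
for af in aux_files:
    with xr.open_dataset(af, decode_cf=False) as ds:
        print(af, dict(ds.sizes))
```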

sryan288 commented 5 years ago

load_xorca_dataset is now working, but only for 2D fields such as MLD and TAUX. When I try to load a 3D field, it reads the coordinates correctly but returns an empty field for the data variable. I assume this is because our data is cut in the vertical while the mesh files are full depth, and xarray's auto_combine cannot handle the differing dimension sizes?

I tried to cut the mesh files to see if that is the problem, but I run into a netCDF/HDF error with mesh_zgr.nc, while it works for mesh_hgr.nc. I am adding the code and the Python error message below.

Another problem might then be that our velocity fields are not only cut in the vertical but horizontal as well. Do you think it still all can work?

```python
zmesh = xr.open_dataset('/vortex/clidex/data/ORCA/ORCA025.L46.LIM2vp.CFCSF6.JRA.XIOS2-K003.hindcast/mesh_zgr.nc').isel(z=slice(0, 23))
zmesh.to_netcdf('~/mesh_zgr_cut.nc')
```

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-19-a047b0853830> in <module>
      6 zmesh = xr.open_dataset('/vortex/clidex/data/ORCA/ORCA025.L46.LIM2vp.CFCSF6.JRA.XIOS2-K003.hindcast/' + 
      7                       'mesh_zgr.nc').isel(z=slice(0,23)).drop('nav_lev')
----> 8 zmesh.to_netcdf('~/mesh_zgr_cut.nc')

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1538             unlimited_dims=unlimited_dims,
   1539             compute=compute,
-> 1540             invalid_netcdf=invalid_netcdf,
   1541         )
   1542 

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1072         # to be parallelized with dask
   1073         dump_to_store(
-> 1074             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1075         )
   1076         if autoclose:

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1118         variables, attrs = encoder(variables, attrs)
   1119 
-> 1120     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1121 
   1122 

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    296             writer = ArrayWriter()
    297 
--> 298         variables, attributes = self.encode(variables, attributes)
    299 
    300         self.set_attributes(attributes)

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/common.py in encode(self, variables, attributes)
    385         # All NetCDF files get CF encoded by default, without this attempting
    386         # to write times, for example, would fail.
--> 387         variables, attributes = cf_encoder(variables, attributes)
    388         variables = OrderedDict(
    389             [(k, self.encode_variable(v)) for k, v in variables.items()]

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/conventions.py in cf_encoder(variables, attributes)
    744 
    745     # add encoding for time bounds variables if present.
--> 746     _update_bounds_encoding(variables)
    747 
    748     new_vars = OrderedDict(

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/conventions.py in _update_bounds_encoding(variables)
    407         is_datetime_type = np.issubdtype(
    408             v.dtype, np.datetime64
--> 409         ) or contains_cftime_datetimes(v)
    410 
    411         if (

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/common.py in contains_cftime_datetimes(var)
   1257     """Check if an xarray.Variable contains cftime.datetime objects
   1258     """
-> 1259     return _contains_cftime_datetimes(var.data)
   1260 
   1261 

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/variable.py in data(self)
    325             return self._data
    326         else:
--> 327             return self.values
    328 
    329     @data.setter

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/variable.py in values(self)
    423     def values(self):
    424         """The variable's data as a numpy.ndarray"""
--> 425         return _as_array_or_item(self._data)
    426 
    427     @values.setter

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/variable.py in _as_array_or_item(data)
    236     TODO: remove this (replace with np.asarray) once these issues are fixed
    237     """
--> 238     data = np.asarray(data)
    239     if data.ndim == 0:
    240         if data.dtype.kind == "M":

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    688 
    689     def __array__(self, dtype=None):
--> 690         self._ensure_cached()
    691         return np.asarray(self.array, dtype=dtype)
    692 

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/indexing.py in _ensure_cached(self)
    685     def _ensure_cached(self):
    686         if not isinstance(self.array, NumpyIndexingAdapter):
--> 687             self.array = NumpyIndexingAdapter(np.asarray(self.array))
    688 
    689     def __array__(self, dtype=None):

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    664 
    665     def __array__(self, dtype=None):
--> 666         return np.asarray(self.array, dtype=dtype)
    667 
    668     def __getitem__(self, key):

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    568     def __array__(self, dtype=None):
    569         array = as_indexable(self.array)
--> 570         return np.asarray(array[self.key], dtype=None)
    571 
    572     def transpose(self, order):

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in __getitem__(self, key)
     73     def __getitem__(self, key):
     74         return indexing.explicit_indexing_adapter(
---> 75             key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
     76         )
     77 

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    848     """
    849     raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 850     result = raw_indexing_method(raw_key.tuple)
    851     if numpy_indices.tuple:
    852         # index the loaded np.ndarray

~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _getitem(self, key)
     85             with self.datastore.lock:
     86                 original_array = self.get_array(needs_lock=False)
---> 87                 array = getitem(original_array, key)
     88         except IndexError:
     89             # Catch IndexError in netCDF4 and return a more informative

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._get()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: HDF error
```

jk-rieck commented 5 years ago

Hi Svenja, yes, the mesh and mask files should have the same dimension sizes (vertical and horizontal) as the data. In cases like yours I also cut them before loading; I usually do that with NCO. Something like

```
ncks -d z,0,22 mesh_zgr.nc mesh_zgr_cut.nc
```

should do it in your example. Regarding the error you get when trying to do the cutting with xarray: what netCDF format is the mesh_zgr.nc file in?

sryan288 commented 5 years ago

Hey Jan Klaus, yes, I should have mentioned that I tried NCO and CDO to cut the file, but this also gives me an error message (see below). It doesn't seem to like the e3v_0 variable, although I don't see any difference from the other variables stored in the file.

```
(base) [sryan@hydroclim ~]$ ncks -d z,0,22 mesh_zgr.nc mesh_zgr_cut.nc
ERROR: nco_get_vara() failed to nc_get_vara() variable "e3v_0"
nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered error): nco_get_vara()
nco_err_exit(): ERROR Error code is -101. Translation into English with nc_strerror(-101) is "NetCDF: HDF error"
nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)
```

The file is in netCDF-4 format. However, mesh_hgr.nc has the same format, and that one I can cut without a problem, which confuses me.

jk-rieck commented 5 years ago

I think in mesh_hgr.nc the only variable that actually has a depth dimension is nav_lev, so that might explain the different behavior. One thing you could try is using a different version of NCO, or converting the files to netCDF-3 before cutting...
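If you try the conversion route, it could look like this (untested; -3 asks ncks for classic netCDF-3 output, and -5 would give the 64-bit-data CDF5 variant in case a variable exceeds the classic size limits):

```
# Convert to netCDF-3 first, then cut the upper 23 levels
ncks -3 mesh_zgr.nc mesh_zgr_nc3.nc
ncks -d z,0,22 mesh_zgr_nc3.nc mesh_zgr_cut.nc
```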

willirath commented 5 years ago

I have not had the time to look into this in detail yet. Are these the files I sent you?


On a different note: as far as I can see, there are a lot of complications resulting from the fact that the data are reduced to a region. I understand that this is done because the cost of storage and transfer bandwidth needs to be accounted for. But let's really consider just going for the full fields.

sryan288 commented 5 years ago

I tried with CDO, which also didn't work, and converting to netCDF-3 does not seem to work because one variable exceeds the size limit.

Willi, yes these are the files you sent me.

I brought the data with me on an external drive, but if it is possible to access your data externally, I guess I could try to get the global velocity fields instead and cut them only in the vertical, to match the other fields. Storage-wise that should be OK from our side. Let me know if you think that's the best option; then I can message Markus.