Open sryan288 opened 5 years ago
Hi,
we only have a prepared subset of output files that I could bring over on an external drive, i.e. we don't have the full output and no aux_files, which are required as input for `load_xorca_dataset`. Is it still possible to use xorca? If needed, I can supply the README file that we got with the data, which contains more details.
Also, I have a general question which maybe you can help with: if I read, for example, the temperature files into an xarray Dataset, the array has the logical coordinates x and y as dimensions and the physical coordinates nav_lat & nav_lon, which I assume is due to the irregular grid. This makes indexing, e.g. cutting out a lat/lon region, very difficult, at least for me as a Python beginner who used the find function in Matlab a lot. Sorry, this is not xorca-specific, but maybe you have some tips.
Thanks a lot for your help, Willi! Svenja
Hi Svenja,
as far as I know you need aux_files for now. I guess that, depending on what coordinate info you have along with the data, you could build your own aux_files. Maybe this could help with that: https://github.com/willirath/xorca_autodetect_pocs/blob/master/Autodetect_horizontal_grid.ipynb
Regarding your second question: I spent quite some time on this issue and wrote some modules to deal with it, but they are stored here on our internal server... Basically, if you don't consider special cases like extracting across the dateline or the grid's east-west boundary, you can do this with `DataArray.where()` without any problems. Something like `region = data.where((data.lon > lon_min) & (data.lon < lon_max) & (data.lat > lat_min) & (data.lat < lat_max), drop=True)` should work.
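With the coordinate names used by the files discussed below (nav_lon / nav_lat), a self-contained sketch could look like this; the file name and the box bounds are placeholders, and boxes crossing the dateline are not handled:

```python
import xarray as xr

ds = xr.open_dataset("<datafile>")  # any T-grid file with nav_lon / nav_lat

# placeholder bounds; pick a box that does not cross the dateline
lon_min, lon_max = -80.0, -50.0
lat_min, lat_max = 30.0, 45.0

# where() masks everything outside the box; drop=True trims rows/columns
# that are masked everywhere, so the result is a smaller regional array
region = ds.where(
    (ds.nav_lon > lon_min) & (ds.nav_lon < lon_max)
    & (ds.nav_lat > lat_min) & (ds.nav_lat < lat_max),
    drop=True,
)
```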
Cheers, Jan
Hi Jan,
thanks a lot for your reply! Unfortunately we don't have any mesh files, but I will talk to Caroline about this. I assume it should be possible to get them.
For now the regional extraction should work for me, thanks for the code example! Maybe at some point I (or someone else in our working group) will come to the special cases; if it's ok, I will just contact you again ;-)
Do you guys somehow re-sort the data to go from -180 to 180, or do you keep the folding line where it is and work around it?
Cheers from Cape Cod! Svenja
I usually leave the folding line where it is unless I want to investigate a region that spans across it... If you need to shift the data, you could try xarray's `data.roll()`. This works nicely if you extract regions; for global data, however, it becomes tricky, because the grid bends around in the north and you would not simply end up with data ranging from -180 to 180.
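A minimal sketch, assuming `ds` is an xarray Dataset with a zonal dimension `x` (the half-width shift is only an illustration):

```python
# shift the data by half the zonal extent; roll_coords=True rolls
# the nav_lon / nav_lat coordinates along with the data
ds_rolled = ds.roll(x=ds.sizes["x"] // 2, roll_coords=True)

# roll() does not re-wrap the longitude values themselves, so you may
# want to map them back into [-180, 180) afterwards:
ds_rolled["nav_lon"] = (ds_rolled["nav_lon"] + 180.0) % 360.0 - 180.0
```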
Cheers,
Jan
> thanks a lot for your reply! Unfortunately we don't have any mesh files, but I will talk to Caroline about this. I assume it should be possible to get them.
I recommend getting the mesh and mask files: without the info in the mesh files, you probably have no definitive info on grid constants etc. While you can often derive the grid constants from the lat and lon info provided with the physical output fields, you risk inaccuracies in the presence of partial cells (not fully filled bottom grid cells that better follow topography) or if the horizontal grid has been modified to, e.g., allow for more volume flow through narrow straits that only span a few grid boxes. For large-scale aggregations (basin-wide means or integrals) you're probably fine, though, because the inaccuracies will remain small.
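Just to illustrate what such a derivation looks like (and what it misses), here is a rough sketch that approximates the horizontal grid spacing from nav_lon / nav_lat by finite differences; the file name is a placeholder, it treats grid lines as locally zonal/meridional (not true near the north fold), and partial cells or locally modified scale factors are invisible to it:

```python
import numpy as np
import xarray as xr

R_EARTH = 6_371_000.0  # mean Earth radius in metres

ds = xr.open_dataset("<datafile>")  # hypothetical file with nav_lon / nav_lat

lon = np.deg2rad(ds["nav_lon"])
lat = np.deg2rad(ds["nav_lat"])

# approximate zonal (e1-like) and meridional (e2-like) spacing;
# jumps of ~360 degrees across the dateline are not handled here
dlon = lon.diff("x")  # dims: (y, x-1)
dlat = lat.diff("y")  # dims: (y-1, x)
e1_approx = R_EARTH * np.cos(lat.isel(x=slice(1, None))) * dlon
e2_approx = R_EARTH * dlat
```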
If this is okay with the people providing the data, feel free to just attach the README you got with the data and the output of either

```
ncdump -h <datafile>
```

or of

```python
print(xarray.open_dataset(<datafile>))
```

in this issue. I'd be glad to help sort out the best way to work with the data.
> If this is okay with the people providing the data, feel free to just attach the README you got with the data and the output of either [...]
If it's not possible to post this publicly, you can also send the info by e-mail.
Hey,
sorry for the delayed response. The output of `ncdump -h <datafile>` is:
```
netcdf ORCA025.*_1m_20160101_20161231_STupper700m_grid_T {
dimensions:
deptht = 23 ;
axis_nbounds = 2 ;
y = 1021 ;
x = 1442 ;
time_counter = UNLIMITED ; // (12 currently)
variables:
float deptht(deptht) ;
deptht:name = "deptht" ;
deptht:long_name = "Vertical T levels" ;
deptht:units = "m" ;
deptht:axis = "Z" ;
deptht:positive = "down" ;
deptht:bounds = "deptht_bounds" ;
float deptht_bounds(deptht, axis_nbounds) ;
float nav_lat(y, x) ;
nav_lat:standard_name = "latitude" ;
nav_lat:long_name = "Latitude" ;
nav_lat:units = "degrees_north" ;
float nav_lon(y, x) ;
nav_lon:standard_name = "longitude" ;
nav_lon:long_name = "Longitude" ;
nav_lon:units = "degrees_east" ;
double time_centered(time_counter) ;
time_centered:standard_name = "time" ;
time_centered:long_name = "Time axis" ;
time_centered:calendar = "gregorian" ;
time_centered:units = "seconds since 1900-01-01 00:00:00" ;
time_centered:time_origin = "1900-01-01 00:00:00" ;
time_centered:bounds = "time_centered_bounds" ;
double time_centered_bounds(time_counter, axis_nbounds) ;
double time_counter(time_counter) ;
time_counter:axis = "T" ;
time_counter:standard_name = "time" ;
time_counter:long_name = "Time axis" ;
time_counter:calendar = "gregorian" ;
time_counter:units = "seconds since 1900-01-01 00:00:00" ;
time_counter:time_origin = "1900-01-01 00:00:00" ;
time_counter:bounds = "time_counter_bounds" ;
double time_counter_bounds(time_counter, axis_nbounds) ;
float vosaline(time_counter, deptht, y, x) ;
vosaline:standard_name = "sea_water_salinity" ;
vosaline:long_name = "Sea Water Salinity" ;
vosaline:units = "0.001" ;
vosaline:online_operation = "average" ;
vosaline:interval_operation = "1350 s" ;
vosaline:interval_write = "1 month" ;
vosaline:cell_methods = "time: mean (interval: 1350 s)" ;
vosaline:_FillValue = 1.e+20f ;
vosaline:missing_value = 1.e+20f ;
vosaline:coordinates = "time_centered deptht nav_lat nav_lon" ;
float votemper(time_counter, deptht, y, x) ;
votemper:standard_name = "sea_water_potential_temperature" ;
votemper:long_name = "Sea Water Potential Temperature" ;
votemper:units = "degree_C" ;
votemper:online_operation = "average" ;
votemper:interval_operation = "1350 s" ;
votemper:interval_write = "1 month" ;
votemper:cell_methods = "time: mean (interval: 1350 s)" ;
votemper:_FillValue = 1.e+20f ;
votemper:missing_value = 1.e+20f ;
votemper:coordinates = "time_centered deptht nav_lat nav_lon" ;
// global attributes:
:name = "_OUT/ORCA025.*_1m_20160101_20161231_grid_T" ;
:description = "ocean T grid variables" ;
:title = "ocean T grid variables" ;
:Conventions = "CF-1.6" ;
:timeStamp = "2018-May-12 12:16:19 GMT" ;
:uuid = "354166b8-ed81-4230-a2ec-726baed61c33" ;
:history = "Wed Apr 24 10:02:37 2019: ncks -o ORCA025.*_1m_20160101_20161231_STupper700m_grid_T.nc -v votemper,vosaline -d deptht,1,23 ./ORCA025.L46/ORCA025*/*1m_20160101_20161231_grid_T.nc" ;
:NCO = "netCDF Operators version 4.7.5 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
}
```
A few things to mention:
I tried to load only the T and S files (the ones shown in the ncdump above) but get the following error message in Python:

```
%%time
%%memit
ds_xorca = load_xorca_dataset(data_files=data_files, aux_files=aux_files)
```
```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<timed eval> in <module>
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2356 with self.builtin_trap:
2357 args = (magic_arg_s, cell)
-> 2358 result = fn(*args, **kwargs)
2359 return result
2360
</home/sryan/miniconda3/envs/xorca_env/lib/python3.7/site-packages/decorator.py:decorator-gen-127> in memit(self, line, cell)
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/memory_profiler.py in memit(self, line, cell)
1013 timeout=timeout, interval=interval,
1014 max_usage=True,
-> 1015 include_children=include_children)
1016 mem_usage.append(tmp[0])
1017
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/memory_profiler.py in memory_usage(proc, interval, timeout, timestamps, include_children, multiprocess, max_usage, retval, stream, backend)
334 # Therefore, the whole process hangs indefinitely. Here, we are ensuring that the process gets killed!
335 try:
--> 336 returned = f(*args, **kw)
337 parent_conn.send(0) # finish timing
338 ret = parent_conn.recv()
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/memory_profiler.py in _func_exec(stmt, ns)
786 # helper for magic_memit, just a function proxy for the exec
787 # statement
--> 788 exec(stmt, ns)
789
790
<string> in <module>
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xorca/lib.py in load_xorca_dataset(data_files, aux_files, decode_cf, **kwargs)
389 aux_ds.update(
390 rename_dims(xr.open_dataset(af, decode_cf=False,
--> 391 chunks=ac)))
392 # Again, we first have to open all data sets to filter the input chunks.
393 _data_files_chunks = map(
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/dataset.py in update(self, other, inplace)
2511 dataset.
2512 """
-> 2513 variables, coord_names, dims = dataset_update_method(self, other)
2514
2515 return self._replace_vars_and_dims(variables, coord_names, dims,
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/merge.py in dataset_update_method(dataset, other)
583
584 return merge_core([dataset, other], priority_arg=1,
--> 585 indexes=dataset.indexes)
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/merge.py in merge_core(objs, compat, join, priority_arg, explicit_coords, indexes)
437
438 coerced = coerce_pandas_values(objs)
--> 439 aligned = deep_align(coerced, join=join, copy=False, indexes=indexes)
440 expanded = expand_variable_dicts(aligned)
441
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/alignment.py in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid)
214
215 aligned = align(*targets, join=join, copy=copy, indexes=indexes,
--> 216 exclude=exclude)
217
218 for position, key, aligned_obj in zip(positions, keys, aligned):
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/alignment.py in align(*objects, **kwargs)
152 'arguments without labels along dimension %r cannot be '
153 'aligned because they have different dimension sizes: %r'
--> 154 % (dim, sizes))
155
156 result = []
ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension sizes: {1, 1442}
```
Thanks!!!!
In the error message (the interesting part is at the bottom), xarray complains about the dimension `x` being of different length for different inputs. The line at which xorca fails is this one: https://github.com/willirath/xorca/blob/bbc574e51128fc2237d04abeb4ec4c356e272e25/xorca/lib.py#L389_L391 where it adds data from the aux files to the final dataset.
In

```python
ds_xorca = load_xorca_dataset(data_files=data_files, aux_files=aux_files)
```

you have a list `aux_files` containing the `mesh_....nc` and `mask_....nc` files.
My guess is that there's a file among the `aux_files` that only has a singleton `x` dimension. Can you check this?
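A quick sketch for checking this, assuming `aux_files` is the same list you pass to `load_xorca_dataset`:

```python
import xarray as xr

# print the dimension sizes of every aux file to spot a singleton x
for af in aux_files:
    with xr.open_dataset(af, decode_cf=False) as ds_aux:
        print(af, dict(ds_aux.sizes))
```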
`load_xorca_dataset` is now working, but only for 2D fields such as MLD and TAUX. When I try to load a 3D field, it reads the coordinates correctly but returns an empty field for the data variable. I assume this is because our data is cut vertically while the mesh files are full depth, and xarray's auto_combine cannot handle the different dimension sizes?
I tried to cut the mesh files to see whether that is the problem, but I run into a netCDF/HDF error with mesh_zgr.nc, while it works with mesh_hgr.nc. I am adding the code and the Python error message below.
Another problem might then be that our velocity fields are cut not only in the vertical but in the horizontal as well. Do you think it can still all work?
```python
zmesh = xr.open_dataset('/vortex/clidex/data/ORCA/ORCA025.L46.LIM2vp.CFCSF6.JRA.XIOS2-K003.hindcast/mesh_zgr.nc').isel(z=slice(0, 23))
zmesh.to_netcdf('~/mesh_zgr_cut.nc')
```
```
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-19-a047b0853830> in <module>
6 zmesh = xr.open_dataset('/vortex/clidex/data/ORCA/ORCA025.L46.LIM2vp.CFCSF6.JRA.XIOS2-K003.hindcast/' +
7 'mesh_zgr.nc').isel(z=slice(0,23)).drop('nav_lev')
----> 8 zmesh.to_netcdf('~/mesh_zgr_cut.nc')
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
1538 unlimited_dims=unlimited_dims,
1539 compute=compute,
-> 1540 invalid_netcdf=invalid_netcdf,
1541 )
1542
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1072 # to be parallelized with dask
1073 dump_to_store(
-> 1074 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
1075 )
1076 if autoclose:
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1118 variables, attrs = encoder(variables, attrs)
1119
-> 1120 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
1121
1122
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
296 writer = ArrayWriter()
297
--> 298 variables, attributes = self.encode(variables, attributes)
299
300 self.set_attributes(attributes)
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/common.py in encode(self, variables, attributes)
385 # All NetCDF files get CF encoded by default, without this attempting
386 # to write times, for example, would fail.
--> 387 variables, attributes = cf_encoder(variables, attributes)
388 variables = OrderedDict(
389 [(k, self.encode_variable(v)) for k, v in variables.items()]
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/conventions.py in cf_encoder(variables, attributes)
744
745 # add encoding for time bounds variables if present.
--> 746 _update_bounds_encoding(variables)
747
748 new_vars = OrderedDict(
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/conventions.py in _update_bounds_encoding(variables)
407 is_datetime_type = np.issubdtype(
408 v.dtype, np.datetime64
--> 409 ) or contains_cftime_datetimes(v)
410
411 if (
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/common.py in contains_cftime_datetimes(var)
1257 """Check if an xarray.Variable contains cftime.datetime objects
1258 """
-> 1259 return _contains_cftime_datetimes(var.data)
1260
1261
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/variable.py in data(self)
325 return self._data
326 else:
--> 327 return self.values
328
329 @data.setter
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/variable.py in values(self)
423 def values(self):
424 """The variable's data as a numpy.ndarray"""
--> 425 return _as_array_or_item(self._data)
426
427 @values.setter
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/variable.py in _as_array_or_item(data)
236 TODO: remove this (replace with np.asarray) once these issues are fixed
237 """
--> 238 data = np.asarray(data)
239 if data.ndim == 0:
240 if data.dtype.kind == "M":
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
688
689 def __array__(self, dtype=None):
--> 690 self._ensure_cached()
691 return np.asarray(self.array, dtype=dtype)
692
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/indexing.py in _ensure_cached(self)
685 def _ensure_cached(self):
686 if not isinstance(self.array, NumpyIndexingAdapter):
--> 687 self.array = NumpyIndexingAdapter(np.asarray(self.array))
688
689 def __array__(self, dtype=None):
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
664
665 def __array__(self, dtype=None):
--> 666 return np.asarray(self.array, dtype=dtype)
667
668 def __getitem__(self, key):
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
568 def __array__(self, dtype=None):
569 array = as_indexable(self.array)
--> 570 return np.asarray(array[self.key], dtype=None)
571
572 def transpose(self, order):
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in __getitem__(self, key)
73 def __getitem__(self, key):
74 return indexing.explicit_indexing_adapter(
---> 75 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
76 )
77
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
848 """
849 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 850 result = raw_indexing_method(raw_key.tuple)
851 if numpy_indices.tuple:
852 # index the loaded np.ndarray
~/miniconda3/envs/xorca_env/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _getitem(self, key)
85 with self.datastore.lock:
86 original_array = self.get_array(needs_lock=False)
---> 87 array = getitem(original_array, key)
88 except IndexError:
89 # Catch IndexError in netCDF4 and return a more informative
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._get()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
RuntimeError: NetCDF: HDF error
```
Hi Svenja,
yes, the mesh and mask files should have the same dimension sizes as the data (vertical and horizontal). In cases like yours, I also cut them before loading.
I usually do that with NCO... Something like `ncks -d z,0,22 mesh_zgr.nc mesh_zgr_cut.nc` should do it in your example.
Considering your error when trying to do the cutting with xarray: what netCDF format is the mesh_zgr.nc file in?
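If you're not sure, `ncdump -k` prints just the format variant ("classic", "64-bit offset", "netCDF-4", ...):

```
ncdump -k mesh_zgr.nc
```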
Hey Jan Klaus,
yes, I should have mentioned that I tried NCO and CDO to cut the file, but this also gives me an error message (see below). It doesn't seem to like the e3v_0 variable, although I don't see any difference from the other variables stored in the file.

```
(base) [sryan@hydroclim ~]$ ncks -d z,0,22 mesh_zgr.nc mesh_zgr_cut.nc
ERROR: nco_get_vara() failed to nc_get_vara() variable "e3v_0"
nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered error): nco_get_vara()
nco_err_exit(): ERROR Error code is -101. Translation into English with nc_strerror(-101) is "NetCDF: HDF error"
nco_err_exit(): ERROR NCO will now exit with system call exit(EXIT_FAILURE)
```

The file is in netCDF-4 format. However, mesh_hgr.nc has the same format, and that one I can cut without a problem, which confuses me.
I think in mesh_hgr.nc the only variable that actually has a depth dimension is nav_lev, so that might explain the different behavior. One thing you could try is using a different version of NCO, or converting the files to netCDF-3 before cutting...
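For the conversion, something like the following might work (an untested sketch; `-6` writes the 64-bit-offset classic format, which allows larger variables than plain classic netCDF-3):

```
# plain netCDF-3 classic; may fail if a variable exceeds the classic size limits
ncks -3 mesh_zgr.nc mesh_zgr_nc3.nc

# 64-bit-offset classic as a fallback for large variables
ncks -6 mesh_zgr.nc mesh_zgr_nc3.nc
```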
I did not have the time to look into this in detail yet. Are these the files I sent you?
On a different note: as far as I can see, there are lots of complications resulting from the fact that the data are reduced to a region. I understand that this is because the cost of storage and transfer bandwidth needs to be accounted for. But let's really consider just going for the full fields.
I tried with CDO and that also didn't work, and converting to netCDF-3 seems not to work because one variable exceeds the size limit.
Willi, yes, these are the files you sent me.
I brought the data with me on an external drive, but if it is possible to access your data externally, I guess I could try to get the global velocity fields instead and only cut them in the vertical to match the other fields. Storage-wise that should be OK from our side. Let me know if you think that's the best option; then I can message Markus.