Closed jhamman closed 8 years ago
We do support multi-dimensional coordinates already, and even read the coordinates
attributes. However, we don't do anything with them, beyond keeping them around: http://xray.readthedocs.org/en/stable/data-structures.html#coordinates
It's definitely worth seeing if we can make ds.T.plot.contourf('lat', 'lon')
work, though.
Okay, somehow I missed that we're already doing this to some extent. I just ran across the code in conventions.py
and, yes, we don't do anything with them.
@clarkfitzg - do you have any thoughts on how this could work?
We could probably modify our contour
/contourf
wrapper to accept 2D arrays for X/Y. The matplotlib functions already handle all the necessary logic.
Are the coordinates
not dimensions of those DataArrays
? Would it be better to make them dimensions when they get read in?
These latitude and longitude coordinates can't be dimensions because they need to be 2D. A good example would be simulation data on some sort of projected grid (like in @jhamman's example above).
@shoyer - the idea of the coordinates convention is that we all ready know the x/y coordinate names. So we shouldn't need to supply them like this: ds.T.plot.contourf('lat', 'lon')
. I'm thinking it would be better to store the coord names coming from https://github.com/xray/xray/blob/master/xray/conventions.py#L882 in each DataArray. Maybe a DataArray.coord_keys = ('xc', 'yc')
attribute is in order? Currently, we aren't really using the coordinates attribute in its intended way.
In the plotting logic, we should just search to see if coord_keys is not None
before passing the coordinates to the plotting method.
My hesitation with adding coord_keys
is two-fold:
coordinates
attribute is meaningful according to CF conventions: "The value of the coordinates attribute is a blank separated list of the names of auxiliary coordinate variables. There is no restriction on the order in which the auxiliary coordinate variables appear in the coordinates attribute string."A better way to handle this might be to write a xray-cf
packages (see #461 for another use case), which registers the .cf
namespace on Dataset/DataArray objects. Then we could have ds.cf.plot
, which automatically plots according to CF conventions, and could even do things like automatically plot with cartopy if relevant. This means less domain specific details in xray, and could be a good model for other similar extension projects (e.g., xgcm).
@shoyer - For the most part, I'm in agreement here with you. A few comments (reference ncdump
of a irregular grid netCDF below).
xray
. ni
and nj
are dimensions, not coordinates but xray
turns them into coordinates. I know why we do this in practice but, for example, frac(nj, ni)
only uses nj
and ni
, yc
and xc
are it's coordinates. So, you see, there is a bit of a disconnect there. Another way to think about xray
indexing is that isel
should work relative to the dimension index, and sel
should work on the coordinates. I know I'm out on a limb here...coordinates
attribute. Not restricting the order of the coordinates is a big deficiency here.coordinate
attribute right now is that we use it to determine the coordinate names, and throw the attribute out. I have two related PRs from today just to show which way we could go with all this: #606, #608.
ncdump of domain file
(xray_dev)Joes-iMac$ ncdump -h domain.lnd.wr50a_ar9v4.100920.nc
netcdf domain.lnd.wr50a_ar9v4.100920 {
dimensions:
n = 56375 ;
ni = 275 ;
nj = 205 ;
nv = 4 ;
variables:
double xc(nj, ni) ;
xc:long_name = "longitude of grid cell center" ;
xc:units = "degrees_east" ;
xc:bounds = "xv" ;
double yc(nj, ni) ;
yc:long_name = "latitude of grid cell center" ;
yc:units = "degrees_north" ;
yc:bounds = "yv" ;
double xv(nj, ni, nv) ;
xv:long_name = "longitude of grid cell verticies" ;
xv:units = "degrees_east" ;
double yv(nj, ni, nv) ;
yv:long_name = "latitude of grid cell verticies" ;
yv:units = "degrees_north" ;
int mask(nj, ni) ;
mask:long_name = "domain mask" ;
mask:note = "unitless" ;
mask:coordinates = "xc yc" ;
mask:comment = "0 value indicates cell is not active" ;
double area(nj, ni) ;
area:long_name = "area of grid cell in radians squared" ;
area:coordinates = "xc yc" ;
area:units = "radian2" ;
double frac(nj, ni) ;
frac:long_name = "fraction of grid cell that is active" ;
frac:coordinates = "xc yc" ;
frac:note = "unitless" ;
frac:filter1 = "error if frac> 1.0+eps or frac < 0.0-eps; eps = 0.1000000E-10" ;
frac:filter2 = "limit frac to [fminval,fmaxval]; fminval= 0.1000000E-02 fmaxval= 1.000000" ;
// global attributes:
:title = "CCSM domain data:" ;
:Conventions = "CF-1.0" ;
xray representation of domain file
In [1]: import xray
In [2]: xray.open_dataset('domain.lnd.wr50a_ar9v4.100920.nc')
Out[2]:
<xray.Dataset>
Dimensions: (ni: 275, nj: 205, nv: 4)
Coordinates:
xc (nj, ni) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 ...
yc (nj, ni) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
* ni (ni) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
* nj (nj) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
* nv (nv) int64 0 1 2 3
Data variables:
xv (nj, ni, nv) float64 189.3 189.4 189.2 189.0 189.4 189.6 189.3 ...
yv (nj, ni, nv) float64 16.33 16.58 16.74 16.49 16.58 16.82 16.98 ...
mask (nj, ni) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
area (nj, ni) float64 2.579e-05 2.594e-05 2.611e-05 2.627e-05 ...
frac (nj, ni) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
Attributes:
title: CCSM domain data:
Conventions: CF-1.0
The main thing I don't like about how we handle the coordinate attribute right now is that we use it to determine the coordinate names, and throw the attribute out. I seem to remember bringing this up in the past with our discussions of the calendar attribute and still don't like that we remove attributes that we decode. Especially this once since we don't do much with it.
In the case of calendars, the information ends up on the encoding
attribute. I think you're right though that we currently throw away variable specific coordinates
information -- that should probably also be preserved on encoding
.
There are two reasons why we do this sort of thing:
xc
as a coordinate on the dataset, but it remained in the coordinates
attribute on mask
? Or if we deleted xc
entirely?These sort of ambiguous scenarios are the part that worry me. In particular, I worry that it leads us down a path of maintaining arbitrary metadata through operations.
In the case of calendars, the information ends up on the encoding attribute.
You are right, I had forgotten about this. My original statement has been amended accordingly.
I think we're square on my points 2 and 3. Does my first point make sense?
I'll open an issue regarding the coordinates encoding attribute.
closed via #608 and #610.
The CF Conventions supports the notion of a 2d coordinate variable in the case of irregularly spaced data. An example of this sort of dataset is below. The CF Convention is to add a "coordinates" attribute with a string describing the 2d coordinates.
I'd like to discuss how we could support this in xray. There motivating application for this is in plotting operations but it may also have application in other grouping and remapping operations (e.g. #324, #475, #486).
One option would just to honor the "coordinates" attr in plotting and use the specified coordinates as the x/y values.
ref: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#idp5559280