r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
562 stars 94 forks source link

names of spatial raster dimensions need to be x and y #51

Closed edzer closed 6 years ago

edzer commented 6 years ago

@mdsumner I noted that in

> library(stars)
Loading required package: abind
Loading required package: sf
Linking to GEOS 3.6.2, GDAL 2.2.3, proj.4 4.9.3
> example(read_ncdf)

rd_ncd> f <- system.file("nc/reduced.nc", package = "stars")

rd_ncd> read_ncdf(f)
stars object with 4 dimensions and 4 attributes
attribute(s):
     anom               err             ice             sst       
 Min.   :-10.160   Min.   :0.110   Min.   :0.010   Min.   :-1.80  
 1st Qu.: -0.580   1st Qu.:0.160   1st Qu.:0.470   1st Qu.:-0.03  
 Median : -0.080   Median :0.270   Median :0.920   Median :13.65  
 Mean   : -0.186   Mean   :0.263   Mean   :0.718   Mean   :12.99  
 3rd Qu.:  0.210   3rd Qu.:0.320   3rd Qu.:0.960   3rd Qu.:24.81  
 Max.   :  2.990   Max.   :0.840   Max.   :1.000   Max.   :32.97  
 NA's   :4448      NA's   :4448    NA's   :13266   NA's   :4448   
dimension(s):
     from  to offset delta refsys point values
lon     1 180      0     2     NA    NA   NULL
lat     1  90    -89     2     NA    NA   NULL
zlev    1   1      0   NaN     NA    NA   NULL
time    1   1   1460   NaN     NA    NA   NULL

the coordinates are named lon and lat. This makes perfect sense from a NetCDF point of view (where dimensions are named, although not standardized: lon may also be long or longitude), but in GDAL they are not named (though function have names with "x", "y" and "band").

Following GDAL (and sp where coordinate dimensions names are not well implemented, and simple features where they are hard coded named X Y Z M) in stars I made the design decision that x and y raster coordinates are named x and y. Objects like above are valid, but won't plot and won't integrate with sf objects without changing the coordinate names. I don't find that terribly elegant, but do you, @mdsumner see another way out? An alternative would e.g. be the first dimension is always (assumed to be) x, the second (assumed to be) y. Right now it is arbitrary which dimension numbers are x and y.

mdsumner commented 6 years ago

I don't see why names matter, with a plotting default it seems fair to go by position.

edzer commented 6 years ago

I agree, but stars object don't have to be (sets of) raster maps, they can equally well be time series for regions, or OD matrices. If we'd drop the name requirement, how can one do a sanity check that a stars object has spatially raster dimensions?

And: if names don't matter, we can also choose them and stick with the choice.

mdsumner commented 6 years ago

I'm expecting that a gt be derived from the pair of nominated dimensions from offset and delta - in that sense having gt and offset/delta is redundant, but I see the gt is baked in as well as the names.

edzer commented 6 years ago

Yes, and gt can have non-zero values at pos 3 and 5, indicating sheared grids; this further complicates the conceptual models because you now need row and column to compute either x or y coords, and vice-versa.

mdsumner commented 6 years ago

What about

So then anything from a geotransform-system maintains current behaviour if x, y are to be used as a raster (check if shear params needed). (NetCDF will never have shear terms, curvlinear cases are best treated with coordinates as a variable, so we already good - offset = 0, delta = 1 - the default plot uses index-raster form, and higher level plot helpers deal with irregular grids).

Everything else uses offset and delta as required to build a geotransform when they are to be paired together, otherwise values is used directly.

I need to explore the existing behaviour for non-raster situations ...

edzer commented 6 years ago

I like that! It also gets rid of the repeating geotransform fields, which felt odd.

The case we still haven't covered is curvilinear grids. Maybe:

mdsumner commented 6 years ago

Great! Good idea about values, I'll dig out some examples. A problem I see is that there's no corner coordinates to carry the cell (assuming an AREA grid).But rectilinear has the same problem so might be a case to take POINT seriously - filled intervals between dimension.coordinates, but the coordinates define the boundary (not an implicit centre).

edzer commented 6 years ago

That seems to work; I kept it in a separate branch for now.

edzer commented 6 years ago

In

> read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(10, 12, 1, 1)))
stars object with 4 dimensions and 4 attributes
attribute(s):
     anom                err             ice              sst        
 Min.   :-1.07000   Min.   :0.300   Min.   :0.0100   Min.   :-1.390  
 1st Qu.:-0.36250   1st Qu.:0.300   1st Qu.:0.1100   1st Qu.:-0.720  
 Median : 0.19500   Median :0.300   Median :0.1700   Median :-0.515  
 Mean   : 0.05867   Mean   :0.303   Mean   :0.2094   Mean   :-0.534  
 3rd Qu.: 0.55500   3rd Qu.:0.300   3rd Qu.:0.2550   3rd Qu.:-0.275  
 Max.   : 0.92000   Max.   :0.320   Max.   :0.5200   Max.   : 0.030  
 NA's   :90         NA's   :90      NA's   :104      NA's   :90      
dimension(s):
  from to offset delta refsys point values
1    1 NA      0     2     NA    NA   NULL
2    1 NA    -89     2     NA    NA   NULL
3    1 10      0   NaN     NA    NA   NULL
4    1 10   1460   NaN     NA    NA   NULL
> r = read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(10, 12, 1, 1)))
> plot(r)
Error in plot.stars(r) : 
  no raster, no features geometries: no default plot method set up yet!

do you have an idea based on what we can set the raster dimensions? Is there a limited set of names for longitude and latitude used by netcdf? (I do see that the to fields are NA, which is a regression; otoh I also see that dimensions are unnamed: I think we shouldn't allow that)

adrfantini commented 6 years ago

s there a limited set of names for longitude and latitude used by netcdf?

What do you mean? Dimensions can have any name in the CF. There are contraints in long_name and standard_name but not in variable name iirc.

edzer commented 6 years ago

First of all, the are named, which is good (GDAL doesn't give these, unless you parse the metadata). The question is whether there is a (strict) convention for naming longitude and latitude?

mdsumner commented 6 years ago

No.

edzer commented 6 years ago

This?

adrfantini commented 6 years ago

@edzer that is just an example. Spatial x-y coordinates can be in any name or, for that matter, you could even have multiple grids in the same file with multiple x-y coordinates, even though the conventions advise against this.

Several models I know of use projected coordinates "x" and "y" (or "ix" and "jy"), with additional 2D variables (not dimensions!) "lat" and "lon" (or something similar like "xlat" and "xlon") which indicate the lats and lons of each grid point (yes you could get this from the CRS and the "x" and "y").

edzer commented 6 years ago

OK, so I guess we need

adrfantini commented 6 years ago

Assuming the order is not ideal: CF recommends:

T, then Z, then Y, then X

But it's only a recommendation, not always followed. I would do some guessing but allow the user to override it. raster does some guessing on dimensions which usually works, we could see how they do it. The problem is, in raster it is quite easy: it reads one variable at a time only. stars aims at reading multiple variables which potentially can depend from different dimensions.

edzer commented 6 years ago

Alternatively, we can look into what the GDAL netcdf driver does.