Closed edzer closed 6 years ago
I don't see why names matter, with a plotting default it seems fair to go by position.
I agree, but stars object don't have to be (sets of) raster maps, they can equally well be time series for regions, or OD matrices. If we'd drop the name requirement, how can one do a sanity check that a stars object has spatially raster dimensions?
And: if names don't matter, we can also choose them and stick with the choice.
I'm expecting that a gt be derived from the pair of nominated dimensions from offset and delta - in that sense having gt and offset/delta is redundant, but I see the gt is baked in as well as the names.
Yes, and gt can have non-zero values at pos 3 and 5, indicating sheared grids; this further complicates the conceptual models because you now need row and column to compute either x or y coords, and vice-versa.
What about
dimensions
stores the record of an input gt, stores the dimension names (x, y) and the shear components (ys = 0, xs = 0 usually)values
is regular dimension stores delta
, and values is not storedvalues
is stored (it's rectlinear) then delta = NA
So then anything from a geotransform-system maintains current behaviour if x, y are to be used as a raster (check if shear params needed). (NetCDF will never have shear terms, curvlinear cases are best treated with coordinates as a variable, so we already good - offset = 0, delta = 1 - the default plot uses index-raster form, and higher level plot helpers deal with irregular grids).
Everything else uses offset and delta as required to build a geotransform when they are to be paired together, otherwise values
is used directly.
I need to explore the existing behaviour for non-raster situations ...
I like that! It also gets rid of the repeating geotransform fields, which felt odd.
The case we still haven't covered is curvilinear grids. Maybe:
curvilinear
is TRUE
lon
values, matrix for lat
values) in values
?Great! Good idea about values, I'll dig out some examples. A problem I see is that there's no corner coordinates to carry the cell (assuming an AREA grid).But rectilinear has the same problem so might be a case to take POINT seriously - filled intervals between dimension.coordinates, but the coordinates define the boundary (not an implicit centre).
That seems to work; I kept it in a separate branch for now.
In
> read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(10, 12, 1, 1)))
stars object with 4 dimensions and 4 attributes
attribute(s):
anom err ice sst
Min. :-1.07000 Min. :0.300 Min. :0.0100 Min. :-1.390
1st Qu.:-0.36250 1st Qu.:0.300 1st Qu.:0.1100 1st Qu.:-0.720
Median : 0.19500 Median :0.300 Median :0.1700 Median :-0.515
Mean : 0.05867 Mean :0.303 Mean :0.2094 Mean :-0.534
3rd Qu.: 0.55500 3rd Qu.:0.300 3rd Qu.:0.2550 3rd Qu.:-0.275
Max. : 0.92000 Max. :0.320 Max. :0.5200 Max. : 0.030
NA's :90 NA's :90 NA's :104 NA's :90
dimension(s):
from to offset delta refsys point values
1 1 NA 0 2 NA NA NULL
2 1 NA -89 2 NA NA NULL
3 1 10 0 NaN NA NA NULL
4 1 10 1460 NaN NA NA NULL
> r = read_ncdf(f, ncsub = cbind(start = c(1, 1, 1, 1), count = c(10, 12, 1, 1)))
> plot(r)
Error in plot.stars(r) :
no raster, no features geometries: no default plot method set up yet!
do you have an idea based on what we can set the raster dimensions? Is there a limited set of names for longitude and latitude used by netcdf? (I do see that the to
fields are NA
, which is a regression; otoh I also see that dimensions are unnamed: I think we shouldn't allow that)
s there a limited set of names for longitude and latitude used by netcdf?
What do you mean? Dimensions can have any name in the CF. There are contraints in long_name
and standard_name
but not in variable name iirc.
First of all, the are named, which is good (GDAL doesn't give these, unless you parse the metadata). The question is whether there is a (strict) convention for naming longitude and latitude?
No.
@edzer that is just an example. Spatial x-y coordinates can be in any name or, for that matter, you could even have multiple grids in the same file with multiple x-y coordinates, even though the conventions advise against this.
Several models I know of use projected coordinates "x" and "y" (or "ix" and "jy"), with additional 2D variables (not dimensions!) "lat" and "lon" (or something similar like "xlat" and "xlon") which indicate the lats and lons of each grid point (yes you could get this from the CRS and the "x" and "y").
OK, so I guess we need
x %>% st_set_raster(c("long", "lat"))
which then specifies these two vars as x and y, andAssuming the order is not ideal: CF recommends:
T, then Z, then Y, then X
But it's only a recommendation, not always followed. I would do some guessing but allow the user to override it. raster
does some guessing on dimensions which usually works, we could see how they do it. The problem is, in raster
it is quite easy: it reads one variable at a time only. stars
aims at reading multiple variables which potentially can depend from different dimensions.
Alternatively, we can look into what the GDAL netcdf driver does.
@mdsumner I noted that in
the coordinates are named
lon
andlat
. This makes perfect sense from a NetCDF point of view (where dimensions are named, although not standardized:lon
may also belong
orlongitude
), but in GDAL they are not named (though function have names with "x", "y" and "band").Following GDAL (and sp where coordinate dimensions names are not well implemented, and simple features where they are hard coded named X Y Z M) in stars I made the design decision that x and y raster coordinates are named x and y. Objects like above are valid, but won't plot and won't integrate with sf objects without changing the coordinate names. I don't find that terribly elegant, but do you, @mdsumner see another way out? An alternative would e.g. be the first dimension is always (assumed to be) x, the second (assumed to be) y. Right now it is arbitrary which dimension numbers are x and y.