r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0
561 stars 94 forks source link

Direct way to go from matrix to stars without raster() or for irregular grids? #82

Closed dblodgett-usgs closed 5 years ago

dblodgett-usgs commented 5 years ago

Doing a little experimentation with stars, I'm running into trouble getting a matrix I created coerced into a stars object that then can go to sf without first using raster(). Using raster is fine for regularly spaced rasters, but what am I missing if I want to do irregularly spaced rectilinear or curvilinear grids?

Using some demo code from one of the blog posts, I'm trying to get a non regularly spaced grid to work and failing... Is this just not implemented yet?

Code rolled up for clarity ```r library(stars) library(raster) m <- matrix(1:20, nrow = 5, ncol = 4) dim(m) <- c(x = 5, y = 4) # named dim x = c(0,0.5,1,2,4,5) y = c(0.3,0.5,1,2,2.2) (r = st_as_stars(list(m = m), dimensions = st_dimensions(x = x, y = y))) st_xy2sfc(r, as_points = FALSE) st_as_sfc(r, as_points = FALSE) st_as_sf(r, as_points = FALSE) image(r) image(x, y, m, col = sf.colors(21)) ras <- raster(m) # works (ras_stars <- st_as_stars(ras)) # dimensions ignored for class raster (ras_stars_ireg <- st_as_stars(ras, dimensions = st_dimensions(x = x, y = y))) # list method expects a matrix? (ras_stars_ireg2 <- st_as_stars(list(ras = ras), dimensions = st_dimensions(x = x, y = y))) st_xy2sfc(ras_stars, as_points = FALSE) st_as_sfc(ras_stars, as_points = FALSE) st_as_sf(ras_stars, as_points = FALSE) ```
edzer commented 5 years ago

Thanks for treading this relatively underexplored part of stars! I noted that giving one more dimension value in dimensions than the number of values in the matrix, as in your example, leads to lots of problems when converting e.g. to sf. Below I removed one, meaning the values are now interpreted as cell center (more or less!). Doing the thing right by specifying raster boundaries as in your example clearly needs some work.

Below some cheap fixes of things that I believe work now (with stars from github):

library(stars)
library(raster)

m <- matrix(1:20, nrow = 5, ncol = 4)
dim(m) <- c(x = 5, y = 4) # named dim
#x = c(0,0.5,1,2,4,5)
x = c(0,0.5,1,2,4)
#y = c(0.3,0.5,1,2,2.2)
y = c(0.3,0.5,1,2)
(r = st_as_stars(list(m = m), 
         dimensions = st_dimensions(x = x, y = y, .raster = c("x", "y"))))

st_xy2sfc(r, as_points = FALSE)
plot(st_as_sfc(r, as_points = FALSE))
plot(st_as_sf(r, as_points = FALSE), axes = TRUE)

image(r)
image(x, y, m, col = sf.colors(21))

ras <- raster(m)

# works
(ras_stars <- st_as_stars(ras))

(ras_stars_ireg <- st_as_stars(ras))
# dimensions ignored for class raster, as raster is always regular
(ras2 = structure(ras_stars_ireg,
            dimensions = st_dimensions(x = x, y = y, .raster = c("x", "y"))))

# list method expects a matrix?
try(ras_stars_ireg2 <- st_as_stars(list(ras = ras), 
    dimensions = st_dimensions(x = x, y = y, .raster = c("x", "y"))))
# yes, this doesn't work, but
(ras_stars_ireg2 <- st_as_stars(list(ras = as.matrix(ras)), 
    dimensions = st_dimensions(x = x, y = y, .raster = c("x", "y"))))
# does

st_xy2sfc(ras_stars, as_points = FALSE)
st_as_sfc(ras_stars, as_points = FALSE)
st_as_sf(ras_stars, as_points = FALSE)
edzer commented 5 years ago

This makes the following code work (note the patch is to both sf and stars):

library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0

m <- matrix(1:20, nrow = 5, ncol = 4)
dim(m) <- c(x = 5, y = 4) # named dim
x = c(0,0.5,1,2,4,5)
y = c(0.3,0.5,1,2,2.2)
(r = st_as_stars(list(m = m), 
                dimensions = st_dimensions(x = x, y = y, .raster = c("x", "y"))))
# stars object with 2 dimensions and 1 attribute
# attribute(s):
#        m        
#  Min.   : 1.00  
#  1st Qu.: 5.75  
#  Median :10.50  
#  Mean   :10.50  
#  3rd Qu.:15.25  
#  Max.   :20.00  
# dimension(s):
#   from to offset delta refsys point        values    
# x    1  6     NA    NA     NA    NA     0, ..., 5 [x]
# y    1  5     NA    NA     NA    NA 0.3, ..., 2.2 [y]

dim(r)
# x y 
# 5 4 
dim(st_dimensions(r))
# x y 
# 6 5 

plot(st_as_sf(r, as_points = FALSE), axes = TRUE)
plot(st_as_sf(r, as_points = TRUE), axes = TRUE)

@mdsumner : note the differences in output from dim -- we don't have a flag in the dimensions object that tells whether values are raster cell centers or raster cell boundaries (of which we have dim(r)+1). This leads to a somewhat confusing to field, when arguing from the dimensions of m. Do you have a suggestion how to make this simpler to users? Decrease to with one, so that we can argue from there? Maybe: decrease to with one so that to - from + 1 smaller than length(values) indicates values must be cell boundaries?

dblodgett-usgs commented 5 years ago

Awesome, thanks @edzer. Will look at replacing my use of Raster here: https://github.com/dblodgett-usgs/intersectr/blob/master/R/create_cell_geometry.R#L106

Is there anything you need re: testing or documentation I could contribute? Just point me to where I should look.

mdsumner commented 5 years ago

Using image() structure would make for some good examples. I can't do anything on this right now, but reading with interest! @dblodgett-usgs any examples you have, from files or otherwise will be helpful. I hadn't thought of users constructing directly from matrices, that is a good way to get diverse examples from various NetCDF files and elsewhere. It's definitely not a simple landscape

dblodgett-usgs commented 5 years ago

I'll have a slew of examples if we can make this work! I have to admit I'm a bit over my head with grid-topologies and the matrix row/col x/y semantics being thrown around here. On that, @edzer -- in your example you treat rows like x and cols like y. In my work in NetCDF-land, rows correspond to the y coordinate variable and columns correspond to x. Feel like I'm constantly screwing that up though so happy to be corrected here.

The example you give above works but if I try to apply my own data to this pattern I'm getting errors.

My little test is not an irregularly spaced grid, but this should still work right? To make the test totally parallel to your sample, I pulled out the x and y cell count into the values 78 and 54 the vectors are 79 and 55 long. If I reduce the vectors to the same size as the matrix, I can get it to work.

Code rolled up for clarity ```r library(stars) m <- matrix(as.numeric(seq(1, (78 * 54))), nrow = 78, ncol = 54) dim(m) <- c(x = 78, y = 54) x = c(1986250, 1987250, 1988250, 1989250, 1990250, 1991250, 1992250, 1993250, 1994250, 1995250, 1996250, 1997250, 1998250, 1999250, 2000250, 2001250, 2002250, 2003250, 2004250, 2005250, 2006250, 2007250, 2008250, 2009250, 2010250, 2011250, 2012250, 2013250, 2014250, 2015250, 2016250, 2017250, 2018250, 2019250, 2020250, 2021250, 2022250, 2023250, 2024250, 2025250, 2026250, 2027250, 2028250, 2029250, 2030250, 2031250, 2032250, 2033250, 2034250, 2035250, 2036250, 2037250, 2038250, 2039250, 2040250, 2041250, 2042250, 2043250, 2044250, 2045250, 2046250, 2047250, 2048250, 2049250, 2050250, 2051250, 2052250, 2053250, 2054250, 2055250, 2056250, 2057250, 2058250, 2059250, 2060250, 2061250, 2062250, 2063250, 2064250) y <- c(-503500, -504500, -505500, -506500, -507500, -508500, -509500, -510500, -511500, -512500, -513500, -514500, -515500, -516500, -517500, -518500, -519500, -520500, -521500, -522500, -523500, -524500, -525500, -526500, -527500, -528500, -529500, -530500, -531500, -532500, -533500, -534500, -535500, -536500, -537500, -538500, -539500, -540500, -541500, -542500, -543500, -544500, -545500, -546500, -547500, -548500, -549500, -550500, -551500, -552500, -553500, -554500, -555500, -556500, -557500) (r = st_as_stars(list(m = m), dimensions = st_dimensions(x = x, y = y, .raster = c("x", "y")))) dim(r) dim(st_dimensions(r)) plot(st_as_sf(r, as_points = FALSE), axes = TRUE) # Error in gdal_write(x, file = file, driver = driver, geotransform = geotransform) : # dimensions don't match plot(st_as_sf(r, as_points = TRUE), axes = TRUE) ```

So... ?

p.s. I was also seeing:

st_dimensions(x = x, y = y, .raster = c("x", "y"))
# Error in st_dimensions.array(x = x, y = y, .raster = c("x", "y")) : 
#   argument ".x" is missing, with no default

realizing the issue was with the array method, did this and it works: st_dimensions(x = as.numeric(x), y = as.numeric(y), .raster = c("x", "y"))

🙏

dblodgett-usgs commented 5 years ago

One other nit...

sf <- st_as_sf(r, as_points = FALSE)
sf::st_crs(sf) <- st_crs("+proj=lcc +lat_1=25 +lat_2=60 +x_0=0 +y_0=0 +units=m +lat_0=42.5 +lon_0=-100 +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs")
# OGR: Corrupt data
# Error in CPL_crs_equivalent(e1$proj4string, e2$proj4string) : OGR error

the crs argument to st_as_stars isn't taking and I can't seem to set the crs of my result. It appears to be something I'm doing wrong though because on the little 5X4 example, I can set the crs with st_crs() fine.

edzer commented 5 years ago

The above works, now, pending the last plot which should be

plot(st_as_sf(r, as_points = TRUE, merge = FALSE), axes = TRUE)

(or requires GDAL 2.4.0, to return a contour/polygons map) The error you saw with

# Error in st_dimensions.array(x = x, y = y, .raster = c("x", "y")) : 

was caused by x being an array, rather than a vector. The st_crs should now also work.

There is still one issue when st_dimensions is fed with cell centers: the half cell shift of origin is not applied, currently.

edzer commented 5 years ago

With an option for the half cell size correction we now see this:

```r library(stars) # Loading required package: abind # Loading required package: sf # Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0 m <- matrix(1:12, nrow = 4, ncol = 3) # same number of values as rows/cols x = c(0, 1, 2, 3) y = c(2, 1, 0) (r = st_as_stars(list(m = m), dimensions = st_dimensions(x = x, y = y, .raster = c("x", "y"), cell_midpoints = FALSE))) # stars object with 2 dimensions and 1 attribute # attribute(s): # m # Min. : 1.00 # 1st Qu.: 3.75 # Median : 6.50 # Mean : 6.50 # 3rd Qu.: 9.25 # Max. :12.00 # dimension(s): # from to offset delta refsys point values # x 1 4 0 1 NA NA NULL [x] # y 1 3 2 -1 NA NA NULL [y] (r = st_as_stars(list(m = m), dimensions = st_dimensions(x = x, y = y, .raster = c("x", "y"), cell_midpoints = TRUE))) # stars object with 2 dimensions and 1 attribute # attribute(s): # m # Min. : 1.00 # 1st Qu.: 3.75 # Median : 6.50 # Mean : 6.50 # 3rd Qu.: 9.25 # Max. :12.00 # dimension(s): # from to offset delta refsys point values # x 1 4 -0.5 1 NA NA NULL [x] # y 1 3 2.5 -1 NA NA NULL [y] # cell corners: x = c(0, 1, 2, 3, 4) y = c(2, 1, 0, -1) (r = st_as_stars(list(m = m), dimensions = st_dimensions(x = x, y = y, .raster = c("x", "y"), cell_midpoints = FALSE))) # stars object with 2 dimensions and 1 attribute # attribute(s): # m # Min. : 1.00 # 1st Qu.: 3.75 # Median : 6.50 # Mean : 6.50 # 3rd Qu.: 9.25 # Max. :12.00 # dimension(s): # from to offset delta refsys point values # x 1 4 0 1 NA NA NULL [x] # y 1 3 2 -1 NA NA NULL [y] # the following should error, but doesn't, as st_dimensions doesn't know m's dimension: (r = st_as_stars(list(m = m), dimensions = st_dimensions(x = x, y = y, .raster = c("x", "y"), cell_midpoints = TRUE))) # stars object with 2 dimensions and 1 attribute # attribute(s): # m # Min. : 1.00 # 1st Qu.: 3.75 # Median : 6.50 # Mean : 6.50 # 3rd Qu.: 9.25 # Max. :12.00 # dimension(s): # from to offset delta refsys point values # x 1 4 -0.5 1 NA NA NULL [x] # y 1 3 2.5 -1 NA NA NULL [y] ```

which I do not consider optimal, but at least documented.

@dblodgett-usgs : with respect to the rows=x and cols=y magic, yes, this takes a short while to get your head around, but once you do it's quite liberating: index and dimension order simply correspond to x and y. If you have your matrix with rows=y and cols=x, maybe define dimensions as y, x rather than x, y, and then use aperm to reorder (if needed).

dblodgett-usgs commented 5 years ago

So great, @edzer! Nice use of HTML5 <details> too, I didn't know about that.

I think I got this all figured out over here: https://github.com/dblodgett-usgs/intersectr/pull/1

The half cell issue has caused more problems over the years... it should almost be a required input that's categorical "center", "leading", or "trailing". Otherwise, someone will screw it up...

The whole x/y thing drives me kind of nuts. Over in intersectr I'm calling inputs "col_coords" and "row_coords" instead of x and y now but am leaning toward something more like "nc_col_coords" and "nc_row_coords" to be super explicit about it. Thanks for the tip on aperm, I'll keep that in mind.

I'll contribute some testing for this tomorrow and close this issue with a PR.

edzer commented 5 years ago

I like the idea of not having a default for cell_midpoints, and by that forcing users to be explicit. I never came across "trailing" as an option in real life data, did you? (or are you thinking north-up / row-down y axes?)

dblodgett-usgs commented 5 years ago

No, I've not seen "trailing" either, just mentioned it because it felt complete. I think we could safely leave it off.

dblodgett-usgs commented 5 years ago

Side note on this... I found a case where one dimension passes as "regularly spaced" and another does not. I think this is due to noise in the lat/lon coordinate variable from a NetCDF file (?). If I redistribute the coordinates to be evenly spaced things work.

Is there a tolerance for considering something "regularly spaced"? Would that be something to put into an overridable input?

edzer commented 5 years ago

Yes, see here. It makes sense to pass the epsilon parameter further up the call stack.