r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.34k stars 298 forks source link

Finding neighboring polygons #234

Closed tiernanmartin closed 7 years ago

tiernanmartin commented 7 years ago

I use spdep::poly2nb to identify neighboring polygons, but this necessitates a conversion back to sp - is there a way to do this without that conversion?

edzer commented 7 years ago

st_intersects gives the sparse matrix; you'd have to filter out the diagonal elements, and set the class attribute.

tiernanmartin commented 7 years ago

Disclaimer: If Stackoverflow is the preferred venue for this sort of question please let me know and I'll post there.

Below I have a reprex showing the type of polygon neighbor calculation I do regularly. The blue county has several intersecting neighbors which are identified using sf::st_intersects, but the southeastern-most county (light brown in color) only shares a single point with the target county and I may want to exclude it; sp::spdep has the queen = FALSE argument which allows me to easily exclude such counties.

Could someone demonstrate a way to do this using sf-verbs? If no simple method for filtering the diagonal elements exists, could one be added?

library(sp)
library(spdep)
library(tidyverse)
library(sf)

demo(nc, ask = FALSE, verbose = FALSE)

par(mfrow = c(3, 1), mar = c(rep(0.75, 4)))

# Target county
plot(nc[1], col = "red", main = "Target county")
plot(nc[50, "AREA"], add = TRUE)

# Using sf verbs
spare_mtx <- st_intersects(nc, nc)
#> although coordinates are longitude/latitude, it is assumed that they are planar

plot(nc[1], col = "red", main = "Neighbors: st_intersection")
plot(nc[spare_mtx[[50]], "NAME"], add = TRUE)
plot(nc[50, "AREA"], add = TRUE)

# Using the 'rook' method from spdep::poly2nb
spare_mtx_rook <- poly2nb(as(nc, "Spatial"), queen = FALSE)

plot(nc[1], col = "red", main = "Neighbors: poly2nb")
plot(nc[spare_mtx_rook[[50]], "NAME"], add = TRUE)
plot(nc[50, "AREA"], add = TRUE)

edzer commented 7 years ago

Look into st_relate, it gives you the dimension of the intersection of the boundaries of two geometries; this is 1 for a line, 0 for a point.

(It does give you a dense matrix, though; in case memory is a problem you could do st_intersects first, then loop through each nbh set with st_relate.)

tiernanmartin commented 7 years ago

Got it.

st_relate does return the result I needed but I would suggest that there's still room for a simpler, high-level verb to do this.

Using the example above, I loop through the result of st_intersects which returns a list of matrices containing DE-9IM Matrix strings (which I had never seen before and take a bit of work to understand):

spare_mtx <- st_intersects(nc, nc)

spare_mtx[[50]]
#> [1] 39 40 42 50 69 70 71

spare_mtx[[50]] %>% 
map(~nc[.x, ]) %>% 
map(~st_relate(x = nc[50, ], y = .x))

#> [[1]]
#>      [,1]       
#> [1,] "FF2F11212"
#> 
#> [[2]]
#>      [,1]       
#> [1,] "FF2F11212"
#> 
#> [[3]]
#>      [,1]       
#> [1,] "FF2F11212"
#> 
#> [[4]]
#>      [,1]       
#> [1,] "2FFF1FFF2"
#> 
#> [[5]]
#>      [,1]       
#> [1,] "FF2F11212"
#> 
#> [[6]]
#>      [,1]       
#> [1,] "FF2F01212"     <- Note: this is the diagonal neighbor county from the example above
#> 
#> [[7]]
#>      [,1]       
#> [1,] "FF2F11212"

I think a convenience function similar to poly2nb would be a worthwhile addition to the sf canon.

tiernanmartin commented 7 years ago

On a related note, I'm a bit puzzled by the following error:


library(tidyverse)
library(sf)
library(purrr)

demo(nc, ask = FALSE, verbose = FALSE)

# Loop over nc$geom with st_intersects() and save the results in a new list-col: nc$INTERSECT

nc %>% mutate(INTERSECT = map(.x = geom, .f = st_intersects, y = st_geometry(nc)))
#> Error in mutate_impl(.data, dots): st_crs(x) == st_crs(y) is not TRUE

In this example, it appears that the classes of the two objects are different:

#> Browse[2]> map(list(x,y),class)
#> [[1]]
#> [1] "XY"           "MULTIPOLYGON" "sfg"
#> 
#> [[2]]
#> [1] "sfc_MULTIPOLYGON" "sfc"

... and therefore x lacks crs metadata:

#> Browse[2]> map(list(x,y),st_crs)
#> [[1]]
#> $epsg
#> [1] NA
#> 
#> $proj4string
#> [1] NA
#> 
#> attr(,"class")
#> [1] "crs"
#> 
#> [[2]]
#> $epsg
#> [1] 4267
#> 
#> $proj4string
#> [1] "+proj=longlat +datum=NAD27 +no_defs"
#> 
#> attr(,"class")
#> [1] "crs"

I find it confusing (and inconvenient) that a sfc passed to map results in a sfg, while st_geometry(nc) returns a sfc. In the example above this class mismatch causes st_intersects to return an error.

It seems like a vignette demonstrating the recommended way of manipulating sfcs within a piped workflow could help resolve this type of question.

Edit: For instance, this is how I solved the problem above but I have no idea if my approach uses these tools in the way they're intended to be used - some guidance would go a long way here:


library(tidyverse)
library(stringr)
library(sf)
library(purrr)

demo(nc, ask = FALSE, verbose = FALSE)

nc %>% 
  select(NAME) %>% 
  mutate(NB = st_intersects(geom, geom)) %>% 
  mutate(NB_GEOMS = map(NB, ~st_geometry(nc[.x, ]))) %>% 
  mutate(GEOM_LST = map(geom, ~st_geometry(.x) %>% st_set_crs(st_crs(nc)))) %>% 
  mutate(RELATE = map2(.x = NB_GEOMS, .y = GEOM_LST, .f = ~st_relate(.y, .x))) %>% 
  mutate(RELATE_LGL = map(RELATE, str_detect, pattern = "^[[:alnum:]]{4}1")) %>% 
  mutate(NB_ROOK = map2(.x = NB, .y = RELATE_LGL, ~.x[.y])) %>% 
  select(NAME, NB, NB_ROOK, dplyr::everything()) %>% 
  slice(50) %>%    # Rowan County is the one highlighted in the example above
  as_tibble()
#> # A tibble: 1 × 8
#>     NAME        NB   NB_ROOK         NB_GEOMS         GEOM_LST
#>   <fctr>    <list>    <list>           <list>           <list>
#> 1  Rowan <int [7]> <int [6]> <simple_feature> <simple_feature>
#> # ... with 3 more variables: RELATE <list>, RELATE_LGL <list>,
#> #   geom <simple_feature>
edzer commented 7 years ago

Your example above

> nc %>% mutate(INTERSECT = map(.x = geom, .f = st_intersects, y = st_geometry(nc))) %>% print(n=2)
Simple feature collection with 100 features and 15 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID):    4267
proj4string:    +proj=longlat +datum=NAD27 +no_defs
First 2 features:
   AREA PERIMETER CNTY_ CNTY_ID      NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
1 0.114     1.442  1825    1825      Ashe 37009  37009        5  1091     1
2 0.061     1.231  1827    1827 Alleghany 37005  37005        3   487     0
  NWBIR74 BIR79 SID79 NWBIR79    INTERSECT                           geom
1      10  1364     0      19 1, 2, 18, 19 MULTIPOLYGON(((-81.47275543...
2      10   542     3      12  1, 2, 3, 18 MULTIPOLYGON(((-81.23989105...

now works, but the simpler way to do this is

nc %>% mutate(INTERSECT = st_intersects(.))

I am not so sure I can follow your misunderstanding of purrr::map; although I don't think its current documentation explains well what map does, I believe it basically redoes lapply: apply a function to elements of a list, return a list with return values. Since list elements of an sfc are of class sfg (see first vignette), this is to be expected. What had you expected it to do?

tiernanmartin commented 7 years ago

Thanks for adding d9d0af3 🎉

Regarding your question: I believe that my mistake was that I should have used map2(.x = geom, .y = st_geometry(nc), .f = st_intersects) so that both sets of elements passed to st_intersects() were class sfg. In retrospect, that example is pretty confusing and it is unclear in the post that I was using debug() in the second part, so thanks for even giving any thought at all.

tiernanmartin commented 7 years ago

I'd like to know your thoughts (as the package developer) on the approach I demonstrated in the edit - is this the way you intend for users to step through the creation of a spatial analysis variable (such as identifying nearest neighbors)? Or do you think it's more suitable for users to compose functions that do this sort of task?

edzer commented 7 years ago

With admittedly limited knowledge of spdep, I have never seen someone create geometries with the actual neighbours, as you seem to do in NB_GEOMS, rather than indices of the actual neighbours as in my example above. Why would you do so?

tiernanmartin commented 7 years ago

You certainly could use the indices - it's probably more computationally efficient to do it that way.

But that is not the focus of my question. My question is a bit more general: how should an sf user approach a multi-step spatial operation like this?

Using the existing sf tools, finding "rook" neighbors requires at least two steps: st_intersects() then st_relate(). I demonstrated an approach that makes heavy use of dplyr::mutate() and purrr::map*(), saving intervening variables in columns. I suspect this type of operation might also be approached by writing a single function that performs all of the steps. But maybe you have a different approach altogether?

I think that relatively new R-users like myself who are starting to tackle spatial problems (or spatial datasets) would benefit from a vignette demonstrating how to go about a multi-step spatial operation using sf tools.

It looks like you've started a vignette to-do list - perhaps you might consider adding this request to your list. Thanks!

edzer commented 7 years ago

I now understand it - sorry, took a while. What I think we need is to add a pattern variable to st_relate, just like rgeos::gRelate, which lets you select neighbours based on a pre-defined pattern, and in that case returns the indices list of st_intersect. You could then trivially write a function st_rook. Will look into.

edzer commented 7 years ago

Pls test; you should now get rook neighbors with

st_rook = function(a, b = a) st_relate(a, b, pattern = "F***1****")
nc %>% mutate(NB_ROOK = st_rook(.))
walkerke commented 7 years ago

This is fantastic! I've tested and it works. By extension, to get queen's case neighbors as well you can do:

st_queen <- function(a, b = a) st_relate(a, b, pattern = "F***T****")
nc %>% mutate(NB_QUEEN = st_queen(.))

which constitutes an improvement on using st_intersects for that purpose as it eliminates self-intersections.

For anyone coming across this, the Wikipedia page for DE-9IM strings is quite helpful: https://en.wikipedia.org/wiki/DE-9IM.

edzer commented 7 years ago

Thanks @walkerke , I've added this to the documentation of st_relate_pattern.

becarioprecario commented 7 years ago

Hi,

I am creating a function, st2nb, to compute adjacencies from sf-objects using the the code in poly2nb() using all st_rook() and st_queen(), as explained in earlier posts in this thread. However, poly2nb() takes an additional argument called snap:

snap: boundary points less than ‘snap’ distance apart are considered to indicate contiguity

Is there any way to include this in the call to st_relate?

Thanks!

Virgilio

edzer commented 7 years ago

I can think of two solutions:

  1. you can use st_distance to compute distances, and compare these with snap; this however gives you a dense matrix,
  2. set the precision, which should round away (most?) differences smaller than snap.
becarioprecario commented 7 years ago

Hi Edzer,

I can think of two solutions:

• you can use st_distance to compute distances, and compare these with snap; this however gives you a dense matrix, • set the precision, which should round away (most?) differences smaller than snap.

Thanks for your comments. Setting the precision seems to have funny side and st_distance seems to only work point geometries. It would be doable with this but it really involves more work than using st_queen or st_rook… So I think that I will keep a warning for now. :)

Best,

Virgilio

rsbivand commented 7 years ago

Somewhere I should have an object that needed non-default snap that might help. I'll try to find it next week - may we talk about this at useR!?

becarioprecario commented 7 years ago

Hi Roger,

Somewhere I should have an object that needed non-default snap that might help. I'll try to find it next week - may we talk about this at useR!?

Thanks! That would be very useful. I think that there is always the possibility of rewriting the existing code to deal with sf-objects instead of using st_relate, but this will require more work, I think.

Best,

Virgilio

edzer commented 7 years ago
> p = st_polygon(list(rbind(c(0,0),c(1,1),c(0,1),c(0,0))))
> st_distance(p, p + c(2,2))
         [,1]
[1,] 1.414214
becarioprecario commented 7 years ago

Hi

El 22 jun 2017, a las 14:30, Edzer Pebesma notifications@github.com escribió:

p = st_polygon(list(rbind(c(0,0),c(1,1),c(0,1),c(0,0)))) st_distance(p, p + c(2,2)) [,1]

Thanks for the example. In poly2nb(), the bounding boxes of two possible neighbors are checked first to see if they overlap. If they do, then the boundaries are actually checked to see if they are neighbors or not. So, there is no need to compute all pairs of distances. But I thought that perhaps with simple features there were other (simpler) options using some of the st_XXX functions.

Another option would be to add a buffer of snap/2 around each polygon and then use st_relate, right? Just thinking…

Best,

Virgilio

edzer commented 7 years ago

If they do not overlap, they can be still snap distance apart... The st_buffer approach sounds good, if you insist on supporting snap.

Robinlovelace commented 7 years ago

I found this approach worked in a use case from sp, as documented in #210. The question is a) does this generalise to other instances where the dimension of the relation is important and b) how to make the code more user-friendly and concise? If there's no quickfire answer I can open a new issue but just wanted to check I'm not missing something obvious and easy to teach first (from an R for spatial data course where people are asking about this!):

library(sp)
g = SpatialGrid(GridTopology(c(0, 0), c(1, 1), c(3, 3)))
p = as(g, "SpatialPolygons")
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.2.1, proj.4 4.9.2, lwgeom 2.3.3 r15473
library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats
p_sf = st_as_sf(p)
st_rook = function(a, b = a) st_relate(a, b, pattern = "F***1****")
st_queen <- function(a, b = a) st_relate(a, b, pattern = "F***T****")
p_queen = p_sf %>% mutate(NB_ROOK = st_queen(., b = p_sf[5, ])) %>% filter(NB_ROOK == 
  1)
nrow(p_queen)
#> [1] 8
p_rook = p_sf %>% mutate(NB_ROOK = st_rook(., b = p_sf[5, ])) %>% filter(NB_ROOK == 
  1)
nrow(p_rook)
#> [1] 4
p_minDim2 = p_sf %>% filter(st_within(., y = p_sf[5, ], sparse = FALSE))
nrow(p_minDim2)
#> [1] 1
rsbivand commented 7 years ago

Notes so far on spdep's R-forge SVN site. This is very much work in progress, as poly2nb is pre-SF (the definition) and thus processes invalid geometries without protesting (and is much faster, even using very primitive scanning for potential neighbours). The same holds for trying st_buffer() to find distance neighbours - the vignette will have timings.

Edzer has added a class ("sgbp" - sparse geometry binary predicate), and row.names, and what predicate made the object attributes to lists returned by sf_relate and friends; this will make it much easier to use sf objects inside poly2nb, dnearneigh, knearneigh, and the graph neighbours, as well as helper functions and methods.

Because spdep (on R-forge) now suggests sf (and may import from), it has stopped checking and building on R-forge (but rgdal and rgeos build and check OK??). The report:

Quitting from lines 35-38 (nb_sf.Rmd) 
Error: processing vignette 'nb_sf.Rmd' failed with diagnostics:
there is no package called 'sf'

But you can download the Rmd file, or checkout SVN.

edzer commented 7 years ago

Does r-forge have GDAL >= 2.0 ?

rsbivand commented 7 years ago

Well caught in:

checking gdal-config usability... yes
configure: GDAL: 1.10.1
checking GDAL version >= 1.6.3... yes

Request for support on R-Forge to upgrade at least GDAL sent.

edzer commented 7 years ago

If you migrate spdep to github, I will set up travis & appveyor for it.

Robinlovelace commented 7 years ago

Reply appreciated to questions a) and b) above @edzer. Cheers.

edzer commented 7 years ago

I praise myself lucky that I no longer need to understand the code I wrote for sp::over.

Robinlovelace commented 7 years ago

So no answer?

rsbivand commented 7 years ago

I'll chew over transitioning spdep to github; how to keep the full SVN commit history (what changed to which file when)? I sometimes need to know when a change was committed to compare with email exchanges connected to the revision.

edzer commented 7 years ago

@Robinlovelace maybe not the long answer you'd hoped for. sp::over in your case calls rgeos::overGeomGeom; as you can see that uses gRelate if minDimension was set; gRelate is called st_relate in sf. You can work from there, or from the respective documentations of these functions. As you said in the other issue: time and perseverance is all that is required.

Robinlovelace commented 7 years ago

That sounds good. Time + perseverance can indeed resolve most issues. The fact that both sf and sp use the general relate operator suggests it should be relatively straightforward to implement. Is that something you'd be interested in? If so I could try to cobble together a PR to implement it. If not I'll still try to implement something, not least for my own benefit in terms of getting my head-round DE-9IM.

edzer commented 7 years ago

Personally no; if you think it strongly benefits many users, you can try to make that point.

Robinlovelace commented 7 years ago

Sounds reasonable. Will take a look this weekend after the course today. A day in Leeds teaching sf just completed - went well.

rsbivand commented 7 years ago

Wrt. spdep -> github: would it make more sense just to modernise spweights, last seen over 15 years ago? That is, un-archive an existing package name and do the weights development there? Of course, spdep would import from or depend on spweights ...

edzer commented 7 years ago

If spweights would have tests involving sf, the dependency would be there anyway, but second order. I don't care where spdep resides, but many will find it more attractive to contribute to it when it is on github.

rsbivand commented 7 years ago

They are free to contribute on SVN, but nobody (except Gianfranco) did within the repo, even when it was the platform of choice. It's a bit like gstat, lots of functionality in many different directions, so no clear pattern now. It started as spweights for nb and listw objects, sptests for tests, and spdep for spatial regression modelling, but got merged in the pre-namespace period, when there was no imports from.

What is the transfer protocol?

nuest commented 7 years ago

It is possible to transfer the repository's history and match svn user names to GitHub names. I can help out if you would like to preserve these.

Should the repo become part of r-spatial?

rsbivand commented 7 years ago

Thanks for the offer of help! Is this a reasonable representation of transfer?

I see that gstat is not in r-spatial, so I think I would do the same, keeping r-spatial for shared infrastructure, does that make sense?

edzer commented 7 years ago

r-spatial welcomes modern and actively maintained spatial R packages.

nuest commented 7 years ago

@rsbivand yes, that looks like a complete list of steps. This one I have used and contributed to myself a while back, includes some more steps around tags and branches, and removing large files from history (if any of these are needed).

rsbivand commented 7 years ago

@edzer spdep isn't modern, but is actively maintained. It also invites modularisation and rationalisation, but it isn't clear that modularisation in-place (in r-spatial) is the only choice. For example, I do not use roxygen and am not convinced that it is helpful; I feel further that mechanical code coverage is not better than help page examples showing the relationship between implementation and the published books/articles with their examples. So if modern means roxygen and testthat, spdep isn't modern. If modern is without such constraints, maybe r-spatial is viable.

edzer commented 7 years ago

With modern I was more thinking of modern, state-of-the-art methods, not of coding styles.

rsbivand commented 7 years ago

Aha, then yes, spdep and code/packages depending on spdep are modern ... like the recent Wagner & Zeileis spatial and tree regression two-step package, and others ...

rsbivand commented 7 years ago

@Nowosad : may I upload a newly imported NY8 GPKG to spData? This will let me continue with the neighbour object vignette, and start a transition to importing data from spData into spdep, then spdep to github.

Nowosad commented 7 years ago

@rsbivand Yes, of course. Do you prefer to make a pull request or should I give you a push access to the repository?

rsbivand commented 7 years ago

Thanks, when I get to it, I'll make a PR.

rsbivand commented 7 years ago

The vignette at:

https://r-forge.r-project.org/scm/viewvc.php/pkg/vignettes/nb_sf.Rmd?view=markup&revision=719&root=spdep

and attached now runs, see the summary section.

You'll also need spData from https://github.com/rsbivand/spData; I haven't made a PR yet for this, so it isn't in Jakub Nowosad's repo. Maybe you can just try out the suggestions in the vignette summary without needing the data. I've been using the most recent sf development version.

Please let me know if this is clear, and if it helps.

edzer commented 7 years ago

When installing spdep from r-forge, after installing your spData fork, I'm seeing:

* creating vignettes ... ERROR
Error in CPL_geos_binop(st_geometry(x), st_geometry(y), op, par, pattern,  : 
  Evaluation error: TopologyException: side location conflict at 411447.38362785219 4768469.9111850867.
Quitting from lines 430-433 (nb_sf.Rmd)

any ideas?