Closed tiernanmartin closed 7 years ago
st_intersects
gives the sparse matrix; you'd have to filter out the diagonal elements, and set the class attribute.
Disclaimer: If Stackoverflow is the preferred venue for this sort of question please let me know and I'll post there.
Below I have a reprex showing the type of polygon neighbor calculation I do regularly. The blue county has several intersecting neighbors which are identified using sf::st_intersects
, but the southeastern-most county (light brown in color) only shares a single point with the target county and I may want to exclude it; sp::spdep
has the queen = FALSE
argument which allows me to easily exclude such counties.
Could someone demonstrate a way to do this using sf
-verbs? If no simple method for filtering the diagonal elements exists, could one be added?
library(sp)
library(spdep)
library(tidyverse)
library(sf)
demo(nc, ask = FALSE, verbose = FALSE)
par(mfrow = c(3, 1), mar = c(rep(0.75, 4)))
# Target county
plot(nc[1], col = "red", main = "Target county")
plot(nc[50, "AREA"], add = TRUE)
# Using sf verbs
spare_mtx <- st_intersects(nc, nc)
#> although coordinates are longitude/latitude, it is assumed that they are planar
plot(nc[1], col = "red", main = "Neighbors: st_intersection")
plot(nc[spare_mtx[[50]], "NAME"], add = TRUE)
plot(nc[50, "AREA"], add = TRUE)
# Using the 'rook' method from spdep::poly2nb
spare_mtx_rook <- poly2nb(as(nc, "Spatial"), queen = FALSE)
plot(nc[1], col = "red", main = "Neighbors: poly2nb")
plot(nc[spare_mtx_rook[[50]], "NAME"], add = TRUE)
plot(nc[50, "AREA"], add = TRUE)
Look into st_relate
, it gives you the dimension of the intersection of the boundaries of two geometries; this is 1 for a line, 0 for a point.
(It does give you a dense matrix, though; in case memory is a problem you could do st_intersects
first, then loop through each nbh set with st_relate
.)
Got it.
st_relate
does return the result I needed but I would suggest that there's still room for a simpler, high-level verb to do this.
Using the example above, I loop through the result of st_intersects
which returns a list of matrices containing DE-9IM Matrix strings (which I had never seen before and take a bit of work to understand):
spare_mtx <- st_intersects(nc, nc)
spare_mtx[[50]]
#> [1] 39 40 42 50 69 70 71
spare_mtx[[50]] %>%
map(~nc[.x, ]) %>%
map(~st_relate(x = nc[50, ], y = .x))
#> [[1]]
#> [,1]
#> [1,] "FF2F11212"
#>
#> [[2]]
#> [,1]
#> [1,] "FF2F11212"
#>
#> [[3]]
#> [,1]
#> [1,] "FF2F11212"
#>
#> [[4]]
#> [,1]
#> [1,] "2FFF1FFF2"
#>
#> [[5]]
#> [,1]
#> [1,] "FF2F11212"
#>
#> [[6]]
#> [,1]
#> [1,] "FF2F01212" <- Note: this is the diagonal neighbor county from the example above
#>
#> [[7]]
#> [,1]
#> [1,] "FF2F11212"
I think a convenience function similar to poly2nb
would be a worthwhile addition to the sf
canon.
On a related note, I'm a bit puzzled by the following error:
library(tidyverse)
library(sf)
library(purrr)
demo(nc, ask = FALSE, verbose = FALSE)
# Loop over nc$geom with st_intersects() and save the results in a new list-col: nc$INTERSECT
nc %>% mutate(INTERSECT = map(.x = geom, .f = st_intersects, y = st_geometry(nc)))
#> Error in mutate_impl(.data, dots): st_crs(x) == st_crs(y) is not TRUE
In this example, it appears that the classes of the two objects are different:
#> Browse[2]> map(list(x,y),class)
#> [[1]]
#> [1] "XY" "MULTIPOLYGON" "sfg"
#>
#> [[2]]
#> [1] "sfc_MULTIPOLYGON" "sfc"
... and therefore x
lacks crs
metadata:
#> Browse[2]> map(list(x,y),st_crs)
#> [[1]]
#> $epsg
#> [1] NA
#>
#> $proj4string
#> [1] NA
#>
#> attr(,"class")
#> [1] "crs"
#>
#> [[2]]
#> $epsg
#> [1] 4267
#>
#> $proj4string
#> [1] "+proj=longlat +datum=NAD27 +no_defs"
#>
#> attr(,"class")
#> [1] "crs"
I find it confusing (and inconvenient) that a sfc
passed to map
results in a sfg
, while st_geometry(nc)
returns a sfc
. In the example above this class mismatch causes st_intersects
to return an error.
It seems like a vignette demonstrating the recommended way of manipulating sfc
s within a piped workflow could help resolve this type of question.
Edit: For instance, this is how I solved the problem above but I have no idea if my approach uses these tools in the way they're intended to be used - some guidance would go a long way here:
library(tidyverse)
library(stringr)
library(sf)
library(purrr)
demo(nc, ask = FALSE, verbose = FALSE)
nc %>%
select(NAME) %>%
mutate(NB = st_intersects(geom, geom)) %>%
mutate(NB_GEOMS = map(NB, ~st_geometry(nc[.x, ]))) %>%
mutate(GEOM_LST = map(geom, ~st_geometry(.x) %>% st_set_crs(st_crs(nc)))) %>%
mutate(RELATE = map2(.x = NB_GEOMS, .y = GEOM_LST, .f = ~st_relate(.y, .x))) %>%
mutate(RELATE_LGL = map(RELATE, str_detect, pattern = "^[[:alnum:]]{4}1")) %>%
mutate(NB_ROOK = map2(.x = NB, .y = RELATE_LGL, ~.x[.y])) %>%
select(NAME, NB, NB_ROOK, dplyr::everything()) %>%
slice(50) %>% # Rowan County is the one highlighted in the example above
as_tibble()
#> # A tibble: 1 × 8
#> NAME NB NB_ROOK NB_GEOMS GEOM_LST
#> <fctr> <list> <list> <list> <list>
#> 1 Rowan <int [7]> <int [6]> <simple_feature> <simple_feature>
#> # ... with 3 more variables: RELATE <list>, RELATE_LGL <list>,
#> # geom <simple_feature>
Your example above
> nc %>% mutate(INTERSECT = map(.x = geom, .f = st_intersects, y = st_geometry(nc))) %>% print(n=2)
Simple feature collection with 100 features and 15 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID): 4267
proj4string: +proj=longlat +datum=NAD27 +no_defs
First 2 features:
AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74
1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091 1
2 0.061 1.231 1827 1827 Alleghany 37005 37005 3 487 0
NWBIR74 BIR79 SID79 NWBIR79 INTERSECT geom
1 10 1364 0 19 1, 2, 18, 19 MULTIPOLYGON(((-81.47275543...
2 10 542 3 12 1, 2, 3, 18 MULTIPOLYGON(((-81.23989105...
now works, but the simpler way to do this is
nc %>% mutate(INTERSECT = st_intersects(.))
I am not so sure I can follow your misunderstanding of purrr::map
; although I don't think its current documentation explains well what map
does, I believe it basically redoes lapply
: apply a function to elements of a list, return a list with return values. Since list elements of an sfc
are of class sfg
(see first vignette), this is to be expected. What had you expected it to do?
Thanks for adding d9d0af3 🎉
Regarding your question: I believe that my mistake was that I should have used map2(.x = geom, .y = st_geometry(nc), .f = st_intersects)
so that both sets of elements passed to st_intersects()
were class sfg
. In retrospect, that example is pretty confusing and it is unclear in the post that I was using debug()
in the second part, so thanks for even giving any thought at all.
I'd like to know your thoughts (as the package developer) on the approach I demonstrated in the edit - is this the way you intend for users to step through the creation of a spatial analysis variable (such as identifying nearest neighbors)? Or do you think it's more suitable for users to compose functions that do this sort of task?
With admittedly limited knowledge of spdep
, I have never seen someone create geometries with the actual neighbours, as you seem to do in NB_GEOMS
, rather than indices of the actual neighbours as in my example above. Why would you do so?
You certainly could use the indices - it's probably more computationally efficient to do it that way.
But that is not the focus of my question. My question is a bit more general: how should an sf
user approach a multi-step spatial operation like this?
Using the existing sf
tools, finding "rook" neighbors requires at least two steps: st_intersects()
then st_relate()
. I demonstrated an approach that makes heavy use of dplyr::mutate()
and purrr::map*()
, saving intervening variables in columns. I suspect this type of operation might also be approached by writing a single function that performs all of the steps. But maybe you have a different approach altogether?
I think that relatively new R-users like myself who are starting to tackle spatial problems (or spatial datasets) would benefit from a vignette demonstrating how to go about a multi-step spatial operation using sf
tools.
It looks like you've started a vignette to-do list - perhaps you might consider adding this request to your list. Thanks!
I now understand it - sorry, took a while. What I think we need is to add a pattern
variable to st_relate
, just like rgeos::gRelate
, which lets you select neighbours based on a pre-defined pattern, and in that case returns the indices list of st_intersect
. You could then trivially write a function st_rook
. Will look into.
Pls test; you should now get rook neighbors with
st_rook = function(a, b = a) st_relate(a, b, pattern = "F***1****")
nc %>% mutate(NB_ROOK = st_rook(.))
This is fantastic! I've tested and it works. By extension, to get queen's case neighbors as well you can do:
st_queen <- function(a, b = a) st_relate(a, b, pattern = "F***T****")
nc %>% mutate(NB_QUEEN = st_queen(.))
which constitutes an improvement on using st_intersects
for that purpose as it eliminates self-intersections.
For anyone coming across this, the Wikipedia page for DE-9IM strings is quite helpful: https://en.wikipedia.org/wiki/DE-9IM.
Thanks @walkerke , I've added this to the documentation of st_relate_pattern
.
Hi,
I am creating a function, st2nb, to compute adjacencies from sf-objects using the the code in poly2nb() using all st_rook() and st_queen(), as explained in earlier posts in this thread. However, poly2nb() takes an additional argument called snap:
snap: boundary points less than ‘snap’ distance apart are considered to indicate contiguity
Is there any way to include this in the call to st_relate?
Thanks!
Virgilio
I can think of two solutions:
st_distance
to compute distances, and compare these with snap; this however gives you a dense matrix,Hi Edzer,
I can think of two solutions:
• you can use st_distance to compute distances, and compare these with snap; this however gives you a dense matrix, • set the precision, which should round away (most?) differences smaller than snap.
Thanks for your comments. Setting the precision seems to have funny side and st_distance seems to only work point geometries. It would be doable with this but it really involves more work than using st_queen or st_rook… So I think that I will keep a warning for now. :)
Best,
Virgilio
Somewhere I should have an object that needed non-default snap that might help. I'll try to find it next week - may we talk about this at useR!?
Hi Roger,
Somewhere I should have an object that needed non-default snap that might help. I'll try to find it next week - may we talk about this at useR!?
Thanks! That would be very useful. I think that there is always the possibility of rewriting the existing code to deal with sf-objects instead of using st_relate, but this will require more work, I think.
Best,
Virgilio
> p = st_polygon(list(rbind(c(0,0),c(1,1),c(0,1),c(0,0))))
> st_distance(p, p + c(2,2))
[,1]
[1,] 1.414214
Hi
El 22 jun 2017, a las 14:30, Edzer Pebesma notifications@github.com escribió:
p = st_polygon(list(rbind(c(0,0),c(1,1),c(0,1),c(0,0)))) st_distance(p, p + c(2,2)) [,1]
Thanks for the example. In poly2nb(), the bounding boxes of two possible neighbors are checked first to see if they overlap. If they do, then the boundaries are actually checked to see if they are neighbors or not. So, there is no need to compute all pairs of distances. But I thought that perhaps with simple features there were other (simpler) options using some of the st_XXX functions.
Another option would be to add a buffer of snap/2 around each polygon and then use st_relate, right? Just thinking…
Best,
Virgilio
If they do not overlap, they can be still snap distance apart... The st_buffer approach sounds good, if you insist on supporting snap.
I found this approach worked in a use case from sp, as documented in #210. The question is a) does this generalise to other instances where the dimension of the relation is important and b) how to make the code more user-friendly and concise? If there's no quickfire answer I can open a new issue but just wanted to check I'm not missing something obvious and easy to teach first (from an R for spatial data course where people are asking about this!):
library(sp)
g = SpatialGrid(GridTopology(c(0, 0), c(1, 1), c(3, 3)))
p = as(g, "SpatialPolygons")
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.2.1, proj.4 4.9.2, lwgeom 2.3.3 r15473
library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag(): dplyr, stats
p_sf = st_as_sf(p)
st_rook = function(a, b = a) st_relate(a, b, pattern = "F***1****")
st_queen <- function(a, b = a) st_relate(a, b, pattern = "F***T****")
p_queen = p_sf %>% mutate(NB_ROOK = st_queen(., b = p_sf[5, ])) %>% filter(NB_ROOK ==
1)
nrow(p_queen)
#> [1] 8
p_rook = p_sf %>% mutate(NB_ROOK = st_rook(., b = p_sf[5, ])) %>% filter(NB_ROOK ==
1)
nrow(p_rook)
#> [1] 4
p_minDim2 = p_sf %>% filter(st_within(., y = p_sf[5, ], sparse = FALSE))
nrow(p_minDim2)
#> [1] 1
Notes so far on spdep's R-forge SVN site. This is very much work in progress, as poly2nb
is pre-SF (the definition) and thus processes invalid geometries without protesting (and is much faster, even using very primitive scanning for potential neighbours). The same holds for trying st_buffer() to find distance neighbours - the vignette will have timings.
Edzer has added a class ("sgbp"
- sparse geometry binary predicate), and row.names, and what predicate made the object attributes to lists returned by sf_relate
and friends; this will make it much easier to use sf objects inside poly2nb
, dnearneigh
, knearneigh
, and the graph neighbours, as well as helper functions and methods.
Because spdep (on R-forge) now suggests sf (and may import from), it has stopped checking and building on R-forge (but rgdal and rgeos build and check OK??). The report:
Quitting from lines 35-38 (nb_sf.Rmd)
Error: processing vignette 'nb_sf.Rmd' failed with diagnostics:
there is no package called 'sf'
But you can download the Rmd file, or checkout SVN.
Does r-forge have GDAL >= 2.0 ?
Well caught in:
checking gdal-config usability... yes
configure: GDAL: 1.10.1
checking GDAL version >= 1.6.3... yes
Request for support on R-Forge to upgrade at least GDAL sent.
If you migrate spdep to github, I will set up travis & appveyor for it.
Reply appreciated to questions a) and b) above @edzer. Cheers.
I praise myself lucky that I no longer need to understand the code I wrote for sp::over
.
So no answer?
I'll chew over transitioning spdep to github; how to keep the full SVN commit history (what changed to which file when)? I sometimes need to know when a change was committed to compare with email exchanges connected to the revision.
@Robinlovelace maybe not the long answer you'd hoped for. sp::over
in your case calls rgeos::overGeomGeom
; as you can see that uses gRelate
if minDimension
was set; gRelate
is called st_relate
in sf
. You can work from there, or from the respective documentations of these functions. As you said in the other issue: time and perseverance is all that is required.
That sounds good. Time + perseverance can indeed resolve most issues. The fact that both sf and sp use the general relate
operator suggests it should be relatively straightforward to implement. Is that something you'd be interested in? If so I could try to cobble together a PR to implement it. If not I'll still try to implement something, not least for my own benefit in terms of getting my head-round DE-9IM.
Personally no; if you think it strongly benefits many users, you can try to make that point.
Sounds reasonable. Will take a look this weekend after the course today. A day in Leeds teaching sf just completed - went well.
Wrt. spdep -> github: would it make more sense just to modernise spweights, last seen over 15 years ago? That is, un-archive an existing package name and do the weights development there? Of course, spdep would import from or depend on spweights ...
If spweights
would have tests involving sf
, the dependency would be there anyway, but second order. I don't care where spdep resides, but many will find it more attractive to contribute to it when it is on github.
They are free to contribute on SVN, but nobody (except Gianfranco) did within the repo, even when it was the platform of choice. It's a bit like gstat, lots of functionality in many different directions, so no clear pattern now. It started as spweights for nb and listw objects, sptests for tests, and spdep for spatial regression modelling, but got merged in the pre-namespace period, when there was no imports from.
What is the transfer protocol?
It is possible to transfer the repository's history and match svn user names to GitHub names. I can help out if you would like to preserve these.
Should the repo become part of r-spatial?
Thanks for the offer of help! Is this a reasonable representation of transfer?
I see that gstat is not in r-spatial, so I think I would do the same, keeping r-spatial for shared infrastructure, does that make sense?
r-spatial welcomes modern and actively maintained spatial R packages.
@rsbivand yes, that looks like a complete list of steps. This one I have used and contributed to myself a while back, includes some more steps around tags and branches, and removing large files from history (if any of these are needed).
@edzer spdep isn't modern, but is actively maintained. It also invites modularisation and rationalisation, but it isn't clear that modularisation in-place (in r-spatial) is the only choice. For example, I do not use roxygen and am not convinced that it is helpful; I feel further that mechanical code coverage is not better than help page examples showing the relationship between implementation and the published books/articles with their examples. So if modern means roxygen and testthat, spdep isn't modern. If modern is without such constraints, maybe r-spatial is viable.
With modern I was more thinking of modern, state-of-the-art methods, not of coding styles.
Aha, then yes, spdep and code/packages depending on spdep are modern ... like the recent Wagner & Zeileis spatial and tree regression two-step package, and others ...
@Nowosad : may I upload a newly imported NY8 GPKG to spData? This will let me continue with the neighbour object vignette, and start a transition to importing data from spData into spdep, then spdep to github.
@rsbivand Yes, of course. Do you prefer to make a pull request or should I give you a push access to the repository?
Thanks, when I get to it, I'll make a PR.
The vignette at:
and attached now runs, see the summary section.
You'll also need spData from https://github.com/rsbivand/spData; I haven't made a PR yet for this, so it isn't in Jakub Nowosad's repo. Maybe you can just try out the suggestions in the vignette summary without needing the data. I've been using the most recent sf development version.
Please let me know if this is clear, and if it helps.
When installing spdep from r-forge, after installing your spData fork, I'm seeing:
* creating vignettes ... ERROR
Error in CPL_geos_binop(st_geometry(x), st_geometry(y), op, par, pattern, :
Evaluation error: TopologyException: side location conflict at 411447.38362785219 4768469.9111850867.
Quitting from lines 430-433 (nb_sf.Rmd)
any ideas?
I use
spdep::poly2nb
to identify neighboring polygons, but this necessitates a conversion back tosp
- is there a way to do this without that conversion?