Closed ekarsten closed 5 years ago
Thanks, this makes a lot of sense; PostGIS also benefits from an st_intersects before an st_intersection.
I think the easiest way for @edzer to review your changes would be to squash the commits and create a PR.
Thanks for the feedback. I squashed into one commit and submitted PR #803.
I haven't had time to investigate, but I suspect similar speed gains could be achieved in st_difference by performing similar checks (but maybe not): if two objects don't intersect, st_difference should just return the first; if they do, maybe call st_within to see whether st_difference should return no output; and only then call st_difference if a new geometry needs to be returned.
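The checks described above could be sketched roughly as follows. This is only an illustration on toy squares, not sf's implementation: `st_difference_prechecked()` and `square()` are hypothetical helpers, the sketch assumes `y` is a single geometry, and it uses planar data without a CRS so that GEOS (not s2) is used.

```r
library(sf)

# Hypothetical helper (not part of sf): axis-aligned square as an sfg polygon.
square <- function(xmin, ymin, side) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + side, ymin), c(xmin + side, ymin + side),
    c(xmin, ymin + side), c(xmin, ymin)
  )))
}

# Hypothetical helper (not part of sf): difference of every geometry in the
# sfc `x` against ONE geometry `y`, calling st_difference() only when needed.
st_difference_prechecked <- function(x, y) {
  stopifnot(inherits(x, "sfc"), inherits(y, "sfc"), length(y) == 1)
  hits   <- lengths(st_intersects(x, y)) > 0  # x[i] meets y at all
  inside <- lengths(st_within(x, y)) > 0      # x[i] lies entirely within y
  geoms <- lapply(seq_along(x), function(i) {
    if (inside[i]) return(NULL)               # nothing would be left of x[i]
    if (!hits[i]) return(x[[i]])              # disjoint: x[i] is unchanged
    d <- st_difference(x[i], y)               # real overlap: do the work
    if (length(d) == 0 || all(st_is_empty(d))) NULL else d[[1]]
  })
  st_sfc(Filter(Negate(is.null), geoms), crs = st_crs(x))
}

x <- st_sfc(square(5, 5, 1),        # disjoint from y
            square(0.5, 0.5, 0.5),  # entirely within y
            square(1.5, 0, 1))      # partially overlaps y
y <- st_sfc(square(0, 0, 2))

res <- st_difference_prechecked(x, y)
st_area(res)
```

Whether this pays off in practice presumably depends on how many pairs are disjoint or fully contained; for inputs where most pairs partially overlap, the extra predicate calls are pure overhead.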
I think you're right. It looks like another layer, after indexing, to speedup computations. I wonder when or if it slows it down in certain cases. I guess looking at the postgis execution plans for these operations should be pretty informative.
Quick follow-up on this, did this get implemented @ekarsten? It seems from #803 that the PR didn't go in. Could be very useful for my work at the moment! Thanks.
Found this thread years later. I checked for st_intersects(), subset out only those that do intersect, then used st_intersection() afterwards. The workaround sped things up heaps, from an overnight task that never finished to a 15s job:
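library(sf)
library(magrittr)  # provides %>%

# Faster replacement for st_intersection(x, y, ...), with y an sf object:
# drop the rows of y that intersect nothing in x before intersecting.
st_intersection_faster <- function(x, y, ...) {
  y_subset <-
    st_intersects(x, y) %>%
    unlist() %>%
    unique() %>%
    sort() %>%
    {y[., ]}
  st_intersection(x, y_subset, ...)
}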
Looks like that belongs in an {sfextras} type package.
Please do provide full sessionInfo() output for platform type and package versions, at least a summary of the objects (is s2 involved or GEOS?), the GEOS version (NG topology engine used or not), and access to an input object showing these characteristics. Memory use while running would also help; it feels like memory was exceeded. sf does use STRtree planar indexing anyway for topological operations, discussed in ASDAR 2nd edition ch. 5 https://asdar-book.org/ p. 138-9, https://asdar-book.org/book2ed/cm2_mod.R chunks 41-42. The STRtree finds overlapping bounding boxes, so a superset of intersections, but performance depends on the shapes of the input geometries.
Also @gdmcdonald is your case binary or n-ary (self-intersections)? The latter can be very hard to handle with multiple overlapping geometries, where the STRtree approach may be defeated.
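To illustrate the "superset of intersections" point, here is a small sketch on made-up geometries. It does not call the STRtree directly; it only mimics a bounding-box filter by testing against the polygon's bounding box, then compares that with the exact predicate. The triangle and points are invented for the example.

```r
library(sf)

# A right triangle whose bounding box is the square (0,0)-(4,4).
tri <- st_sfc(st_polygon(list(rbind(c(0, 0), c(4, 0), c(0, 4), c(0, 0)))))

# Three points: inside the triangle, inside the bbox only, outside both.
pts <- st_sfc(st_point(c(1, 1)), st_point(c(3, 3)), st_point(c(6, 6)))

# Candidates by bounding box alone (roughly what an index can tell us).
bbox_poly  <- st_as_sfc(st_bbox(tri))
candidates <- lengths(st_intersects(pts, bbox_poly)) > 0

# Exact test against the triangle itself.
exact <- lengths(st_intersects(pts, tri)) > 0

candidates  # TRUE  TRUE FALSE
exact       # TRUE FALSE FALSE
```

The second point is a false candidate: the index would pass it on, and the exact geometry test has to reject it. Long, thin, or diagonal geometries have large, mostly empty bounding boxes, which is one way the shape of the inputs can affect how much the index helps.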
I did this synthetic benchmark and I can see there is a big problem with the {sf} - {s2} interaction. This operation on this small dataset took milliseconds in {sf/GEOS} and {s2} (using s2_intersects_matrix()), but in {sf/s2} it took more than 3 minutes.
One more test on the same example. Overall, geos_intersects_matrix() appears to be faster than s2_intersects_matrix() and sf::st_intersects() is faster than sf::st_intersection(). But I think the exact case of @gdmcdonald would be more interesting to check.
There's still this note here:
that essentially indicates we don't yet have an s2_intersection_matrix, but use a blunt loop. How hard would it be to create the _matrix version, @paleolimbot?
And the second code chunk in https://r-spatial.org/book/12-Interpolation.html#a-population-grid ...
Probably not hard!
Was there any fix to this already? I've been running an st_intersection after pre-selecting with st_intersects, and I got a very significant speed-up (that presumably depends on the ratio of non-intersecting features).
For anybody else that also lands here, this is the snippet I am using:
library(sf)
library(dplyr)
library(purrr)
library(progress)

# xFeatures and yFeatures are sf objects; find the candidate pairs first
intersections <- st_intersects(x = xFeatures, y = yFeatures)
pb <- progress_bar$new(format = "[:bar] :current/:total (:percent)", total = nrow(xFeatures))
# intersect each x feature only with the y features it actually meets
intersectFeatures <- map_dfr(seq_len(nrow(xFeatures)), function(ix) {
  pb$tick()
  st_intersection(x = xFeatures[ix, ], y = yFeatures[intersections[[ix]], ])
})
I know this is a year later, but to follow up on @EhrmannS's comment: I used their code example, but when I do so I run into odd errors if I then try to use other st_*() functions. For example, if I try to run st_area(intersectFeatures) I get
Error in `stopifnot()`:
ℹ In argument: `area = st_area(.)`.
Caused by error:
! Not compatible with requested type: [type=list; target=double].
but if I run st_area(head(intersectFeatures)) it runs with no problem. I am trying to figure out what happens to the geometries when I use head(intersectFeatures).
I noticed while doing some work with st_intersection that I could speed it up by applying st_intersects first and then only calling st_intersection on the pairs of geometries that actually intersect. In light of this, I tried implementing a fork of sf that builds this check into the st_intersection function. I am a novice C++ programmer, so I'm sure my implementation isn't ideal, but I have linked it here around line 730.
Below is some reproducible example code to test the speed of the new version vs. the old version. The speed improvement isn't insane (~ 20%), but it's nontrivial. I am intersecting North Carolina with some 3mi by 3mi grids.
I am using GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3. Session info: