Closed mdsumner closed 4 years ago
The data situation looks like this, first without drawing every fragment's border:
ramp2 <- grDevices::colorRampPalette(c("#54A3D1", "#60B3EB",
"#78C8F0", "#98D1F5", "#B5DCFF", "#BDE1F0", "#CDEBFA",
"#D6EFFF", "#EBFAFF", "grey92", "grey94", "grey96", "white"))
plot(fragments_sf, col = ramp2(max(fragments_sf$region))[fragments_sf$region], border = NA)
and after union into 5 MULTIPOLYGONs:
plot(x, col = ramp2(nrow(x)))
The data was created with:
library(contourPolys)
library(raster)
library(sf)
levels <- c(-6000, -4000, -2000, 0, 2000, 4000)
fc <- contourPolys::fcontour_sf(xFromCol(topo), rev(yFromRow(topo)), t(as.matrix(flip(topo, "y"))), c = levels)
library(sf)
fragments_sf <- st_sf(st_sfc(fc[[1]], crs = projection(topo)), region = unlist(fc[[2]]))
saveRDS(fragments_sf, file = "fragments_sf.rds")
I was hoping data.table
would be faster, but it is only by a tad...
library(sf)
fragments_sf <- readRDS("C:/Users/tim.appelhans/Downloads/fragments_sf.rds")
names(fragments_sf) = c("region", "geometry")
st_geometry(fragments_sf) = "geometry"
# dplyr
library(dplyr)
system.time({
x <- fragments_sf %>% group_by(region) %>% summarize() %>% st_cast("MULTIPOLYGON")
})
# user system elapsed
# 7.12 0.00 7.14
# data.table
library(data.table)
dt = as.data.table(fragments_sf)
system.time({
x = dt[, { geometry = sf:::CPL_geos_union(geometry)
geometry = sf::st_sfc(geometry)
geometry = sf::st_sf(geometry)
}, by = "region"]
})
# user system elapsed
# 6.39 0.02 6.41
# base sf
system.time({
x = aggregate(fragments_sf, by = list(fragments_sf$region), FUN = mean)
})
# user system elapsed
# 8.25 0.00 8.29
splitting by group and processing in parallel gives a boost, below is my attempt with furrr:
library(furrr)
future::plan('multiprocess')
system.time({
fm <- future_map(split(fragments_sf, fragments_sf$region), ~ sf::st_union(.x))
})
# user system elapsed
# 0.665 0.024 3.977
of course this is slightly incomplete as it results in a list, I could not find an elegant way to recombine.
data.table::rbindlist
?
future_map_dfr
I imagine
The problem would likely arise coz done naively, each polygon pair has to be unwound and rewound, making it an O(>N^2) operation. For this special case, you could write a pretty easy C++ function to simply discard all touching edges in each group, then just rewind the leftover as single enclosing polygon. Should be able to achieve O(N), which ought to be approximately enormously faster. Want me to give it a try?
I've done the touching segment removal, it's the nesting of rings that causes me trouble :) If the polygons have holes filled by other regions, it's easier to union fragments than it is to figure out nesting.
https://github.com/hypertidy/contourPolys#other-attempts
But, if you can see a better way at any place definitely keen to see any and every approach to this!
Actually, that would be super valuable because currently I have to use sf to determine which segments go together in what order - so consider this the cyclerize example! In this context, the edge removal could happen here for each k
without popping back to R: https://github.com/hypertidy/contourPolys/blob/master/src/cutpoints.cpp#L269
Oh! Apologies for being scatty here, I'm chasing a couple of ideas down ...
The example in the readme incorrectly filters on segments that occur twice - but I think it should detect exactly when a segment occurs an odd number of times, and only keep one instance. Here's a more clear example:
## basic example data as per filled.contour
z <- as.matrix(volcano)
y <- seq_len(ncol(z))
x <- seq_len(nrow(z))
library(contourPolys)
library(dplyr)
## choose one single region, actually two nested ring polygons
levels <- pretty(range(z), n = 7)[3:4]
## return the raw fragments from the fillled.contour code
p <- contourPolys::fcontour(x, y, z, levels)
## reshape the output - x, y - coordinates
## lower/upper - the region bounds in z
## g - the region grouping index (path ID)
m <- cbind(x = unlist(p[[1]]),
y = unlist(p[[2]]),
lower = rep(unlist(p[[3]]), lengths(p[[1]])),
upper = rep(unlist(p[[4]]), lengths(p[[1]])),
g = rep(seq_along(p[[1]]), lengths(p[[1]])))
gd <- tibble::as_tibble(m)
## ensure that every ring is closed (that's assumed from the output)
gd <- gd %>% group_by(g) %>% slice(c(1:n(), 1)) %>% ungroup()
## de-duplicate all vertices, rename g to path, id regions by label
udata <- gd %>% transmute(x, y, path = g, region = paste(lower, upper, sep = "-")) %>%
unjoin::unjoin(x, y, key_col = ".vx")
## split by path, convert to segment form cbind(head, tail)
segs <- purrr::map_df(split(udata$data$.vx, udata$data$path)[unique(udata$data$path)], silicate:::path_to_segment, .id = "path")
## identify segments by region by matching on path
segs$region <- udata$data$region[match(as.integer(segs$path), udata$data$path)]
## de-duplicated segments
vertex0 <- pmin(segs$.vertex0, segs$.vertex1)
vertex1 <- pmax(segs$.vertex0, segs$.vertex1)
segs$.vertex0 <- vertex0
segs$.vertex1 <- vertex1
## remove any segments occuring 2, 4, 6, any even number of times
usegs <- segs %>% mutate(segid = paste(.vertex0, .vertex1, sep = "-")) %>%
group_by(region, segid) %>% mutate(n = n()) %>%
filter(n %% 2 == 1) %>%
ungroup()
## plot the segments (some are repeated still)
tab <- usegs %>%
inner_join(udata$.vx, c(".vertex0" = ".vx")) %>%
rename(x0= x, y0 = y) %>%
inner_join(udata$.vx, c(".vertex1" = ".vx"))
library(ggplot2)
ggplot(tab , aes(x = x0, y = y0, xend = x, yend = y, col = factor(n))) + geom_segment() #+ guides(colour = FALSE)
What is still missing above is code to remove degenerate fragments (area = 0) which occurs in fcontour_sf now, and to keep only one segment when it doesn't occur in pairs.
@tim-salabim and @mdsumner thanks for the suggestions I tried both rbindlist
and future_map_dfr
without success. The problem was that the pieces produced by split switch to sfc
objects and those functions need data.frame
like inputs. After some tinkering I got something to work and it still ran at just under 6 sec.
system.time({
fm <- split(fragments_sf, fragments_sf$region) %>%
future_map(~ sf::st_union(.x)) %>%
plyr::ldply(.fun = st_sf) %>%
st_sf()
})
# user system elapsed
# 5.802 0.008 5.809
@mrjoh3 in case it's sfc
s then c
should do the trick.
Cool @mrjoh3 that's fine I think it's a minor (albeit painful) detail getting to the final object sf type, appreciate the benchmark!
Another wrinkle is that filled.contour is not actually partitioning the plane, we end up with overlaps between regions. So that needs attention, today I'm leaning back to path consolidation (remove repeat segments within region) and nesting - but I'll probably try to find the bugs in filled.contour first.
If there's better marching squares code around I'd pursue that :)
This was previous exploration https://github.com/r-spatial/sf/issues/693 and today is my favourite way to go, trim pip lookups of rings within rings and classify by evenodd
And this is related https://github.com/r-spatial/sf/issues/95
Using this package
contourPolys
we can create all the pieces of afilled.contour
plot as sf geometry.What's the fastest way to union fragments of each region together to give 5 MULTIPOLYGONs?
In
fragments_sf.zip
isfragments_sf.rds
which contains a sf data frame of 50663 POLYGONs, withregion
identifier - there are 5 regions.fragments_sf.zip
Using group_by union takes 10-15s which is pretty good but this blows out a lot for larger rasters (source grid is 278 columns, 161 rows) - the marching squares code in
filled.contour
is not the bottle neck, and doesn't need to scale as it simply draws the fragments as it goes.(It's possible to remove internal edges and then polygonize just the boundaries, but it's tricky to get the nesting right - I'm still exploring that).