hypertidy / contourPolys

see isoband on CRAN for a better solution
7 stars 1 forks source link

speed of merging fragments #4

Closed mdsumner closed 4 years ago

mdsumner commented 5 years ago

Using this package contourPolys we can create all the pieces of a filled.contour plot as sf geometry.

What's the fastest way to union fragments of each region together to give 5 MULTIPOLYGONs?

In fragments_sf.zip is fragments_sf.rds which contains a sf data frame of 50663 POLYGONs, with region identifier - there are 5 regions.

fragments_sf.zip

Using group_by union takes 10-15s which is pretty good but this blows out a lot for larger rasters (source grid is 278 columns, 161 rows) - the marching squares code in filled.contour is not the bottle neck, and doesn't need to scale as it simply draws the fragments as it goes.

library(sf)
fragments_sf <- readRDS("fragments_sf.rds")

library(dplyr)
system.time({
 x <- fragments_sf %>% group_by(region) %>% summarize() %>% st_cast("MULTIPOLYGON")
})
#   user  system elapsed 
# 13.440   0.151  12.769 

(It's possible to remove internal edges and then polygonize just the boundaries, but it's tricky to get the nesting right - I'm still exploring that).

mdsumner commented 5 years ago

The data situation looks like this, first without drawing every fragment's border:

ramp2 <- grDevices::colorRampPalette(c("#54A3D1", "#60B3EB", 
                                       "#78C8F0", "#98D1F5", "#B5DCFF", "#BDE1F0", "#CDEBFA", 
                                       "#D6EFFF", "#EBFAFF", "grey92", "grey94", "grey96", "white"))

plot(fragments_sf, col = ramp2(max(fragments_sf$region))[fragments_sf$region], border = NA)

image

and after union into 5 MULTIPOLYGONs:

plot(x, col = ramp2(nrow(x)))

image

mdsumner commented 5 years ago

The data was created with:

library(contourPolys)
library(raster)
library(sf)
levels <- c(-6000, -4000, -2000, 0, 2000, 4000)

fc <- contourPolys::fcontour_sf(xFromCol(topo), rev(yFromRow(topo)), t(as.matrix(flip(topo, "y"))), c = levels)
library(sf)
fragments_sf <- st_sf(st_sfc(fc[[1]], crs = projection(topo)), region = unlist(fc[[2]]))
saveRDS(fragments_sf, file = "fragments_sf.rds")
tim-salabim commented 5 years ago

I was hoping data.table would be faster, but it is only by a tad...

library(sf)

fragments_sf <- readRDS("C:/Users/tim.appelhans/Downloads/fragments_sf.rds")
names(fragments_sf) = c("region", "geometry")
st_geometry(fragments_sf) = "geometry"

# dplyr
library(dplyr)
system.time({
  x <- fragments_sf %>% group_by(region) %>% summarize() %>% st_cast("MULTIPOLYGON")
})
# user  system elapsed 
# 7.12    0.00    7.14 

# data.table
library(data.table)
dt = as.data.table(fragments_sf)

system.time({
  x = dt[, { geometry = sf:::CPL_geos_union(geometry) 
             geometry = sf::st_sfc(geometry)
             geometry = sf::st_sf(geometry)
            }, by = "region"]
})
# user  system elapsed 
# 6.39    0.02    6.41

# base sf
system.time({
  x = aggregate(fragments_sf, by = list(fragments_sf$region), FUN = mean)
})
# user  system elapsed 
# 8.25    0.00    8.29 
mrjoh3 commented 5 years ago

splitting by group and processing in parallel gives a boost, below is my attempt with furrr:

library(furrr)
future::plan('multiprocess')

system.time({

  fm <- future_map(split(fragments_sf, fragments_sf$region), ~ sf::st_union(.x))

})

#   user  system elapsed 
#  0.665   0.024   3.977 

of course this is slightly incomplete as it results in a list, I could not find an elegant way to recombine.

tim-salabim commented 5 years ago

data.table::rbindlist ?

mdsumner commented 5 years ago

future_map_dfr I imagine

mpadge commented 5 years ago

The problem would likely arise coz done naively, each polygon pair has to be unwound and rewound, making it an O(>N^2) operation. For this special case, you could write a pretty easy C++ function to simply discard all touching edges in each group, then just rewind the leftover as single enclosing polygon. Should be able to achieve O(N), which ought to be approximately enormously faster. Want me to give it a try?

mdsumner commented 5 years ago

I've done the touching segment removal, it's the nesting of rings that causes me trouble :) If the polygons have holes filled by other regions, it's easier to union fragments than it is to figure out nesting.

https://github.com/hypertidy/contourPolys#other-attempts

But, if you can see a better way at any place definitely keen to see any and every approach to this!

mdsumner commented 5 years ago

Actually, that would be super valuable because currently I have to use sf to determine which segments go together in what order - so consider this the cyclerize example! In this context, the edge removal could happen here for each k without popping back to R: https://github.com/hypertidy/contourPolys/blob/master/src/cutpoints.cpp#L269

mdsumner commented 5 years ago

Oh! Apologies for being scatty here, I'm chasing a couple of ideas down ...

The example in the readme incorrectly filters on segments that occur twice - but I think it should detect exactly when a segment occurs an odd number of times, and only keep one instance. Here's a more clear example:

## basic example data as per filled.contour
z <- as.matrix(volcano)
y <- seq_len(ncol(z))
x <- seq_len(nrow(z))

library(contourPolys)
library(dplyr)

## choose one single region, actually two nested ring polygons
levels <- pretty(range(z), n = 7)[3:4]
## return the raw fragments from the fillled.contour code
p <- contourPolys::fcontour(x, y, z, levels)

## reshape the output - x, y - coordinates
##                      lower/upper - the region bounds in z
##                      g - the region grouping index (path ID)
m <- cbind(x = unlist(p[[1]]), 
           y = unlist(p[[2]]), 
           lower = rep(unlist(p[[3]]), lengths(p[[1]])), 
           upper = rep(unlist(p[[4]]), lengths(p[[1]])), 
           g = rep(seq_along(p[[1]]), lengths(p[[1]]))) 

gd <- tibble::as_tibble(m)

## ensure that every ring is closed (that's assumed from the output)
gd <- gd %>% group_by(g) %>% slice(c(1:n(), 1)) %>% ungroup()

## de-duplicate all vertices,  rename g to path, id regions by label
udata <-   gd %>% transmute(x, y, path = g, region = paste(lower, upper, sep = "-")) %>% 
  unjoin::unjoin(x, y, key_col = ".vx")

## split by path, convert to segment form cbind(head, tail)
segs <- purrr::map_df(split(udata$data$.vx, udata$data$path)[unique(udata$data$path)], silicate:::path_to_segment, .id = "path") 

## identify segments by region by matching on path
segs$region <- udata$data$region[match(as.integer(segs$path), udata$data$path)]

## de-duplicated segments
vertex0 <- pmin(segs$.vertex0, segs$.vertex1)
vertex1 <- pmax(segs$.vertex0, segs$.vertex1)
segs$.vertex0 <- vertex0
segs$.vertex1 <- vertex1

## remove any segments occuring 2, 4, 6, any even number of times
usegs <- segs %>% mutate(segid = paste(.vertex0, .vertex1, sep = "-")) %>% 
  group_by(region, segid) %>% mutate(n = n()) %>% 
  filter(n %% 2 == 1) %>% 
  ungroup()

## plot the segments (some are repeated still)
tab <- usegs %>% 
  inner_join(udata$.vx, c(".vertex0" = ".vx")) %>% 
  rename(x0= x, y0 = y) %>% 
  inner_join(udata$.vx, c(".vertex1" = ".vx"))
library(ggplot2)
ggplot(tab , aes(x = x0, y = y0, xend = x, yend = y, col = factor(n))) + geom_segment() #+ guides(colour = FALSE)

What is still missing above is code to remove degenerate fragments (area = 0) which occurs in fcontour_sf now, and to keep only one segment when it doesn't occur in pairs.

image

mrjoh3 commented 5 years ago

@tim-salabim and @mdsumner thanks for the suggestions I tried both rbindlist and future_map_dfr without success. The problem was that the pieces produced by split switch to sfc objects and those functions need data.frame like inputs. After some tinkering I got something to work and it still ran at just under 6 sec.

system.time({
  fm <- split(fragments_sf, fragments_sf$region) %>%
    future_map(~ sf::st_union(.x)) %>% 
    plyr::ldply(.fun = st_sf) %>% 
    st_sf()
})
#   user  system elapsed 
#  5.802   0.008   5.809 

parallel_merge

tim-salabim commented 5 years ago

@mrjoh3 in case it's sfcs then c should do the trick.

mdsumner commented 5 years ago

Cool @mrjoh3 that's fine I think it's a minor (albeit painful) detail getting to the final object sf type, appreciate the benchmark!

mdsumner commented 5 years ago

Another wrinkle is that filled.contour is not actually partitioning the plane, we end up with overlaps between regions. So that needs attention, today I'm leaning back to path consolidation (remove repeat segments within region) and nesting - but I'll probably try to find the bugs in filled.contour first.

If there's better marching squares code around I'd pursue that :)

mdsumner commented 5 years ago

This was previous exploration https://github.com/r-spatial/sf/issues/693 and today is my favourite way to go, trim pip lookups of rings within rings and classify by evenodd

mdsumner commented 5 years ago

And this is related https://github.com/r-spatial/sf/issues/95

mdsumner commented 4 years ago

see https://CRAN.r-project.org/package=isoband