paleolimbot / wk

Lightweight Well-Known Geometry Parsing
https://paleolimbot.github.io/wk
Other
47 stars 7 forks source link

Slow polygon plotting #190

Open kadyb opened 1 year ago

kadyb commented 1 year ago

This is not a big issue, but maybe there is some room to improve polygon plotting performance. Currently compared to {sf} it is ~3 times slower.

library("sf")
library("geos")
set.seed(1)

n = 10000
df = data.frame(x = rnorm(n), y = rnorm(n))
pts_sf = st_as_sfc(st_as_sf(df, coords = c("x", "y")))
buffer_sf = st_buffer(pts_sf, 10, nQuadSegs = 30)
buffer_geos = as_geos_geometry(buffer_sf)

system.time(plot(buffer_sf, axes = TRUE))
#> user  system elapsed 
#> 0.47    1.13    1.59 
system.time(plot(buffer_geos))
#> user  system elapsed 
#> 4.16    1.55    5.71
paleolimbot commented 1 year ago

Just a note that the geos version goes through its own plotting generic in the geos package that attempts simplification based on the resolution of the output. This means that there's an extra call to geos_simplify() (or potentially some sorting by geometry type) that cause some slowdowns for this example (with the tradeoff being that larger datasets in theory plot fewer points and go faster). I haven't spent much time testing this.

You can force the wk version by using wk_plot() (still slower):

library("sf")
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library("geos")
set.seed(1)

n = 10000
df = data.frame(x = rnorm(n), y = rnorm(n))
pts_sf = st_as_sfc(st_as_sf(df, coords = c("x", "y")))
buffer_sf = st_buffer(pts_sf, 10, nQuadSegs = 30)
buffer_geos = as_geos_geometry(buffer_sf)

system.time(plot(buffer_sf, axes = TRUE))

#>    user  system elapsed 
#>   0.946   0.021   0.972

system.time(plot(buffer_geos))

#>    user  system elapsed 
#>   3.125   0.183   3.313

system.time(wk::wk_plot(buffer_sf))

#>    user  system elapsed 
#>   2.061   0.045   2.110

Created on 2023-08-06 with reprex v2.0.2

mdsumner commented 10 months ago

I came to say this, I'll investigate because it looks like terra plotting is much the same as wk, there might be something in the way the calls to polypath are being invoked?

## geopackage of the footprints of COP90 COGs
dsn <- "/vsicurl/https://github.com/mdsumner/cog-example/raw/main/vrtti/cop90.vrt.gpkg"

# FAST
v <- terra::vect(dsn)
terra::plot(v)

# FAST
ex0 <- do.call(rbind, vapour::vapour_read_extent(dsn))
ex <- wk::rct(ex0[,1], ex0[,3], ex0[,2], ex0[,4])
plot(ex)

# not fast
pl <- wk::wkb(vapour::vapour_read_geometry(dsn))
plot(pl)

but, another prospect is using vectorized grid plotting via gridBase which I do here: https://github.com/hypertidy/silicate/blob/f82f80cbc4bbf314de1e764f887204443b64e5c1/R/plot.R#L191-L192 (because polypath for triangles was punishingly slow, but I feel like there's more to explore here)

mdsumner commented 10 months ago

it's quite a bit faster if we split the coords as a matrix up front and pass each vector of xy down to the handling plot functions, i.e. so I do

 xii <- wk_coords(x)
  xii <- split(t(as.matrix(xii[c("x", "y")])), rep(xii$feature_id, each = 2L))

at the size loop so there's no atomic element dataframe at the bottom

https://github.com/paleolimbot/wk/blob/4e1d6f2edc213a41d535104102a0bd2f6fcb3e8e/R/plot.R#L111-L112

but, it affects all the handler plot functions so will need a bit of thought.

Still it's not as fast as terra and I think the gridBase way is super cool but will be a pain to match all the arguments for polypath (which I really like about wk).

(fwiw D Cooley experimente with all this geometry splitting in {geometries}, and I always found that avoid data frame was very important for speed with tiny elements like this so an upfront wk_coords() split into a list might not be bad thing to add in C++ here. )

mdsumner commented 10 months ago

oh except for mixed types, well I'll be having a look, maybe batching up groups of types with this matrix split 🙏