Convince me that a segment vector class is worth adding

paleolimbot commented 2 years ago

@mdsumner: I've spent a bit of time with wk in the past week. The segment vector class is definitely useful for some specific areas of geometry, but I'm hesitant to add another vector class to core wk until I have a use-case.

One obvious reason I can think of is that the below code is pretty slow and shouldn't have to be. A use-case I've run into that I wished had some kind of segment class is when I was trying to implement some trajectory interpolation; however, I'm not convinced that I couldn't have done it just as easily with just data.frame(start = xy(), end = xy()).

Another use-case is indexing a geometry to locate intersecting edges; another is to do overlay operations, all of which build lines/polygons from a set of edges. I can't think of a reason those last two would ever be implemented in base R (and if they were, they could also use data.frame(start = xy(), end = xy())?

Some examples to play with that use the existing functionality available in wk:

library(wk)
library(dplyr, warn.conflicts = FALSE)

wk_edges <- function(handleable) {
  crs <- wk_crs(handleable)
  coords <- wk_coords(handleable)
  coords %>% 
    group_by(feature_id, part_id, ring_id) %>% 
    summarise(
      edge = data.frame(
        start = xy(x[-n()], y[-n()], crs = crs),
        end = xy(x[-1], y[-1], crs = crs)
      ),
      .groups = "drop"
    )
}

edge_linestring <- function(edge) {
  n_xy <- nrow(edge) * 2
  xy_all <- xy(
    double(n_xy),
    double(n_xy), 
    crs = wk_crs_output(edges$edge$start, edge$end)
  )
  xy_all[seq(1, n_xy, by = 2)] <- edge$start
  xy_all[seq(2, n_xy, by = 2)] <- edge$end

  wk_linestring(xy_all, feature_id = rep(1:nrow(edge), each = 2))
}

nc <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))
edges <- wk_edges(nc)
edges_linestring <- edge_linestring(edges$edge)

head(edges)
#> # A tibble: 6 × 4
#>   feature_id part_id ring_id edge$start           $end                
#>        <int>   <int>   <int> <wk_xy>              <wk_xy>             
#> 1          1       2       1 (-81.47276 36.23436) (-81.54084 36.27251)
#> 2          1       2       1 (-81.54084 36.27251) (-81.56198 36.27359)
#> 3          1       2       1 (-81.56198 36.27359) (-81.63306 36.34069)
#> 4          1       2       1 (-81.63306 36.34069) (-81.74107 36.39178)
#> 5          1       2       1 (-81.74107 36.39178) (-81.69828 36.47178)
#> 6          1       2       1 (-81.69828 36.47178) (-81.70280 36.51934)
head(edges_linestring)
#> <wk_wkb[6] with CRS=EPSG:4267>
#> [1] <LINESTRING (-81.4728 36.2344, -81.5408 36.2725)>
#> [2] <LINESTRING (-81.5408 36.2725, -81.562 36.2736)> 
#> [3] <LINESTRING (-81.562 36.2736, -81.6331 36.3407)> 
#> [4] <LINESTRING (-81.6331 36.3407, -81.7411 36.3918)>
#> [5] <LINESTRING (-81.7411 36.3918, -81.6983 36.4718)>
#> [6] <LINESTRING (-81.6983 36.4718, -81.7028 36.5193)>

wk_plot(nc[1, ])

wk_plot(edges_linestring[edges$feature_id == 1])

^{Created on 2021-12-29 by the reprex package (v2.0.1)}

mdsumner commented 2 years ago

it's probably not worth it unless the segments are indexed to a vertex pool, that's where I always start a decomposition task, with silicate::SC0.

thanks for the examples!

paleolimbot commented 2 years ago

They helped me flush out a few awkward usages! Segments will be a better fit for Arrow, because indexed segments == dictionary-encoded point array, which Arrow supports out-of-the-box.

mdsumner commented 2 years ago

ah ok, that's interesting

I've been thinking about this low level while on holidays, what my original motivations were etc

the major problem is the sf behemoth, the lack of an api for it, lack of extensibility, and adherence to the 2D SF path-only types

wk seems like part of that missing api, and exploding paths into component primitives seems a fundamental task, missing in sf but reimplemented over and over in other packages

geos now has triangulation by ear clipping, so the obvious 1D corollary is there, but still you can just treat triangles and segments as polygons and lines ( hopefully without polygon closed vertex duplication)

rcts are different, they really are like a mini mesh, implicitly reusing coordinates by their structure

I think use of xy(), xy(), xy() is a very interesting idea, that's probably enough for the segment and triangle needs, when there's no vertex pool

there's still a relationship of rct (arbitrary extents) to rasters, where the mesh goes from one quad to many in a structure, those implicit coordinates collapse to extent, dimension - so there's a lazy set of vertices for materializing with transformation, and that's a bit like the vertex pool thing ....

holiday thoughts ... thanks for the prompt 🙏

paleolimbot commented 2 years ago

Excellent thoughts!

I did a lot of wk-ing over the holiday and spent some time thinking about various parts of it and how the data structures fit together. I think the main thing wk is doing is saying that yes, everything is a simple feature, but you can represent it lots of ways. The reader/handler stuff iterates over things like rectangles and grids as if they were simple features polygons, but also allows shortcuts for when that's not what you want.

I think the "problem" with sf is the same as the "problem" with GEOS and many other abstractions of geometry based on simple features, which is they're all represented as individual geometries. Basically, the difference between list(c(x, y)) (a list of points!) and list(x = c(...), y = c(...)), which is one "thing" but represents a vector of points. The same thing happens at the C and C++ level for GEOS...there's a lot of tiny objects floating around that all have to be allocated and deleted. That's a pattern that doesn't scale to tens of millions of features, and is particularly evident with things like points, tiny segments, and triangles.

I think that's why Arrow for geometry is so cool...the basic unit is an Array, and the basic premise is to process data in chunks. That makes is easy to generalize: chunks can contain a variable number of features but roughly the same number of coordinates, so you're never stuck having to pick between making points fast and supporting polygons.

grd is on the backburner while I put some much needed effort into S2, but it's still a go, and even currently does a few of the things you've talked about:

library(grd)
wk::as_rct(grd(nx = 2, ny = 2, bbox = wk::rct(0, 0, 1, 1)))
#> <wk_rct[4]>
#> [1] [0.0 0.5 0.5 1.0] [0.0 0.0 0.5 0.5] [0.5 0.5 1.0 1.0] [0.5 0.0 1.0 0.5]

That's a long way of saying...segments and triangles are important! But probably not needed in wk (perhaps wkutils could be a good fit...it's currently a complete duplicate of wk at the moment and could have some cool stuff added that isn't minimal enough for wk).

Keep the thoughts coming!

mdsumner commented 2 years ago

here's the silicate way for your original example (without crs round-trip for now)

library(wk)
nc <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))
sc <- silicate::SC0(nc)
edge_idx <- as.matrix(do.call(rbind, silicate::sc_object(sc)$topology_)[c(".vx0", ".vx1")])
v <- silicate::sc_vertex(sc)
xy_all <- xy(v$x_[t(edge_idx)], v$y_[t(edge_idx)])
wk_linestring(xy_all, feature_id = rep(1:nrow(edge_idx), each = 2))

mdsumner commented 2 years ago

Your code above does the right lead-lag offset for a set of sf coordinates grouped by paths, but it's sf-specific - it's not hard to do but I don't think folks should have to dream it up to do basic dev - and this same kind of code is now embedded in dozens of packages, not - here's how we conceptually collapse paths to segments, here's a tool for sf, and these other formats that are also path-based. I really thought spatial-meets-tidyverse should have these things as a starting point, but the outcome is so wildly different ...

In this case there's no need for de-duplicating the vertices, that's just something silicate always does as it was a key motivator - that indexing is often used, but on the road from polygons to line segments it's not actually required. I'm very much into these component steps and think we need these mini domain-specific tasks in a shared lib. Consider that geos itself actually does all these kinds of component steps too, but it's SF-in, SF-out you don't get access to the intermediate algorithm steps. (CGAL is a lib of all those intermediates but it's way overkill too.)

Finally, this decomposition is much harder for triangles, much less readily coded off the cuff, ear clipping is inherently path-based, mesh methods inherently segment-based, for example and so the problem of code duplication/sharing is much bigger, there are families of intermediate steps that we don't have as tools.

mdsumner commented 2 years ago

uh, actually what I should do is replace silicate$vertex with an xy,xyz implementation ..

that and sf to segments, sf to triangles as linestrings... would cover a lot for me here 😸

mdsumner commented 2 years ago

btw I had another more successful go at arc densification:

https://github.com/hypertidy/bigcurve/blob/master/R/bisect.R

interestingly, it seems to really matter about transformation accuracy - it won't complete in some cases using coarse reprojection (I think that's what's happening)

paleolimbot commented 2 years ago

Nice! Something like that is coming to s2 in the near future...you can sort of do it now but only with plate carree <-> s2 xyz. The tessellation is kind of like the d3 adaptive resample but in reverse...parameterized in terms of the distance you care about on the face of the earth not in terms of the resolution in projected space. I'd like to have a minimal version in crs2crs at some point (but s2 is the next thing after the geoarrow stuff is ready for testing).

library(s2)
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1

tessellate_geod <- function(x, distance, radius = s2_earth_radius_meters()) {
  coords_tes <- wk::wk_handle(
    x,
    s2_unprojection_filter(
      s2_projection_filter(
        wk::sfc_writer(),
        tessellate_tol = distance / radius
      )
    )
  )

  coords_tes %>% st_set_crs(st_crs(x))
}

tessellate_cartesian <- function(x, distance, radius = s2_earth_radius_meters()) {
  coords_tes <- wk::wk_handle(
    x,
    s2_unprojection_filter(
      s2_projection_filter(wk::sfc_writer()),
      tessellate_tol = distance / radius
    )
  )

  coords_tes %>% st_set_crs(st_crs(x))
}

plot(tessellate_geod(sf::st_as_sfc("LINESTRING (0 0, 0 45, -60 45)"), 100))

^{Created on 2022-01-07 by the reprex package (v2.0.1)}

paleolimbot / wk

Convince me that a segment vector class is worth adding #127