hypertidy / silicate

A general form for complex data
https://hypertidy.github.io/silicate/
53 stars 4 forks source link

explode() sf example #102

Closed mdsumner closed 4 years ago

mdsumner commented 4 years ago
#' Explode to segments
#' 
#' Decompose lines or polygons to line segments
#' @param x sf line or polygon dataframe object
#' @param ... not used
#' @return sf linestring data frame
sfx_explode <- function(x, ...) {
  ## decompose to segments (edges), will work on lines or polygons doesn't matter
  sc <- silicate::SC0(x)
  ## unnest the nested segment indexes
  ## can remove unnest with a bit of do.call and lapply nrow set-up
  segs <- tidyr::unnest(sc$object, cols = c("topology_"))
  ## index vertices x_ y_ by segment matrix
  coords <- sc$vertex[as.vector(t(as.matrix(segs[c(".vx0", ".vx1")]))), c("x_", "y_")]
  ## column name to avoid clash with existing data
  idname <- ".crazy_id_name_873987387387_"
  ## create line id for each coordinate pair
  coords[[idname]] <- rep(seq_len(dim(segs)[1L]), each = 2L)
  ## build segment linestrings with sfheaders
  sflines <- sfheaders::sf_linestring(coords, 
                                      x = "x_", y = "y_", linestring_id = idname)

  ## drop vertex columns from silicate output now that we have used them
  segs[[".vx0"]] <- NULL
  segs[[".vx1"]] <- NULL
  ## if any object attributes remaining stick them onto the sf 
  if (dim(segs)[2] > 0) {
    ## add columns back in 
    sflines[names(segs)] <- segs
  }
  sflines[[idname]] <- NULL
  ## restore projection metadata
  sf::st_set_crs(sflines, sf::st_crs(x))
}

## assume we have lines
lines <- sf::st_cast(silicate::inlandwaters, "MULTILINESTRING")
sfx_explode(lines)
#> Simple feature collection with 33455 features and 3 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -681074.5 ymin: -2677570 xmax: 2477934 ymax: 1113843
#> epsg (SRID):    NA
#> proj4string:    +proj=lcc +lat_1=-47 +lat_2=-17 +lat_0=-32 +lon_0=136 +x_0=0 +y_0=0 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#> First 10 features:
#>    id                       geometry     ID                     Province
#> 1   1 LINESTRING (1116371 -458418... 103841 Australian Capital Territory
#> 2   2 LINESTRING (1117093 -457110... 103841 Australian Capital Territory
#> 3   3 LINESTRING (1117172 -456893... 103841 Australian Capital Territory
#> 4   4 LINESTRING (1117741 -456561... 103841 Australian Capital Territory
#> 5   5 LINESTRING (1117629 -455510... 103841 Australian Capital Territory
#> 6   6 LINESTRING (1116923 -453749... 103841 Australian Capital Territory
#> 7   7 LINESTRING (1117385 -453058... 103841 Australian Capital Territory
#> 8   8 LINESTRING (1118218 -452672... 103841 Australian Capital Territory
#> 9   9 LINESTRING (1118583 -451764... 103841 Australian Capital Territory
#> 10 10 LINESTRING (1118420 -451204... 103841 Australian Capital Territory

Created on 2019-11-15 by the reprex package (v0.3.0)

ibarraespinosa commented 4 years ago

Hello @mdsummer. Thanks for the message on twitter This is the data that Im trying to split, My function is not working osm_pedals.rds.zip

mdsumner commented 4 years ago

fixed a bunch of bugs, works now

  x <- readRDS("osm_pedals.rds")

sfx_explode <- function(x, ...) {
  ## decompose to segments (edges), will work on lines or polygons doesn't matter
  sc <- silicate::SC0(x)
  ## unnest the nested segment indexes
  ## can remove unnest with a bit of do.call and lapply nrow set-up
  segs <- tidyr::unnest(sc$object, cols = c("topology_"))
  ## index vertices x_ y_ by segment matrix
  coords <- sc$vertex[as.vector(t(as.matrix(segs[c(".vx0", ".vx1")]))), c("x_", "y_")]
  ## column name to avoid clash with existing data
  idname <- ".crazy_id_name_873987387387_"
  ## create line id for each coordinate pair
  coords[[idname]] <- rep(seq_len(dim(segs)[1L]), each = 2L)
  ## build segment linestrings with sfheaders
  sflines <- sfheaders::sf_linestring(coords, 
                                      x = "x_", y = "y_", linestring_id = idname)

  ## drop vertex columns from silicate output now that we have used them
  segs[[".vx0"]] <- NULL
  segs[[".vx1"]] <- NULL
  ## if any object attributes remaining stick them onto the sf 
  if (dim(segs)[2] > 0) {
    ## add columns back in 
    sflines[names(segs)] <- segs
  }
  sflines[[idname]] <- NULL
  ## restore projection metadata
  sf::st_set_crs(sflines, sf::st_crs(x))
}

sfx_explode(x)
#> Simple feature collection with 7727 features and 1 field
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: 326476.7 ymin: 7387551 xmax: 333311.8 ymax: 7396291
#> epsg (SRID):    31983
#> proj4string:    +proj=utm +zone=23 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
#> First 10 features:
#>                          geometry       highway
#> 1  LINESTRING (327269.3 739026... motorway_link
#> 2  LINESTRING (327237.4 739016... motorway_link
#> 3  LINESTRING (327222.3 739014... motorway_link
#> 4  LINESTRING (327191.8 739009... motorway_link
#> 5  LINESTRING (327155.7 739005... motorway_link
#> 6  LINESTRING (327118.6 739001... motorway_link
#> 7  LINESTRING (327103.6 738999... motorway_link
#> 8  LINESTRING (327091.3 738996... motorway_link
#> 9  LINESTRING (327085.7 738995... motorway_link
#> 10 LINESTRING (327082.4 738994... motorway_link

Created on 2019-11-15 by the reprex package (v0.3.0)

ibarraespinosa commented 4 years ago

It works really good!!!!!

ibarraespinosa commented 4 years ago

When are you going to upload it? I would like to import your function in my package eixport

mdsumner commented 4 years ago

I probably won't, not sure where I would put it - you can put it in your package if you like, happy to help

ibarraespinosa commented 4 years ago

Thanks @mdsumner

mpadge commented 4 years ago

Interesting to compare that with dodgr decomposition to unique line segments:

library(silicate)
library (dodgr)

lines <- sf::st_cast(silicate::inlandwaters, "MULTILINESTRING")
nrow (sfx_explode(lines))
#> [1] 33455

x <- sf::st_cast (silicate::inlandwaters, "MULTILINESTRING") %>%
    sf::st_cast ("LINESTRING") # probably something funny going on here?
#> Warning in st_cast.sf(., "LINESTRING"): repeating attributes for all sub-
#> geometries for which they may not be constant
xsf <- weight_streetnet (x, wt_profile = 1, id_col = "ID") %>%
    dodgr_to_sf ()
nrow (xsf)
#> [1] 11202

Created on 2019-11-15 by the reprex package (v0.3.0)

Your explode function returns three times the number of linestrings as does dodgr. I've no idea what's really happening here, but thought it might be interesting for you both to see that.

mpadge commented 4 years ago

Oh, and even more interestingly:

library(silicate)
library (dodgr)

x <- sf::st_cast (silicate::inlandwaters, "MULTILINESTRING") %>%
    sf::st_cast ("LINESTRING") # probably something funny going on here?
#> Warning in st_cast.sf(., "LINESTRING"): repeating attributes for all sub-
#> geometries for which they may not be constant
xsf <- weight_streetnet (x, wt_profile = 1, id_col = "ID") %>%
    dodgr_to_sf ()
nrow (xsf)
#> [1] 11202
xsf <- weight_streetnet (x, wt_profile = 1, id_col = "ID") %>%
    dplyr::mutate (junk = 1) %>% # merge_directed needs a dummy column to merge
    merge_directed_graph (col_names = "junk") %>%
    dodgr_to_sf ()
nrow (xsf)
#> [1] 385

Created on 2019-11-15 by the reprex package (v0.3.0)

That second code removes any directional sense from the line segments, and then there are far fewer of them. There are clearly at least lots of directional duplications happening here.

mdsumner commented 4 years ago

Yep that's true - on reflection I should have mentioned it, SC0 is a direct encoding of every instance of the edges (their indices are nested with the object fields). SC() should do unique edges (I won't claim that until I try it) .

But, normally we'd be exploding linestrings I guess, not polygon layers. It's a very neat demonstration eh, thanks for bringing this up (I'm still a bit excited about leveraging sfheaders now so we can round-trip everything).

mpadge commented 4 years ago

yeah, totally agree about the fabulous utility of sfheaders - thanks so much @dcooley!

mdsumner commented 4 years ago

Hmm, SC doesn't seem to do what it should in inlandwaters, but maybe the layer isn't squeaky clean (it should be given where I got it from). Does dodgr apply a tolerance to uniqueness in coordinates? silicate doesn't but it does remove some duplication, so

nrow(sf::st_coordinates(inlandwaters))
[1] 33644
 nrow(SC(inlandwaters)$vertex)
[1] 30835

But dodgr's clearly detecting a lot more, I get the same as distinct (it matches duplicated() now but used not to):

 dplyr::distinct(tibble::as_tibble(sf::st_coordinates(inlandwaters)[,1:2]))
[1] 30835
mdsumner commented 4 years ago

Oh fwiw my explode with SC (it's a lot simpler because we can't deal with the source's object fields in one table when there's dedupe of edges)

sc_explode <- function(x, ...) {
  ## decompose to segments (edges), will work on lines or polygons doesn't matter
  sc <- silicate::SC(x)
  edge_01 <- tibble::tibble(vertex_ = c(t(as.matrix(sc$edge[c(".vx0", ".vx1")])))) %>% 
    dplyr::inner_join(sc$vertex, "vertex_")
  edge_01$edge_ <- rep(sc$edge$edge_, each = 2L)
 sflines <- sfheaders::sf_linestring(edge_01, 
                                      x = "x_", y = "y_", linestring_id = "edge_")

  ## restore projection metadata
  sf::st_set_crs(sflines, sf::st_crs(x))
}
mpadge commented 4 years ago

Yeah, that's a point - dodgr essentially presumes vertex labelling throughout, and falls back to floating point equality otherwise, which is ... what it is, but will certainly give way more vertices than any other approach.