luukvdmeer / sfnetworks

Tidy Geospatial Networks in R
https://luukvdmeer.github.io/sfnetworks/

If the grouping variable is out of order, `to_spatial_contracted` returns incorrect node geometries #243

Open MattArran opened 1 year ago

MattArran commented 1 year ago

Describe the bug

When `to_spatial_contracted` is used to merge clusters of nodes, and the clusters are indicated by a grouping variable that is not in ascending order, the geometries of some nodes are incorrectly assigned to other nodes. This results in

  1. nodes located far from the ends of the spatial edges with which they are associated,
  2. node attributes associated with locations far from those to which they correspond.

Only nodes in single-node clusters appear to be assigned incorrect geometries, though such a node may be assigned the geometry of a node in a multi-node cluster. Internally, I imagine this results from the automatic ordering performed by `group_by`, which may lead to a mismatch at line 115 of morphers.R between the ordering of nodes in `all_group_idxs` (from `group_indices(...)` at line 113) and the ordering required for `new_node_geoms` to correspond to `new_nodes` (from `as_tbl_graph(contract(...))` at line 137). The easiest fix may be to sort the tibble of node data by the grouping variable(s) at the start of the process.
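The suspected mismatch can be illustrated without sfnetworks. The sketch below is a toy, not the code in morphers.R: the table `nodes`, the character column `geom`, and the two selection strategies are assumptions made for illustration, chosen to mirror the node order and `y_coord` values of the reproducible example in this report. It shows that dplyr numbers groups by the sorted values of the grouping variable, so picking one geometry per group in order of first appearance does not line up with nodes ordered by group index.

```r
library(dplyr)

# Toy node table (hypothetical): four nodes whose grouping variable is out of order.
nodes <- tibble(
  geom = c("(0 1)", "(1 1)", "(0 3)", "(0 2)"),
  y_coord = c(1, 1, 3, 2)
)

# dplyr assigns group indices by the sorted values of the grouping variable.
group_idxs <- group_indices(group_by(nodes, y_coord))

# Assumed faulty pattern: one geometry per group, in order of first appearance.
geoms_by_appearance <- nodes$geom[!duplicated(group_idxs)]

# Aligned alternative: one geometry per group, ordered by group index.
geoms_by_index <- nodes$geom[match(sort(unique(group_idxs)), group_idxs)]

print(group_idxs)
print(geoms_by_appearance)
print(geoms_by_index)
```

In this toy, position 2 of `geoms_by_appearance` holds the geometry of the node with `y_coord` 3, whereas position 2 of `geoms_by_index` holds the geometry of the node with `y_coord` 2. If the contracted nodes are ordered by group index, only the second vector lines up with them.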

Reproducible example

library(dplyr)
library(tidygraph)
library(sf)
library(sfnetworks)

p1 = st_point(c(0, 1))
p2 = st_point(c(1, 1))
p3 = st_point(c(0, 3))
p4 = st_point(c(0, 2))
l1 = st_sfc(st_linestring(c(p1, p2)))
l2 = st_sfc(st_linestring(c(p2, p3)))
l3 = st_sfc(st_linestring(c(p3, p4)))
edges = st_as_sf(c(l1, l2, l3), crs = 4326)
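# Nodes are created in the order p1, p2, p3, p4, so the grouping
# variable y_coord is deliberately not in ascending order.
network <- as_sfnetwork(edges) %>%
  activate("nodes") %>%
  mutate(y_coord = c(1, 1, 3, 2))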

network %>%
  convert(to_spatial_contracted, y_coord, summarise_attributes = "mean")
#> # A sfnetwork with 3 nodes and 3 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # A directed multigraph with 1 component with spatially explicit edges
#> #
#> # Node Data:     3 × 3 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 0 ymin: 1.000038 xmax: 0.5 ymax: 3
#>   y_coord .tidygraph_node_index              x
#>     <dbl> <list>                   <POINT [°]>
#> 1       1 <int [2]>             (0.5 1.000038)
#> 2       2 <int [1]>                      (0 3)
#> 3       3 <int [1]>                      (0 2)
#> #
#> # Edge Data:     3 × 4
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 0 ymin: 1 xmax: 1 ymax: 3
#>    from    to                                      x .tidygraph_edge_index
#>   <int> <int>                       <LINESTRING [°]>                 <int>
#> 1     1     1 (0.5 1.000038, 0 1, 1 1, 0.5 1.000038)                     1
#> 2     1     3               (0.5 1.000038, 1 1, 0 3)                     2
#> 3     3     2                             (0 3, 0 2)                     3
network %>%
  arrange(y_coord) %>%
  convert(to_spatial_contracted, y_coord, summarise_attributes = "mean")
#> # A sfnetwork with 3 nodes and 3 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # A directed multigraph with 1 component with spatially explicit edges
#> #
#> # Node Data:     3 × 3 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 0 ymin: 1.000038 xmax: 0.5 ymax: 3
#>   y_coord .tidygraph_node_index              x
#>     <dbl> <list>                   <POINT [°]>
#> 1       1 <int [2]>             (0.5 1.000038)
#> 2       2 <int [1]>                      (0 2)
#> 3       3 <int [1]>                      (0 3)
#> #
#> # Edge Data:     3 × 4
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 0 ymin: 1 xmax: 1 ymax: 3
#>    from    to                                      x .tidygraph_edge_index
#>   <int> <int>                       <LINESTRING [°]>                 <int>
#> 1     1     1 (0.5 1.000038, 0 1, 1 1, 0.5 1.000038)                     1
#> 2     1     3               (0.5 1.000038, 1 1, 0 3)                     2
#> 3     3     2                             (0 3, 0 2)                     3

Created on 2023-05-30 with reprex v2.0.2
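A check like the following could help detect the first symptom (nodes located away from the ends of their edges) in other networks. It is a minimal sketch: `edge_starts_match_nodes` is a hypothetical helper, not part of sfnetworks, and the toy `nodes` and `edges` objects only mimic edges 2 and 3 of the first output above, with the centroid coordinate rounded to (0.5 1) for simplicity.

```r
library(sf)

# Hypothetical helper: for each edge, does its first coordinate equal the
# geometry of the node referenced by its `from` index?
edge_starts_match_nodes <- function(nodes, edges) {
  node_xy <- st_coordinates(nodes)[, c("X", "Y"), drop = FALSE]
  vapply(seq_len(nrow(edges)), function(i) {
    edge_xy <- st_coordinates(st_geometry(edges)[[i]])
    start_xy <- unname(edge_xy[1, c("X", "Y")])
    isTRUE(all.equal(start_xy, unname(node_xy[edges$from[i], ])))
  }, logical(1))
}

# Toy data (assumed): node 2 and node 3 carry swapped geometries.
nodes <- st_sf(
  geometry = st_sfc(st_point(c(0.5, 1)), st_point(c(0, 3)), st_point(c(0, 2)))
)
edges <- st_sf(
  from = c(1L, 3L),
  to = c(3L, 2L),
  geometry = st_sfc(
    st_linestring(rbind(c(0.5, 1), c(1, 1), c(0, 3))),
    st_linestring(rbind(c(0, 3), c(0, 2)))
  )
)

result <- edge_starts_match_nodes(nodes, edges)
print(result)
```

For a real network, the node and edge tables could be extracted with `st_as_sf(network, "nodes")` and `st_as_sf(network, "edges")` before being passed to the helper. An analogous comparison of each edge's last coordinate with its `to` node would cover the other end.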

Expected behavior

The contracted network should have the same node geometries regardless of the order of the grouping variable. As in the second output above, the node with `y_coord` 2 should be located at (0 2) and the node with `y_coord` 3 at (0 3).

R Session Info

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8    LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reprex_2.0.2     tidygraph_1.2.3  sfnetworks_0.6.3 sf_1.0-9         dplyr_1.0.10    

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0   xfun_0.33          purrr_0.3.5        colorspace_2.0-3   vctrs_0.4.2        generics_0.1.3     htmltools_0.5.3   
 [8] s2_1.1.1           yaml_2.3.5         utf8_1.2.2         rlang_1.0.6        e1071_1.7-12       pillar_1.8.1       glue_1.6.2        
[15] withr_2.5.0        DBI_1.1.3          wk_0.7.1           lifecycle_1.0.3    munsell_0.5.0      gtable_0.3.1       evaluate_0.17     
[22] knitr_1.40         callr_3.7.2        fastmap_1.1.0      ps_1.7.1           class_7.3-20       fansi_1.0.3        highr_0.9         
[29] Rcpp_1.0.10        KernSmooth_2.23-20 clipr_0.8.0        scales_1.2.1       classInt_0.4-8     lwgeom_0.2-11      fs_1.5.2          
[36] ggplot2_3.3.6      digest_0.6.29      processx_3.7.0     grid_4.2.2         cli_3.4.1          tools_4.2.2        magrittr_2.0.3    
[43] proxy_0.4-27       tibble_3.1.8       crayon_1.5.2       tidyr_1.2.1        sfheaders_0.4.2    pkgconfig_2.0.3    assertthat_0.2.1  
[50] rmarkdown_2.17     rstudioapi_0.14    R6_2.5.1           units_0.8-1        igraph_1.4.3       compiler_4.2.2   

Edit: example changed to illustrate the possibility of a node's location becoming detached from the ends of its edges, and details added to expected behaviour.

MattArran commented 1 year ago

I'll work on fixing this now.