luukvdmeer / sfnetworks

Tidy Geospatial Networks in R
https://luukvdmeer.github.io/sfnetworks/
Other
347 stars 20 forks source link

Add 'edge_length' argument to 'as_sfnetwork()' #65

Closed Robinlovelace closed 3 years ago

Robinlovelace commented 4 years ago

As described here: https://github.com/luukvdmeer/sfnetworks/issues/6#issuecomment-648391948

luukvdmeer commented 3 years ago

In develop this is now available. The argument is called length_as_weight. It will store the length of the edges in a column named weight. In tidygraph and igraph a column with this name is automatically recognized as being the edge weights to be used.

library(sfnetworks)

as_sfnetwork(roxel, length_as_weight = TRUE)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2
#> # An sfnetwork with 701 nodes and 851 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # A directed multigraph with 14 components with spatially explicit edges
#> #
#> # Node Data:     701 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#>              geometry
#>           <POINT [°]>
#> 1 (7.533722 51.95556)
#> 2 (7.533461 51.95576)
#> 3 (7.532442 51.95422)
#> 4  (7.53209 51.95328)
#> 5 (7.532709 51.95209)
#> 6 (7.532869 51.95257)
#> # … with 695 more rows
#> #
#> # Edge Data:     851 x 6
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522594 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#>    from    to name        type                                 geometry   weight
#>   <int> <int> <fct>       <fct>                        <LINESTRING [°]>      [m]
#> 1     1     2 Havixbecke… residen… (7.533722 51.95556, 7.533461 51.955…  28.859…
#> 2     3     4 Pienersall… seconda… (7.532442 51.95422, 7.53236 51.9537… 107.704…
#> 3     5     6 Schulte-Be… residen… (7.532709 51.95209, 7.532823 51.952…  54.362…
#> # … with 848 more rows

Created on 2020-10-29 by the reprex package (v0.3.0)

Robinlovelace commented 3 years ago

Great work @luukvdmeer will test asap and likely close :+1:

Robinlovelace commented 3 years ago

Pre-latest version I get:

> as_sfnetwork(roxel, length_as_weight = TRUE)
# An sfnetwork with 701 nodes and 851 edges
#
# CRS:  EPSG:4326 
#
# A directed multigraph with 14 components with spatially explicit edges
#
# Node Data:     701 x 1 (active)
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
             geometry
          <POINT [°]>
1 (7.533722 51.95556)
2 (7.533461 51.95576)
3 (7.532442 51.95422)
4  (7.53209 51.95328)
5 (7.532709 51.95209)
6 (7.532869 51.95257)
# … with 695 more rows
#
# Edge Data:     851 x 5
# Geometry type: LINESTRING
# Dimension:     XY
# Bounding box:  xmin: 7.522594 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
   from    to name                  type                                                         geometry
  <int> <int> <fct>                 <fct>                                                <LINESTRING [°]>
1     1     2 Havixbecker Strasse   residential                    (7.533722 51.95556, 7.533461 51.95576)
2     3     4 Pienersallee          secondary     (7.532442 51.95422, 7.53236 51.95377, 7.53209 51.95328)
3     5     6 Schulte-Bernd-Strasse residential (7.532709 51.95209, 7.532823 51.95239, 7.532869 51.95257)

Post latest version I get this so the issue is fixed for sure. Great work!

# Aim: test length_as_weight argument
remotes::install_github("luukvdmeer/sfnetworks", "develop")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'sfnetworks' from a github remote, the SHA1 (7590fd8c) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(sfnetworks)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
as_sfnetwork(roxel, length_as_weight = TRUE)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
#> # An sfnetwork with 701 nodes and 851 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # A directed multigraph with 14 components with spatially explicit edges
#> #
#> # Node Data:     701 x 1 (active)
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#>              geometry
#>           <POINT [°]>
#> 1 (7.533722 51.95556)
#> 2 (7.533461 51.95576)
#> 3 (7.532442 51.95422)
#> 4  (7.53209 51.95328)
#> 5 (7.532709 51.95209)
#> 6 (7.532869 51.95257)
#> # … with 695 more rows
#> #
#> # Edge Data:     851 x 6
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522594 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#>    from    to name        type                                 geometry   weight
#>   <int> <int> <fct>       <fct>                        <LINESTRING [°]>      [m]
#> 1     1     2 Havixbecke… residen… (7.533722 51.95556, 7.533461 51.955…  28.859…
#> 2     3     4 Pienersall… seconda… (7.532442 51.95422, 7.53236 51.9537… 107.704…
#> 3     5     6 Schulte-Be… residen… (7.532709 51.95209, 7.532823 51.952…  54.362…
#> # … with 848 more rows

# bigger example
network = osmextract::oe_get("isle of wight")
nrow(network)
#> [1] 44424
system.time({network_sfn = as_sfnetwork(network)})
#>    user  system elapsed 
#>   2.803   0.032   2.865
system.time({network_sfnw = as_sfnetwork(network, length_as_weight = TRUE)})
#>    user  system elapsed 
#>   3.760   0.032   3.794
head(network_sfnw %>% sfnetworks::activate("edges") %>% pull(weight))
#> Units: [m]
#> [1]  50.93481 637.29974 599.75302 383.01448 371.25535 375.71839
summary(network_sfnw %>% sfnetworks::activate("edges") %>% pull(weight))
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#>       0.7      33.7      79.0     237.9     185.7 1022009.4
plot(network_sfnw[1:999, ])
#> Error in (function (cl, name, valueClass) : assignment of an object of class "units" is not valid for @'x' in an object of class "dgTMatrix"; is(value, "numeric") is not TRUE

Created on 2020-10-30 by the reprex package (v0.3.0)

Robinlovelace commented 3 years ago

Observations: strange error message when doing a silly subset - can we at least make the error message clearer? and weight column has units.

Suggestion. Use as.numeric before creating the weights: those units can cause lots of problems in my experience, e.g. when subsetting based on length.

agila5 commented 3 years ago

Observations: strange error message when doing a silly subset - can we at least make the error message clearer? and weight column has units.

I think that the problem here is that we never extended the [ operator to sfnetwork objects, so network_sfnw[1:999, ] is just calling igraph:::[.igraph. For example:

library(sfnetworks)

roxel_sfnetworks <- as_sfnetwork(roxel)
class(roxel_sfnetworks)
#> [1] "sfnetwork" "tbl_graph" "igraph"
roxel_sfnetworks[1:2, ]
#> 2 x 701 sparse Matrix of class "dgCMatrix"
#>                                                                               
#> [1,] . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> [2,] . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . .
#>                                                                               
#> [1,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> [2,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#>                                                                               
#> [1,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> [2,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#>                                                                               
#> [1,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> [2,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#>                                                                               
#> [1,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> [2,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Created on 2020-10-30 by the reprex package (v0.3.0)

The result does make sense since igraph:::[.igraph says: Query and manipulate a graph as it were an adjacency matrix. For some reasons that code fails if we add units object but I'm not familiar enough with igraph internals.

Anyway, I think that the best solution here is to keep the units on the weight field (since that's not the main problem here) and define a new method for [.sfnetwork (but this is clearly open to discussion since, probably, I'm missing something here 😅 ).

Robinlovelace commented 3 years ago

I think that the best solution here is to keep the units on the weight field

What are the reasons for doing so? For me it's a matter of +s vs -s.

-s of units m: it can cause unexpected errors when using them, e.g. for filtering the activated objects +s of units: more faithful to the sf design and provides additional attribute data

There may be other +s and -s and it's only a mild preference for not adding them based on past experience working with units objects.

luukvdmeer commented 3 years ago

I would argue that we rely on tidygraph for subsetting and all. In the end, sfnetworks should always be used with tidygraph and sf loaded as well, since it is basically a connector between those two packages. The tidygraph philosophy is to use dplyr verbs for these kind of operations. Also operators like $ do not work for the same reason.

Why not use the tidy way of plot(slice(net, 1:999))? Or of course `net %>% slice(1:999) %>% plot()

PS: I reopen the issue since we decided to only close when fixes are on master ;-)

Robinlovelace commented 3 years ago

Why not use the tidy way of plot(slice(net, 1:999))? Or of course `net %>% slice(1:999) %>% plot()

No reason, was just testing things almost at random and 'kicking the tires' of the objects ti see what happens. Here's another reprex + more observations:

remotes::install_github("luukvdmeer/sfnetworks", "develop")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'sfnetworks' from a github remote, the SHA1 (7590fd8c) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(sfnetworks)
rsfn = as_sfnetwork(roxel)
rsfn2 = as_sfnetwork(roxel, length_as_weight = TRUE)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
head(rsfn[1:99, ])
#> 6 x 701 sparse Matrix of class "dgCMatrix"
#>                                                                               
#> [1,] . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> [2,] . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . .

#> [6,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
rsfn2[1:99, ] # inconsistent
#> Error in (function (cl, name, valueClass) : assignment of an object of class "units" is not valid for @'x' in an object of class "dgTMatrix"; is(value, "numeric") is not TRUE

plot(rsfn %>% slice(1:99))
#> Error in slice(., 1:99): could not find function "slice"
plot(rsfn %>% dplyr::slice(1:99))

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
summary(rsfn2 %>% activate("edges") %>% pull(weight))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   1.956  25.479  45.656  59.369  73.376 513.913
plot(rsfn2 %>% activate("edges") %>% filter(weight > 100))
#> Error: Problem with `filter()` input `..1`.
#> ✖ both operands of the expression should be "units" objects
#> ℹ Input `..1` is `weight > 100`.
plot(rsfn2 %>% activate("edges") %>% filter(as.numeric(weight) > 100))

Created on 2020-10-30 by the reprex package (v0.3.0)

Robinlovelace commented 3 years ago

PS: I reopen the issue since we decided to only close when fixes are on master ;-)

Good point, should have remembered that as I recall now advocating that, woops!

Additional thought: the fact that the column is call weight adds support to coercing it to numeric I think. People will be more likely to expect a column called length to be of class units than the relatively generically named weight column.

agila5 commented 3 years ago

What are the reasons for doing so? For me it's a matter of +s vs -s.

I would preserve the current approach since the units are informative and you can easily drop them but, AFAIK, it's not trivial to recover. Moreover, the problem reported in the previous comment is caused by an erroneous approach, and I think you get the same results with another class that has similar properties (but, unfortunately, I don't understand the internals of igraph or Matrix)

Additional thought: the fact that the column is call weight adds support to coercing it to numeric I think. People will be more likely to expect a column called length to be of class units than the relatively generically named weight column.

You have a point there and with the filter example and I agree that it can be annoying 😅

Anyway, I wouldn't drop the units (for the reasons explained above), but it's not a big deal in both scenarios.

luukvdmeer commented 3 years ago

was just testing things almost at random and 'kicking the tires' of the objects ti see what happens

Perfect, that is exactly what is needed!

Additional thought: the fact that the column is call weight adds support to coercing it to numeric I think. People will be more likely to expect a column called length to be of class units than the relatively generically named weight column.

That is a good argument.

Remember there is also the spatial edge measure edge_length(), which can be used inside a mutate verb and has more flexibility. Could be even an option that users can provide which unit they want (might be more interested in km than m for example), and also turn them off. But adding all such arguments to the construction function is too much I think.

library(sfnetworks)
library(tidygraph)
#> 
#> Attaching package: 'tidygraph'
#> The following object is masked from 'package:stats':
#> 
#>     filter
as_sfnetwork(roxel) %>%
  activate("edges") %>%
  mutate(length = edge_length())
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2
#> # An sfnetwork with 701 nodes and 851 edges
#> #
#> # CRS:  EPSG:4326 
#> #
#> # A directed multigraph with 14 components with spatially explicit edges
#> #
#> # Edge Data:     851 x 6 (active)
#> # Geometry type: LINESTRING
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522594 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#>    from    to name       type                                  geometry   length
#>   <int> <int> <fct>      <fct>                         <LINESTRING [°]>      [m]
#> 1     1     2 Havixbeck… reside… (7.533722 51.95556, 7.533461 51.95576)  28.859…
#> 2     3     4 Pienersal… second… (7.532442 51.95422, 7.53236 51.95377,… 107.704…
#> 3     5     6 Schulte-B… reside… (7.532709 51.95209, 7.532823 51.95239…  54.362…
#> 4     7     8 <NA>       path    (7.540063 51.94468, 7.539696 51.94479… 155.230…
#> 5     9    10 Welsinghe… reside…  (7.537673 51.9475, 7.537614 51.94562) 208.663…
#> 6    11    12 <NA>       footway (7.543791 51.94733, 7.54369 51.94686,…  63.023…
#> # … with 845 more rows
#> #
#> # Node Data:     701 x 1
#> # Geometry type: POINT
#> # Dimension:     XY
#> # Bounding box:  xmin: 7.522622 ymin: 51.94151 xmax: 7.546705 ymax: 51.9612
#>              geometry
#>           <POINT [°]>
#> 1 (7.533722 51.95556)
#> 2 (7.533461 51.95576)
#> 3 (7.532442 51.95422)
#> # … with 698 more rows

Created on 2020-10-30 by the reprex package (v0.3.0)

luukvdmeer commented 3 years ago

This is now part of the new version. See here for an example.