hypertidy / silicate

A general form for complex data
https://hypertidy.github.io/silicate/
53 stars 4 forks source link

ARC model and graph contraction #60

Open mpadge opened 6 years ago

mpadge commented 6 years ago

Probably related to #45, but not exactly sure what you have in mind there. A graph contraction algorithm should work like this:

library (sf)
l1 <- cbind (1:10, 1:10) %>% st_linestring ()
l2 <- cbind (1:10, rep (5, 10)) %>% st_linestring ()
x <- list (l1, l2) %>% st_sfc () %>% st_sf ()
plot (x)

junk What contraction would then do is split those two linestrings at the junction to form four objects (linestrings, arcs, whatever). This is not currently what silicate::ARC does:

xa <- ARC (x)
xa$object_link_arc
# A tibble: 2 x 2
#  arc_       object_   
# <chr>      <chr>     
# 1 3206ee8a7b b5d47581e7
# 2 dd909ab306 b17ef525c1

Only 2 arcs are extracted. The dodgr equivalent:

library (dodgr) # latest dev version
x$oneway <- TRUE # simplifies graph to one direction only
names (x) [1] <- "geometry" # sf bug, fixed in latest version 0.6.1
net <- weight_streetnet (x, wt_profile = 1) %>% dodgr_contract_graph ()
net$graph
#    geom_num edge_id from_id from_lon from_lat to_id to_lon to_lat ...
# 14        2 a123474       5        5        5    19     10      5 ...
# 10        2 a123466      10        1        5     5      5      5 ...
# 1         1 a123469       1        1        1     5      5      5 ...
# 5         1 a123461       5        5        5    18     10     10 ...

That gives the four edges, along with

> net$edge_map
   edge_new edge_old
1   a123474       14
2   a123474       15
3   a123474       16
4   a123474       17
5   a123474       18
6   a123466       10
7   a123466       11
8   a123466       12
9   a123466       13
10  a123469        1
11  a123469        2
12  a123469        3
13  a123469        4
14  a123461        5
15  a123461        6
16  a123461        7
17  a123461        8
18  a123461        9

I guess your current ARC model should nevertheless stay as is, because it remains true to the data it is given. A proper graph contraction algorithm can only really be implemented in C++, but nevertheless note that,

> library (dodgr)
> rbenchmark::benchmark (
>                        net <- weight_streetnet (hampi), # a sample sf street network
>                        x <- ARC (hampi),
>                        replications = 10)
# [1]    1.000 10.265

or something like that. So C++ is actually around 10 times faster at generating equivalent results. The graph contraction of

net <- weight_streetnet (hampi) %>% dodgr_contract_graph ()

slows dodgr down to only around 3 times faster, but this is extracting all intersection points and reducing the graph to its true minimal form. Note that there is no simple tidy way to find intersections in directed graphs, because they can be shared by any number of edges. (For example. a vertex shared by four edges is only an intersection if two of those are directed to that vertex, and two are directed away; otherwise that vertex is not redundant and may not be contracted.)

The question is whether you even want silicate to extend in this direction? On the other hand, the C++ speed ups ought to be enticing ...

mdsumner commented 6 years ago

I'm going offline for a few days, but first thought is yes I do want silicate to go this way! You are right to point out there's no intersection identification in ARC (the only place that happens now is in anglr::DEL, due to Triangle).

I'd lost sight of that part, but it's come up again in terms of redundancy of vertices along long straight edges (DEL creates these Steiner points under certain conditions, like "small triangles please"). More next week!

mpadge commented 6 years ago

Enjoy offlineness! :tanabata_tree: :mountain: :smile:

mpadge commented 6 years ago

okay, i've worked all the way through sc_coord, SC, and PATH, and found absolutely nothing to improve upon. That is truly a testament to your fine, fine work! I translated lots of it to C++, and managed at most utterly marginal improvements at the expense of 20-30 lines of code replacing one. I'm also doing this to convince myself that this whole tidyverse shebang has finally reached a point of sufficient maturity to be actually quite powerful. Now I am finally up to ARC, which obviously needs work because of

  1. the above timing comparisons; and
  2. the inability of current form to do proper graph contraction.

What I now plan to do is (finally) convert what dodgr does into pure silicate form, and just crib all of the graph contraction code to generate an ARC model that is a contracted graph. I'll nevertheless leave a binary option in ARC to remain input-data-true - that is, to generate ARCs as currently done - versus contracting and decomposing to simplest, unambiguous graphs in which each ARC::arc_ only intersects other ARC::arc_ objects at terminal points. Feel free to modify, redirect, or otherwise comment on that aim in the meantime ...

mdsumner commented 6 years ago

Hey, awesome! It was great to ponder this over the weekend, I maintain that (part of) what you use the contracted graph for is ARC, but ARC is assuming that we really have a "partition of the plane" - as you say, "remains true to the data it is given". When I said ARC and cg were "the same" I did really only mean the "chain of segments with no third junction".

I see true plane-normalization as being a separate task, since my ARC will leave intersections that "shouldn't be there" - but that is very important information for some tasks. It's a bit like TRI versus anglr::DEL, TRI is feature-based and knows nothing of neighbours, while DEL is planned to (doesn't currently but should) be topologically sound for an entire layer.

This is conflated in spdep, which will apply a fudge-factor scaling to coordinates that are close but not the same, and then tell you what's touching what - we really need those tasks to be separable, since sometimes I want to know my layer is not topologically sound and where, other times I needn't care. I found the same with anglr::copy_down, if I copy raster values down I keep uniqueness in x/y, but if I copy feature constant values down then I was uniqueness in x/y/z.

I've got a notion that the need for both anglr::DEL vs. silicate::TRI is analogous to our missing dodgr-ready model vs. silicate::ARC.

Oh finally, the only reasonable implementations I know of inserting these missing vertices at edge intersections are RTriangle::triangulate and rgeos::gNode (and the sf equivalent). In spatstat you can do a reasonable job with selfcrossing but it's not really suitable. I think it's worth listing any other options that are related.

mdsumner commented 6 years ago

Another thing while we're celebrating the tidyverse, I did a more purely gather/spread based decomposition to segments here, there's a lot of awkward moments in silicate (and anglr) where this tidy way could be used:

http://rpubs.com/cyclemumner/367272

mdsumner commented 6 years ago

ARC was modelled around polygons and a fundamental problem is that it does not consider the end of a line to be a node, which I now think was a mistake.

In your example, my code rewinds the path as if it was a polygon ring and so I only see one node (where the lines share a vertex) but the link mistakenly repeats the arc sequence because it thinks a node not at index 1 within a linestring must be a closing polygon.

There's some logic that looks for vertices touched by more than 2 edges, but that needs to include any that are only touched by one - within one feature - or that are touched by two that belong to different features. I think that means ARC is totally flawed, and there's really three kinds of vertices not just two.

mdsumner commented 6 years ago

Looking at this again finally, what about this case - when an intersection (lines crossing, noding in SF-speak) is not in the input?

library (sf)
l1 <- cbind (c(1, 10), c(1, 10)) %>% st_linestring ()
l2 <- cbind(c(1, 10), 5) %>% st_linestring ()
x <- list (l1, l2) %>% st_sfc () %>% st_sf ()
## newish sf will plot invisible lines if no attributes ...
plot.sf <- function(x, ...) sp::plot(as(x, "Spatial"), ...)
plot(x)
points(silicate::sc_coord(x))

## only four vertices still
net <- weight_streetnet(x, wt_profile = 1) %>% dodgr_contract_graph ()

Does dodgr have (or aspire to having) ability to find that latent vertex at 5,5 ?

mpadge commented 6 years ago

Yes it does. There's an issue on the repo that I'll flag later about the London tube system and interchanges which are only implicit yet not proper geometric intersections. We can do this

mpadge commented 6 years ago

osmdataused to actually have that ability back in the days before I properly understood OSM and didn't realise that it was redundant. Easily reimplemented

mdsumner commented 6 years ago

We might call this issue "fix ARC"

1) ARC logic works for polygons, but not lines (assumption about closing point) 2) segments are not planar-normalized (crossings are not detected)

In the first instance I'd allow ARC to represent what was given as is (fix 1). A new feature then is "planar normalization" by providing 2).

Manifold literally calls this "Normalize Topology". It's variously called "partitioning the plane", "planar decomposition" - and it has straightforward analogs in triangle surfaces (see https://github.com/hypertidy/spacebucket for that).

Old stuff: https://github.com/hypertidy/silicate/issues/37