osm-router / osmprob

R package for probabilistic routing
https://osm-router.github.io/osmprob/
Other
7 stars 1 forks source link

e1071 vs igraph #34

Closed mpadge closed 7 years ago

mpadge commented 7 years ago

@RobinLovelace @richardellison Just wondering whether you guys tried implementing your stplanr::sum_network_routes() function with the e1071::allShortestPaths algorithm, and whether you had any insight on relative performance vs your igraph implementation? e1071 works directly on an adjacency matrix, so I imagine it should actually be more efficient?

Robinlovelace commented 7 years ago

Not looked at it myself... Would be good to test.

mpadge commented 7 years ago

okay if richard hasn't looked at it either, then I'll test it on behalf of us all...

richardellison commented 7 years ago

I have not tried it either. Would be interesting to see what the relative speeds were.

mpadge commented 7 years ago

moot point - the e1071 approach strictly requires a full matrix which is infeasible for large networks, so igraph remains at least the best of those two. @Robinlovelace & @richardellison:I think there might nevertheless be some speeding up possible with your stplanr::sum_network_routes() fn - I'll do a PR when I get a chance.

Robinlovelace commented 7 years ago

Awesome - I also need to update my PR in osmdata and that reminds me. Will aim to find time on Sunday.

mpadge commented 7 years ago

@richardellison actually not going to do an stplanr PR at all, because igraph is actually pretty crap in general for shortest paths on really big graphs. igraph just uses simple (double-headed) queues throughout (see here with the implementations here), but SP algorithms are enormously more efficient with priority queues for all but truly dense graphs - O(E+VlogV) instead of O(V^2)! The problem is there are no such implementations in R. I was about to sketch my own priority queue implementation when I stumbled upon the ultimate collection of C++ libraries by the shortest path gurus themselves, freely available here. This would be supremely easy to convert into a self-contained "ultimate R shortest path" package. Interested?

I'm going to have to bundle this stuff in osmprob anyway, so will do most of the internal work with that. The remaining polish would really just be converting from and back to various forms of graphs used in R into a standard class to pass to the C++ routines.

Robinlovelace commented 7 years ago

The other codebase to consider here is GDAL, which has recently merged the Geographic Network Model (GNM) stuff in and that also seemingly contains the ability to do shortest path analysis: https://trac.osgeo.org/gdal/wiki/rfc48_geographical_networks_support

Credit: @rsbivand who alerted me to this.

mpadge commented 7 years ago

Thanks @Robinlovelace and @rsbivand - I didn't know about that. It could be really useful in some applications, but ... it's still (1) only a bog-standard SP algorithm that doesn't use priority queues (code currently here in function called "DijkstraShortestPathTree"), and (2) the biggest stumbler which motivates this discussion is that it can't be directly applied to dual-weighted graphs.

Short digression for the record: A dual-weighted graph (my terminology) is one for which each edge has two weights: one determining the lengths to be used to calculate paths (shortest or otherwise), and one determining the actual distances we're interested in at the end. A path algorithm thus has to be applied to one set of weights, and the final desired distances evaluated on the other. This is a very common requirement - things/people/whatever follow one path according to some set of subjective/qualitative/whatever rules, but at the end we just want to know how far they actually went in reality. Social networks based on qualitative properties in which we just want to know numbers of neighbours; street networks based on mode of transport in which we just want to know physical distance.

This doesn't seem really possible at all using gdal/gnm, and it is only indirectly possible with igraph, and then with a great deal of overhead. This still leaves me favouring a direct Re-implementation of the priority queue C++ routines. I'm also going to jump at the opportunity to create a package called pqspr.

mpadge commented 7 years ago

@Robinlovelace @richardellison The results of my latest R-package-in-one-day challenge can be seen on the pqspr README - this implementation using priority queues is TEN TIMES faster than the igraph equivalent. This has to be hugely useful both for stplanr and my own work, and it also directly enables "dual-weighted graphs", so I really do think I'll turn it into a CRAN submission asap.

richardellison commented 7 years ago

I'm definitely interested. I agree that it seems very useful and I like the idea of dual-weighted graphs as that avoids an extra step in identifying the links from the shortest path and then adding up the relevant columns along those links.

I'll have a look at the package.

mpadge commented 7 years ago

final fyi @Robinlovelace @richardellison new package shifted into more serious development and re-named dodgr - aims to aid both my work and stplanr. You'll be able to ditch all the unwieldy loopy igraph code and speed things up considerably.

Robinlovelace commented 7 years ago

Wow that would be awesome. Could allow us to remove the dep on igraph potentially - thoughts @richardellison? This could fit with big changes in the pipeline for stplanr 0.2.0 but I imagine it won't be ready by early September by when we'd like to push this to CRAN. https://github.com/ropensci/stplanr/pull/198

mpadge commented 7 years ago

Hell yeah, the basic idea with dodgr is a super quick CRAN release of the bare bones so it's immediately usable. I used the all new flipper::phrase_to_pkgs() to discover that there really do seem to be no packages at all which remotely provide any such capabilities.