ropensci / stplanr

Sustainable transport planning with R
https://docs.ropensci.org/stplanr

Link to flows package, osmprob and sDNA #145

Closed Robinlovelace closed 4 years ago

Robinlovelace commented 7 years ago

Could be in documentation or some kind of interface depending on a number of factors.

Link to:

mpadge commented 7 years ago

Robin and Richard: this issue seems like an appropriate place to raise the following considerations, which are in no way intended to discredit the eminent advances offered by the flows package. That package will nevertheless very likely soon be obsolete, given algorithmic advances in routing that generate probability densities along any and all possible paths connecting two points. @karpfen and I are developing an R package for 'probabilistic routing'. It's currently preliminary, but we are aiming for a polished first cut before mid-year. This package should largely obviate the need for constant calls to routing APIs. User workflow will be something like:

  1. Use osmdata to download highway network for desired area and import into R
  2. Submit a bunch of routing points to routing algorithm
  3. Wait (hopefully only a very short while) and receive, for N points, an object containing a bunch (~<= N(N-1)/2, depending on directionality and network redundancy) of fully weighted versions of the network, where weights represent flow densities between each pair of points. A test run I did some time ago took about 10 minutes for routing between >1,000 points across NYC and greater London.
  4. Convert these to desired format or structure. The single 'best' (shortest/fastest/whatever) route is implicit in the output structure.
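For concreteness, the four steps above might look something like the following in R. Only the `osmdata` calls reflect an existing API; the routing function (`route_prob()`) and the shape of its output are hypothetical placeholders, since the routing package is still preliminary.

```r
library (osmdata)

# 1. Download the highway network for the desired area and import into R
q <- add_osm_feature (opq (bbox = "Leeds, UK"), key = "highway")
net <- osmdata_sf (q)

# 2.-3. Submit a bunch of routing points to the routing algorithm, which
# returns flow-weighted versions of the network for each OD pair.
# NOTE: 'route_prob()' is a hypothetical placeholder, not a real function.
pts <- data.frame (x = c (-1.55, -1.54), y = c (53.80, 53.81))
flows <- route_prob (net, pts)

# 4. Convert to the desired format; the single 'best' route is implicit,
# e.g. as the maximally probable one (field name again hypothetical):
best <- which.max (flows$probability)
```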

I wanted to raise this with the two of you because you're definitely best placed to provide insight into typical use cases, particularly regarding the structurally constraining assumption that most users will typically be interested in one particular area, so that pre-downloading will generally be acceptable.

Another thing to note regards this PR of @richardellison: the 'usual' approach, as followed by our code, is to identify connected components in a network and to allow forced merging of components, which can then be done simply by connecting nearest nodes.
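The component-merging approach just described can be sketched in a few lines of base R. This is only an illustration of the general idea (a naive union-find, no path compression), not the actual package code:

```r
# Toy network: two disconnected components, nodes 1-2 and 3-4
nodes <- data.frame (id = 1:4, x = c (0, 1, 5, 6), y = rep (0, 4))
edges <- data.frame (from = c (1, 3), to = c (2, 4))

# Identify connected components with a simple union-find
parent <- nodes$id
find <- function (i) { while (parent [i] != i) i <- parent [i]; i }
for (k in seq_len (nrow (edges)))
    parent [find (edges$from [k])] <- find (edges$to [k])
comp <- vapply (nodes$id, find, integer (1))

# Force-merge the two components by connecting the nearest pair of nodes
a <- which (comp == comp [1])
b <- which (comp != comp [1])
d <- outer (a, b, function (i, j)
            sqrt ((nodes$x [i] - nodes$x [j]) ^ 2 +
                  (nodes$y [i] - nodes$y [j]) ^ 2))
ij <- which (d == min (d), arr.ind = TRUE) [1, ]
edges <- rbind (edges, data.frame (from = a [ij [1]], to = b [ij [2]]))
```

Here the nearest cross-component pair is nodes 2 and 3, so a single new edge joins the two components.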

I'd imagine that it would ultimately be better for stplanr to use such a fully probabilistic routing algorithm rather than flows, and that incorporating our package would obviate many of your current uses of API calls for routing.

Robinlovelace commented 7 years ago

The workflow you propose sounds good for city-level analysis of routes - I definitely see the advantage of downloading the data and doing it locally for cities, and it sounds like the approach is scalable. It would be great if stplanr could integrate with this in some way, e.g. via the SpatialLinesNetwork class or by providing ways of visualising the route network (I'm thinking of https://github.com/ropensci/stplanr/issues/158 - an intermediary step before rasterising or webGLing the results - see here for a prototype of what this could look like for a city).

I think there are pros and cons to the different approaches, the relative balance of which will depend on the application. What I like about the API-based approach is its flexibility (e.g. allowing up-to-date information, including traffic conditions, ticket prices with Google's Distance Matrix API, time of day and updates to external routing algorithms, as has happened a number of times with CycleStreets) and that it allows potentially computationally demanding processes to be outsourced to servers designed to handle many requests from anywhere, day in, day out.

I don't see there being much overlap between flows and osmprob in any case. flows seems to focus on analysis of OD data, ignoring the route network. osmprob seems to focus on geographical route network data and is more sophisticated. Will be interested to see how it advances and how we can link to it (link being open to interpretation - simply a link in documentation could count as a 'link' in this issue but it would be great to have a proper interface or import).

An interesting thing about those probabilities per route is that they are highly dependent on two sets of factors, those that can change and those that are fixed or over which we have no influence:

Disentangling those will be an interesting and potentially highly policy-relevant challenge. Another complicating factor is that route options can be created and destroyed. This complexity was one of the reasons we focussed attention on only the fastest route in the PCT: evidence suggests that people have a strong preference for direct routes, so that's where we want to direct attention for the construction of new paths. Having use cases in mind is vital and that's indeed something we can help with. I suspect your approach will be useful in helping to estimate the uptake of cycling/walking following the construction of new cycle paths, like the new Leeds-Bradford 'cycle highway' that the DfT has just spent ~£30 million on (see for example this article).

Another link-up: Crispin Cooper, @fiftysevendegreesofrad, has mentioned the possibility of open-sourcing some of his excellent (at least partly C++) sDNA software for route network analysis - see a new paper on it for more: http://www.sciencedirect.com/science/article/pii/S0966692316307256

Please let us know of any updates on this Crispin - suspect this discussion will be of interest to you!

So quite a narrow issue has become an open-ended discussion! Good to lay out all options before path dependency and technical debt kick in.

richardellison commented 7 years ago

Probabilistic methods for network assignment are something I'm interested in and something I was planning to work on this year because of a project I'm involved in. The initial reasoning behind adding the SpatialLinesNetwork class to stplanr was to implement something like this, but I didn't quite get around to it last year, so it is good to see you have already started on it. At the moment, a lot of the available methods are closed-source or have not been implemented (or integrated) with R, so getting something working would be very useful. I'll be happy to contribute if I can.

I do have a few observations/comments:

  1. It would be good if the method could be generalised so as not to rely on OSM. Many people use their own networks, either because the quality of OSM is lacking in some way or because they are testing future infrastructure options (this is primarily what I work on). Having had a quick look at your code, it doesn't appear to use anything OSM-specific, and accepting a Spatial* (or sf?) class as input would be best.
  2. The probability of any single route being chosen is highly dependent on the travel times on each part of the network, which are in turn affected by how many vehicles are using that route. This means that an accurate probabilistic method generally requires calculating the probabilities more than once (depending on how frequently you adjust/recalculate travel times), and the results depend on how network loading is done.
  3. A range of network assignment methods have been used in most transport research, of which static assignment is the most common. One of the main differences between these methods is how they handle travel times and queueing (traffic jams and congestion, in other words). If a particular part of the network is over capacity then this has a flow-on effect on other parts of the network and, as a consequence, on the probability of any specific route being chosen. Static assignment handles these effects in a very basic manner and (at least in research) has been superseded by dynamic and "quasi-dynamic" methods that more accurately account for these spillback effects. Any method that does not somehow account for them seems to end up with unrealistic travel times and routes.
  4. There are some differences between routing of vehicles on the road network and routing of pedestrians and public transport users (and to a slightly lesser degree cyclists). Are you planning on allowing for public transport routes as well?
  5. Some method of parallelising at least parts of the function can considerably reduce computation times which is particularly important if you're running the process repeatedly. Is your 10 minutes estimate for a single core?
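The feedback loop between route probabilities and congested travel times (points 2-3 above) can be illustrated with a minimal fixed-point sketch over two parallel routes, using the standard BPR volume-delay function and a logit split. All the numbers (free-flow times, capacities, logit scale) are purely illustrative:

```r
# Two parallel routes between one OD pair: free-flow times (minutes),
# capacities (veh/h), and a fixed total demand (veh/h)
t0     <- c (10, 12)
cap    <- c (500, 800)
demand <- 1000

# Standard BPR volume-delay function: travel time rises with volume/capacity
bpr <- function (t0, v, cap) t0 * (1 + 0.15 * (v / cap) ^ 4)

p <- c (0.5, 0.5)    # initial route choice probabilities
for (i in 1:100) {
    v  <- demand * p                     # load demand onto the routes
    tt <- bpr (t0, v, cap)               # recalculate congested travel times
    u  <- exp (-0.5 * tt)                # logit utilities from the new times
    p  <- 0.5 * p + 0.5 * u / sum (u)    # averaging step for convergence
}
```

Without the repeated recalculation of `tt`, all demand piles onto the faster free-flow route; with it, the split settles at an equilibrium where neither route is drastically over capacity.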

I agree with Robin that the availability of a class like this does not remove the need for interacting with online routing APIs. They are both useful in their own ways.

mpadge commented 7 years ago

Great to get such constructive feedback so quickly! Some quick thoughts in response, ordered according to scheme of @richardellison:

  1. (1) That's no biggie - basic networks are just data.frames, with from, to, weight columns, immediately convertible to/from anything else (like igraphs).
  2. (2-3 and the points of @Robinlovelace): These likely flag the biggest issue. The algorithm we're implementing accommodates dynamic changes to the routing between A and B, so an unexpected diversion via some intermediate point does not require any recomputation. While this is in one sense dynamic, it cannot (at present) accommodate dynamic changes to network weights, and I completely agree that somehow facilitating such dynamic changes is likely to be very important and should be pursued.
  3. (4) Initial implementation will be single mode, but extension to multi-modal, including PT, should in principle be relatively straightforward. It's just a routing through a multigraph.
  4. (5) Yeah, totally. That estimate was intentionally non-parallel, but there is huge scope for parallelification in all of this.
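Point (1) above - that a basic network is just a data.frame with `from`, `to` and `weight` columns - can be seen in a couple of lines. The conversion to a weighted adjacency matrix below is one illustration of "anything else"; `igraph::graph_from_data_frame()` would accept the same data.frame directly:

```r
# A basic network as a data.frame with from, to and weight columns
net <- data.frame (from   = c ("A", "A", "B"),
                   to     = c ("B", "C", "C"),
                   weight = c (1.0, 2.5, 1.2))

# Immediately convertible to other representations, e.g. a weighted
# adjacency matrix (base R matrix indexing by node-name pairs):
nodes <- sort (unique (c (net$from, net$to)))
adj <- matrix (0, length (nodes), length (nodes),
               dimnames = list (nodes, nodes))
adj [cbind (net$from, net$to)] <- net$weight
```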

I envision the primary advance of probabilistic routing being the ability to provide density estimates on every single segment of a network given only OD densities/probabilities as input. It will provide a complete multi-multi routing 'solution', in which individual routes are not necessarily the fundamental object of interest. (This is thus a task not well handled by routing APIs, even those that provide effective multi-multi parallel computation.) I suspect that we'll first implement the basic, static model and then start thinking about how to adapt that to 'dynamic probabilistic routing' (or whatever we all decide to call it). While I don't think that will pose too high a danger of 'technical debt', it's good to flag that point here.

Finally, @richardellison - exciting to hear that you're interested in this. Our prob-routing project will be having a slight pause until Feb, hopefully soon after which we can discuss and strategise how we might best combine forces.

Robinlovelace commented 7 years ago

One small question of clarification to @mpadge: how are you defining 'segment' in this context? It's an interesting question in its own right, with links to https://github.com/ropensci/stplanr/issues/99 and ways of making the results of the PCT more intuitive whilst still being modular. Note the link to Strava's carefully algorithm-created and highly useful (with intuitive names and everything) 'segments': https://support.strava.com/hc/en-us/articles/216918167-What-are-Segments-

Robinlovelace commented 7 years ago

Also cc @mvl22 who may be interested in this discussion and the definition of a segment from the perspective of CycleScape and probabilistic routing from the perspective of CycleStreets.net's bike specific journey planner.

mpadge commented 7 years ago

Segment for my purposes is synonymous with a graph edge. Definitions of segments in this sense are always static. Although segment properties may change dynamically, and may in fact dynamically cease to exist, their locations are static and immutable.

The kinds of dynamic segments underlying the strava definition - and in a broader context, those underlying .gpx structure - are not really reconcilable with (my current vision of) routing problems. I would, however, be keen to be disabused of this belief!

richardellison commented 7 years ago

The issue with ignoring travel times in probabilistic routing methods is that you generally end up with an "all or nothing" solution unless you have some method of limiting capacity on each link/segment.

Do you also have an initial inclination on what the weights will be based on? I suppose they could be anything but probabilities could be based on any number of factors. In some cases probabilistic route assignment methods are run for different demographics or trip purposes (who may place different importance on different attributes). This is in addition to the issue of travel times because it means the probabilities between any two origins and destinations might not be the same for everybody. Obviously adds another layer of complexity but it is something to think about.

My understanding is that the Strava segments are not really dynamic but are based on existing user data (so they already know the routes). In this case we do not know the routes until the assignment is run. I agree that these sorts of dynamic structures are not really necessary for general routing problems. Although different segments may be available at different times, we can simply adjust the weights to a very large number for any non-usable segment such that the probability of that segment being chosen is essentially zero.
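The large-weight trick can be checked directly. Assuming, for illustration only, that segment choice probabilities are proportional to exp(-weight), setting a non-usable segment's weight to a very large number collapses its probability to (numerically exactly) zero:

```r
# Generalised costs of three candidate segments
weights <- c (1, 2, 3)
weights [3] <- 1e6    # mark segment 3 as non-usable

# Illustrative exponential weighting: probability ~ exp(-weight)
p <- exp (-weights) / sum (exp (-weights))
```

`exp (-1e6)` underflows to zero, so the non-usable segment receives probability 0 while the remaining probabilities still sum to one.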

February works well for me.

fiftysevendegreesofrad commented 7 years ago

Hello all. Interesting discussion. @mpadge probabilistic routing sounds great; I'm not familiar with the technique though - is it based on discrete choice models? If so, how do you generate the choice set?

Re dynamic assignment and capacity modelling - is this really necessary for cycling at least in a UK context? I appreciate some cities may need it internationally but it will make your job a lot harder.

I previously did some work with sDNA where I added a random component to the generalized cost function by which agents choose routes through the network. You could also add randomization to the coefficients (for hilliness, motor traffic etc). In a pedestrian model it gave marginally better correlation with observed flows than the non-random model.

Note that capacity modelling and randomization in some sense can replace one another, as both have the effect of spreading traffic over multiple routes rather than all-or-nothing.
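The randomised-cost idea can be sketched for two routes: add a noise term to each generalised cost and the cheaper route no longer captures all traffic. With normally distributed noise the split also has a closed form (a probit choice probability), which the simulation should approximate. All numbers here are illustrative, not taken from the sDNA work:

```r
set.seed (1)
cost  <- c (10, 11)   # generalised costs of two competing routes
sigma <- 2            # scale of the random cost component
n     <- 10000        # number of simulated agents

# Each agent chooses the route with the lowest randomised cost
noise  <- matrix (rnorm (2 * n, sd = sigma), ncol = 2)
choice <- apply (sweep (noise, 2, cost, "+"), 1, which.min)
share1 <- mean (choice == 1)

# Analytic probit share: P(cost1 + e1 < cost2 + e2)
analytic <- pnorm ((cost [2] - cost [1]) / (sigma * sqrt (2)))
```

Instead of an all-or-nothing assignment to route 1, traffic is spread across both routes, with the cheaper one taking the larger share - the same qualitative effect that capacity modelling produces.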

Open-sourcing of sDNA is going to happen, by the way, but there's no time frame yet (waiting on a few things internally).

mpadge commented 7 years ago

@fiftysevendegreesofrad this ain't really the place for this discussion, but prob routing isn't about choice models at all, rather it works just by allocating weights to all edges of a graph and uses those weights and nothing but those weights to calculate the probabilities of every single possible route from A to B (including potentially infinite loops). When osmprob (or whatever it ends up as) is sufficiently developed, this discussion can be continued there ...

Having said that: your comments about randomisation are very interesting. One of the things that a probabilistic routing engine does is yield 'realistic' distances between A and B, which are always greater than shortest possible distances, even if only ever so marginally. Randomisation would have a similar effect, but in an unstructured way. I'm working with a few colleagues on comparing shortest travel distances with slightly longer 'maximal probability' distances (or whatever they'll be called) in modelling socio-economic-political processes. We expect max prob distances to perform better, and thence to sound an admittedly very preliminary death knell for naive shortest paths.
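A minimal sketch of the kind of calculation being described (my reading of it, not the actual package algorithm): treat normalised edge weights as transition probabilities of a random walk absorbed at B. The fundamental matrix N = (I - Q)^-1 then sums over every possible path, loops included, and the resulting expected travel distance is always at least the shortest-path distance:

```r
# Random walk on a triangle A-C-B with unit-length edges, absorbed at B.
# Q holds transition probabilities among the transient nodes {A, C}:
# from A, step to C or B with probability 0.5 each, and likewise from C.
Q <- matrix (c (0.0, 0.5,
                0.5, 0.0), nrow = 2, byrow = TRUE)

# Fundamental matrix: (I - Q)^-1 = I + Q + Q^2 + ..., summing all paths
N <- solve (diag (2) - Q)

expected_steps <- rowSums (N)   # expected walk length from A and from C
shortest <- 1                   # A-B is a direct edge: shortest path = 1
```

For this triangle the expected distance from A to B is 2 edges, against a shortest-path distance of 1 - the structured counterpart of the 'realistic' (longer-than-shortest) distances mentioned above.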

Robinlovelace commented 4 years ago

Closing, links have been added. Any further suggestions welcome.