opentraffic / architecture

OTv1 overview

Estimate speed depending on turn direction #8

Open laurentg opened 9 years ago

laurentg commented 9 years ago

For some ways, the speed profile (at least just before an intersection) depends a lot on the direction taken at the following intersection. For example, a left turn may be much slower on average than a right turn. Or the time profile for each direction may be very different (from 8 to 9, turning left is slow; from 16 to 18, turning right is slow...).

A GPS trace implicitly contains this information: for a given speed segment we usually know a more complete path, that is, which segment will be the next one. We could then encode this data to get better speed estimates (that's a place where OpenLR shortest-path encoding may be helpful in exporting this kind of contextual data).

A simple solution could be to split each segment into one copy per next direction, a bit like the "turn-edges" graph approach taken by OpenTripPlanner some time ago to handle turn restrictions (now deprecated). Another solution could be to encode the next turn (using some index or ID) alongside the stored data. Or maybe a more generic mechanism where the next turn is treated in the same way as vehicle type information (taxi, bike, bus, etc.).

When the amount of data is low, we could apply ("smear out") speed profiles from some turns to others. For example, if a right turn onto a very seldom used street does not have enough data, we should be able to fall back on the main ("go straight") speed data.
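A minimal sketch of that fallback, just to make the idea concrete. Everything here is hypothetical: the `TurnSpeedStore` class, the `min_samples` threshold, and the choice to treat the most-observed turn on a segment as the "main" movement (a stand-in for "go straight", since the turn geometry is not modelled).

```python
from collections import defaultdict


class TurnSpeedStore:
    """Speed samples keyed by (segment, next segment). Illustrative only."""

    def __init__(self, min_samples=5):
        # min_samples is an arbitrary, assumed threshold for "enough data".
        self.min_samples = min_samples
        self._by_turn = defaultdict(list)

    def add(self, segment_id, next_segment_id, speed_kph):
        self._by_turn[(segment_id, next_segment_id)].append(speed_kph)

    def estimate(self, segment_id, next_segment_id):
        """Mean speed for this turn, or for the busiest turn if data is sparse."""
        samples = self._by_turn.get((segment_id, next_segment_id), [])
        if len(samples) < self.min_samples:
            # Fall back to the most-observed turn leaving this segment.
            candidates = [
                speeds
                for (seg, _next), speeds in self._by_turn.items()
                if seg == segment_id
            ]
            if not candidates:
                return None  # nothing known about this segment at all
            samples = max(candidates, key=len)
        return sum(samples) / len(samples)
```

With three samples on A->B and a single one on A->C, `estimate("A", "C")` returns the A->B mean when `min_samples=3`. A real implementation would probably want to blend the two profiles rather than switch abruptly.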

Is this feature useful? How can we detect if and where it is useful? What's the impact in terms of data storage? Code complexity? Export format?

kpwebb commented 9 years ago

This is a really interesting question. It's totally possible for us to encode intersection turn timings as turn edges. That's actually part of our traffic inference model, but we just aren't keeping that data at present. Instead of encoding turn edges, we end up averaging the turn times into the adjacent edges, so that edge travel times are altered in proportion to the share of trips making that turn (e.g. 50% of trips entering edge A make a left turn, which adds 1 min of travel time between A and B, so edge A's travel time is increased by 1 min * 0.5 = 30 s).
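As a rough illustration of that averaging (not the traffic engine's actual code), the blended edge time is the base traversal time plus each turn's delay weighted by its share of trips. The function name and the 120 s base time are made up for the example.

```python
def blended_edge_time(base_seconds, turns):
    """Fold turn delays into an edge's travel time.

    turns: iterable of (share_of_trips, extra_seconds) pairs, where the
    shares are expected to sum to 1.
    """
    return base_seconds + sum(share * extra for share, extra in turns)


# Half the trips turn left and lose 60 s; the other half lose nothing.
edge_a = blended_edge_time(120.0, [(0.5, 60.0), (0.5, 0.0)])
print(edge_a)  # 150.0
```

The per-turn delays are lost once they are folded in like this, which is why keeping the turn stats separately matters.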

That said, we shouldn't throw away this data. The turn stats are super useful for transport planning and they could be made available to journey planners. But your question about format is key. Are there examples from existing traffic data sources that we should look at? I'm assuming this is something others are already doing, correct?

laurentg commented 9 years ago

In terms of export format, AFAIU OpenLR could encode this kind of data, since the format allows multiple shortest paths sharing common edges. So you could encode one shortest path with edge A->B and another with edge A->C, with a distinct turn cost embedded in each (A->B = A + AB turn + B; A->C = A + AC turn + C). But that does not solve the question of internal encoding. Also, properly inferring the optimal / minimal OpenLR representation from the data you have at hand is probably not trivial.
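To spell out the decomposition in parentheses, here is a small sketch of a path cost built from edge times plus per-turn costs. It says nothing about the actual OpenLR encoding; `path_time` and the numbers are hypothetical.

```python
def path_time(edge_times, turn_times, path):
    """Total time along a path: edge times plus the cost of each turn.

    edge_times: {edge_id: seconds}
    turn_times: {(from_edge, to_edge): seconds}; missing turns cost 0.
    """
    total = sum(edge_times[edge] for edge in path)
    total += sum(
        turn_times.get((a, b), 0.0) for a, b in zip(path, path[1:])
    )
    return total


edge_times = {"A": 30.0, "B": 20.0, "C": 25.0}
turn_times = {("A", "B"): 5.0, ("A", "C"): 45.0}

print(path_time(edge_times, turn_times, ["A", "B"]))  # 55.0
print(path_time(edge_times, turn_times, ["A", "C"]))  # 100.0
```

Both paths share edge A's time, and only the turn term differs, which is the information a single averaged speed for A cannot carry.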

As for how it's done by others, no clue...

Holly-Transport commented 9 years ago

Yes, these data may be useful for the area traffic control system optimization work we are doing, where one of our tasks is to infer the proportion of observations that turn left/right/etc. at each signalized junction...

abyrd commented 9 years ago

To carry over a comment at https://github.com/opentraffic/traffic-data-exchange-format/issues/4, at the source in the traffic engine all we really have is edge-to-edge timings that happen to include unknown turn timings (unless GPS/location sampling frequency is high enough to catch a vehicle sitting still, which is a special case we can't count on).

There are two separate questions here: how to store the segment speeds vs. turn waits, and how to distinguish those two things when processing the incoming stream of GPS points.

A third meta-question is whether speed and turn waits should in fact be separated out: our basic unit of data is not edge speeds, but travel times between successive location observations on adjacent edges (this can be seen in how the current traffic engine implementation functions). There will always be guesswork involved to separate that time out into segment speeds vs. turn waits, and we'll be trading in the original "purer" data for inferred data with many assumptions baked in. In areas with a lot of speed reports, we might obtain significantly better travel time estimates for a given path by only including/taking into account reports for the edge pairs along the path of interest. Of course retaining all that original edge-pair data is problematic in that it takes up additional space and will require a lot more computation because it's not directly usable (it doesn't describe full edges, it starts and ends partway along edges).

So assuming that we need and want to separate segment speeds out from turn waits, we need a method to do so. My sense is that we cannot reliably separate them out within a single observation or a small number of observations, that they can only be separated out in combination with a large number of other observations using the same segment pairs and making various turns (left, right, straight). This is another place where a more sophisticated statistical model could be helpful. By building a model that contains separate latent variables for left, right, and straight intersection traversals and feeding it enough inputs we'd actually get estimates for those variables independently of segment speed estimates.
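A toy version of such a model, to show why many varied observations are needed. It assumes (my simplification, not anything the engine does) that each observation covers a known fraction of the first segment, a known fraction of the second, and one turn, so that `seconds = frac_from * T_from + frac_to * T_to + wait_turn`. Ordinary least squares then estimates the full-segment times and the per-turn waits together. All names are hypothetical, and a real model would need noise handling and per-intersection turn variables.

```python
def solve_least_squares(rows, targets):
    """Least squares via normal equations and Gauss-Jordan elimination."""
    n = len(rows[0])
    # Augmented normal equations [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("observations too similar to separate the terms")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_turn_model(observations):
    """Estimate segment times and turn waits.

    observations: list of
        (from_seg, frac_from, to_seg, frac_to, turn, seconds)
    Returns {name: seconds}, with turn waits keyed as "turn:<turn>".
    """
    segments = sorted({o[0] for o in observations} | {o[2] for o in observations})
    turns = sorted({o[4] for o in observations})
    names = segments + ["turn:" + t for t in turns]
    index = {name: i for i, name in enumerate(names)}

    rows, targets = [], []
    for from_seg, frac_from, to_seg, frac_to, turn, seconds in observations:
        row = [0.0] * len(names)
        row[index[from_seg]] += frac_from
        row[index[to_seg]] += frac_to
        row[index["turn:" + turn]] = 1.0
        rows.append(row)
        targets.append(seconds)
    return dict(zip(names, solve_least_squares(rows, targets)))
```

If every observation covered the same fractions of both segments, the system would be singular and the solver would raise: a constant could be shifted between the turn waits and the segment times without changing any prediction. That is the single-observation problem in miniature.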

This raises the question of whether a single instance of the traffic engine (run by a single contributor to the traffic project) should output only speed and turn information to the outside world, or if it should output more basic information (segment pair timings or successive pairs of GPS location reports) to allow better statistical modeling by merging its raw data with that from all other contributors in the same region. This has privacy implications since someone could conceivably string these pairs back together if they are not perturbed or segmented somehow.

One possibility (purely speculative and perhaps not very realistic) is having the engines send subsets of position-pair data to one another as peers, then having each engine compute turn+segment timings for a subset of intersections that are non-adjacent, thus preventing anyone from having enough information to chain observations together and reconstruct paths.