opentripplanner / OpenTripPlanner

An open source multi-modal trip planner
http://www.opentripplanner.org

Change in OTP routing behaviour in a national coverage network. 100% CPU spike #1942

Closed andreyz closed 9 years ago

andreyz commented 9 years ago

We've noticed a radical change in routing behaviour in the 0.17.0 release. Our setup involves three OSM extracts that cover large metropolitan areas and one national-level GTFS feed.

Previously OTP would still correctly route to areas missing OSM coverage by snapping to GTFS stops, and it would answer such queries fairly swiftly.

In the 0.17.0 release, when asked to plan a route to somewhere outside OSM extract coverage, OTP's CPU usage spikes to 100% for 10-30 seconds and it then returns a result that routes to a seemingly random location inside one of the OSM-covered areas, often missing the original destination by half a country. In one such case the CPU stayed at 100% for several minutes and I had to kill the process (I couldn't reproduce that again).

Could it be @buma's change to stop linking in #1934?

abyrd commented 9 years ago

Thanks for reporting this. The stop linking changes were made by @mattwigway and he may have some insight on the cause. I was actually unaware that OTP would work well on a mixed OSM / no-OSM graph. I suppose we should keep this functionality if it is advantageous. Is there a reason you don't just load OSM for the entire coverage area? Memory consumption I suppose.

abyrd commented 9 years ago

It is likely that the CPU usage spikes are OTP searching exhaustively through your entire network and never reaching the destination, and we should certainly catch situations that could lead to this useless and resource-eating process.
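To illustrate the point above, here is a minimal sketch (not OTP code) of a plain Dijkstra search on a hypothetical toy graph. The function name and the graph are assumptions made up for this example. When the destination is not connected to the rest of the graph, the search only stops after settling every reachable vertex.

```python
import heapq


def dijkstra_visit_count(graph, source, target):
    """Run a plain Dijkstra search and return (reached, vertices_settled)."""
    dist = {source: 0.0}
    settled = set()
    queue = [(0.0, source)]
    while queue:
        d, v = heapq.heappop(queue)
        if v in settled:
            continue
        settled.add(v)
        if v == target:
            return True, len(settled)
        for w, cost in graph.get(v, []):
            nd = d + cost
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(queue, (nd, w))
    # Queue exhausted: every reachable vertex was settled, target never found.
    return False, len(settled)


# Hypothetical toy graph: a four-vertex chain plus an unlinked "island".
graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 1.0)],
    "d": [("c", 1.0)],
    "island": [],
}

reachable = dijkstra_visit_count(graph, "a", "c")
unreachable = dijkstra_visit_count(graph, "a", "island")
```

On a national network the "every reachable vertex" case is what would make the search expensive, so checking up front that the destination is linked to something is one possible guard.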

andreyz commented 9 years ago

@abyrd thanks for responding! Yes, memory consumption is the main concern. Is there any rule of thumb on how to estimate memory requirements in this case apart from just building a graph and running it?

andreyz commented 9 years ago

Why have a mixed OSM / no-OSM graph? We've assumed that detailed walking directions are really useful in complex urban areas, while it's quite okay to go without them in sparsely populated parts of the country, where we can avoid including OSM data.

andreyz commented 9 years ago

I'd gladly collaborate with @mattwigway and help to keep the previous behaviour with a mixed graph.

abyrd commented 9 years ago

Unfortunately I don't have any model for memory consumption as a function of OSM size. The street density in populated areas is much higher than in the countryside though, so a graph covering the entire country is often not much bigger than one covering only the major cities. I see why you have an interest in snapping to stops without using roads, but the ideal seems to be using OSM data everywhere.

andreyz commented 9 years ago

Will do a test with full OSM coverage to see the effects both on memory and performance.

andreyz commented 9 years ago

Ping @mattwigway. So far we've experienced two cases of OTP running at 100% CPU load for an indefinite period (we had to kill the java process after several minutes), even on plan queries within the OSM extract boundaries.

We're running 0.17.0 release. It seems very unstable for us, which wasn't the case with earlier releases.

mattwigway commented 9 years ago

@abyrd I wonder if the pareto dominance function is still being used somewhere? This sounds like the kind of thing it would do, and it could be exacerbated by the new code as we now link to all ways that are closest or within some epsilon, so there can be >2 street transit links if there are duplicate ways (e.g. with duplicated OSM data).
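A rough sketch of the linking rule described above, to show how duplicated ways can produce more than two links. This is not the OTP linker itself; the function name, the way ids, the distances, and the epsilon value are all assumptions for illustration.

```python
def link_candidates(distances, epsilon):
    """Return ids of all ways whose distance is within epsilon of the closest one."""
    if not distances:
        return []
    best = min(distances.values())
    return sorted(way for way, d in distances.items() if d - best <= epsilon)


# Hypothetical distances in metres from a stop to nearby ways.
# "way/2" stands in for a duplicate of "way/1" (e.g. duplicated OSM data).
distances = {"way/1": 4.0, "way/2": 4.0, "way/3": 4.5, "way/4": 30.0}
links = link_candidates(distances, epsilon=1.0)
```

With these made-up numbers the stop is linked to three ways, two of which are the same street.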

JordenVerwer commented 9 years ago

This kind of behavior would also imply a failure of the timeout code to abort the search, so the (presumably) infinite loop is probably located inside the loop in which the timeout conditions are checked.
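The reasoning above can be sketched as follows. This is a simplified stand-in, not the actual OTP search loop; `search_with_timeout`, `expand`, and the injectable `clock` are hypothetical names. The deadline is only checked once per iteration, so a search over an endless state space is still aborted, but a call inside the loop body that never returns would never reach the check again.

```python
import time


def search_with_timeout(expand, start, timeout_s, clock=time.monotonic):
    """Expand states until the queue empties or the deadline passes."""
    deadline = clock() + timeout_s
    queue = [start]
    expanded = 0
    while queue:
        # Timeout check: runs only between iterations of this loop.
        if clock() > deadline:
            return "timeout", expanded
        state = queue.pop()
        # If expand() itself loops forever, the check above is never reached again.
        queue.extend(expand(state))
        expanded += 1
    return "exhausted", expanded


def endless(state):
    """Hypothetical expansion that always yields one more state."""
    return [state + 1]


# Fake clock that advances by one tick per call, so the example is deterministic.
ticks = iter(range(1000))
aborted = search_with_timeout(endless, 0, timeout_s=5, clock=lambda: next(ticks))
finished = search_with_timeout(lambda state: [], 0, timeout_s=5)
```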

andreyz commented 9 years ago

The graph is built with the following GTFS feed and three OSM extracts: 1, 2, 3. To trigger the CPU spike, try routing from (59.33459, 18.06152) to (56.66302, 16.36620).
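For anyone reproducing this, a small sketch that builds the plan request for the coordinates above. The host, port, and router id (`localhost:8080`, `default`) are assumptions about a local setup, and `plan_url` is a helper made up for this example; only the `fromPlace` and `toPlace` parameters of the plan endpoint are used.

```python
from urllib.parse import urlencode


def plan_url(base, from_place, to_place):
    """Build a plan request URL from (lat, lon) tuples."""
    params = {
        "fromPlace": "%.5f,%.5f" % from_place,
        "toPlace": "%.5f,%.5f" % to_place,
    }
    # Keep the comma between latitude and longitude unescaped for readability.
    return base + "/plan?" + urlencode(params, safe=",")


# Assumed local server and router id; adjust to your deployment.
url = plan_url(
    "http://localhost:8080/otp/routers/default",
    (59.33459, 18.06152),
    (56.66302, 16.36620),
)
```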

hannesj commented 9 years ago

I think this might be related to #1932, as that also happens in the new linker code and has a very similar effect.

andreyz commented 9 years ago

@laurentg and @mattwigway any update on the issue? Do you need more test case data to reproduce?

laurentg commented 9 years ago

Please see PR #1954; your symptoms exactly match what this PR aims to solve. I haven't tested it, but I'm fairly confident this is the issue.

laurentg commented 9 years ago

As reported by @johannilsson this is caused by the same issue as #1932 and should be solved by #1954. Closing issue.