opentripplanner / OpenTripPlanner

An open source multi-modal trip planner
http://www.opentripplanner.org

Performance of Master has degraded #2325

Closed clinct closed 8 years ago

clinct commented 8 years ago

At Plannerstack we wanted to update to the latest version of Master. However, we found that the performance of the latest Master has degraded quite a bit, in some cases taking 4 times longer to return a result.

* Deployed version of OTP, OpenTripPlanner 0.20.0-SNAPSHOT: b34e4c26751ce87ae7997a7e3357fa33ac4b03a0
* Tested against the current Master branch: 13763e9fc475764c7e8d86eccca7c1e010e29793

Of the 15 tests, 9 now take far longer to complete (times in ms):

| Test | 0.20.0 (b34e4c2) | Master (13763e9) |
|------|------------------|------------------|
| 3 | 241 | 372 |
| 4 | 394 | 910 |
| 5 | 994 | 1810 |
| 6 | 441 | 1184 |
| 7 | 137 | 321 |
| 8 | 422 | 984 |
| 10 | 189 | 761 |
| 11 | 167 | 331 |
| 14 | 253 | 487 |

These 15 tests cover special cases within the Dutch transit network; they can be found here: https://github.com/plannerstack/mmri/blob/master/tests/static-tests.json

I haven't had time yet to look into it any further (e.g. by bisecting), but it seems like quite a big issue, so I was wondering whether anyone else has seen this.

kalon33 commented 8 years ago

I also have the impression that I run into more timeouts than before on my OTP instance, and that it is slower, but I hadn't quantified it.

Apparently, that's not only a feeling... Thanks for reporting this precisely.


abyrd commented 8 years ago

Thanks @clinct for reporting this slowdown. I am very surprised to see a 3-4x slowdown; I didn't think anything had changed recently that would have such an effect. This definitely must be addressed before the 1.0 release.

abyrd commented 8 years ago

@clinct what branch or fork is your deployed version b34e4c26751ce87ae7997a7e3357fa33ac4b03a0 on? It looks like that branch diverged from master at 02a667ae123db723008a442d7e5f710b70a484bc, and I get no response from `git branch -r --contains b34e4c26751ce87ae7997a7e3357fa33ac4b03a0`.

abyrd commented 8 years ago

It looks like a branch that only has a few commits to implement the Dutch fare service. I'll examine changes on master since 02a667ae123db723008a442d7e5f710b70a484bc.

clinct commented 8 years ago

Hi @abyrd, thanks for looking into it. It was indeed a branch with work on the Dutch Fares.

abyrd commented 8 years ago

Between 02a667ae123db723008a442d7e5f710b70a484bc and current master, the only changes I would think could have such a large impact are the changes to the origin/destination linking, and even then I don't see immediately how this could have an impact.

abyrd commented 8 years ago

Unfortunately I think we need to bisect this.

abyrd commented 8 years ago

@clinct are you sure the configuration of the two deployments is identical? Are you seeing only slowdowns, or also some speed-ups (i.e. different results and timings, or only worse timings)?

clinct commented 8 years ago

@abyrd Are you able to confirm the slowdown on your end as well?

abyrd commented 8 years ago

@clinct no, I don't have any instances of OTP set up to check this right now.

clinct commented 8 years ago

I only see similar or worse times; the actual results seem to be the same (based on plotting the travel-time durations).

All I did was merge Master and rebuild (which also automatically runs the tests).

abyrd commented 8 years ago

We discussed this at the OTP steering committee meeting. If we can't find the offending commit by bisection soon, we will go ahead with the 1.0 release, because we are seeing only slowdowns, not incorrect routes. The slowdown would then be addressed in 1.1.

buma commented 8 years ago

This could definitely be SimpleStreetSplitter. I did some tests on e359bc7e7b559678133139b93824014dd120d680, since it is the last commit where the old linker still exists and switching between the two is easy. On my small extract of Washington, a simple car routing request takes 24 ms, of which 2*8 ms is linking (origin and destination). With the old linker the trip takes 9 ms, of which linking takes 2+1 ms.

@clinct: You can take the same commit and switch between the linkers in StreetVertexIndexServiceImpl around line 577: the call to the old linker is commented out, so just comment out the call to simpleStreetSplitter and uncomment the other one.

I'll check why it is so slow; maybe #2271 will speed it up.

abyrd commented 8 years ago

Hi @buma, thanks for the input. I'm guessing you mean 2+8 ms rather than 2*8 for linking, which would give us 16 msec linking + 8 msec routing for the new linker vs. 3 msec linking + 6 msec routing with the old one. But in @clinct 's tests, number 10 increased from 189 to 761 msec and number 6 increased from 441 to 1184 msec. So we're looking at 500-700 msec slowdowns and it would be strange if those were entirely attributable to the linking process.

abyrd commented 8 years ago

@buma after looking back at #2271 I now remember that the simple street splitter does not use an iteratively increasing search radius. I recently massively sped up the linking in R5 using a two stage search (first 150 meters, then the maximum radius) and we should probably do that in OTP. But in R5 we were linking hundreds of thousands of points. Here we're linking only two, so I don't see how this could cause 500-700 msec slowdown.
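
For illustration, here is a minimal, self-contained sketch of such a two-stage search. The `SpatialIndex` interface, `Edge` class, and radius values are stand-ins invented for this example, not R5's or OTP's actual API; the idea is simply to query a small radius first and only widen to the maximum radius when nothing is found.

```java
import java.util.List;

/** Stand-in for a spatial index over street edges (not OTP's actual interface). */
interface SpatialIndex {
    /** Return all edges whose geometry lies within radiusMeters of (lat, lon). */
    List<Edge> query(double lat, double lon, double radiusMeters);
}

/** Placeholder for a street edge. */
class Edge { }

class TwoStageLinker {
    // Illustrative radii: most points link within the small radius,
    // so the expensive wide query runs only for the rare unlinked ones.
    private static final double INITIAL_RADIUS_M = 150;
    private static final double MAX_RADIUS_M = 1000;

    private final SpatialIndex index;

    TwoStageLinker(SpatialIndex index) {
        this.index = index;
    }

    /** Find candidate edges near a point, widening the search only if necessary. */
    List<Edge> findCandidateEdges(double lat, double lon) {
        // Stage 1: cheap query over a small radius.
        List<Edge> candidates = index.query(lat, lon, INITIAL_RADIUS_M);
        if (!candidates.isEmpty()) {
            return candidates;
        }
        // Stage 2: fall back to the full radius only when the first stage found nothing.
        return index.query(lat, lon, MAX_RADIUS_M);
    }
}
```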

abyrd commented 8 years ago

Our hashgrid spatial index is not particularly efficient in the first place (see #2336). We should probably change it. As noted on #2271, we should probably re-implement the iterative search-radius increase so that it increases only once, and make it cooperate with a finer hashgrid. Certain locations currently hit a full 9-cell, 1.5 km x 1.5 km area, which can include a very large number of street geometries.
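
To make the 9-cell cost concrete, here is a generic hash-grid sketch, assuming 500 m cells; it is not OTP's HashGridSpatialIndex, and the class and method names are illustrative only. A point query scans the containing cell plus its eight neighbors, so with 500 m cells every geometry in a 1.5 km x 1.5 km block is returned and must then be distance-checked, however far away it actually is.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashGrid<T> {
    private final double cellSizeMeters;                  // e.g. 500 m per cell
    private final Map<Long, List<T>> cells = new HashMap<>();

    HashGrid(double cellSizeMeters) {
        this.cellSizeMeters = cellSizeMeters;
    }

    /** Pack the two cell coordinates into a single map key. */
    private long key(int x, int y) {
        return (((long) x) << 32) | (y & 0xffffffffL);
    }

    void insert(double xMeters, double yMeters, T item) {
        int cx = (int) Math.floor(xMeters / cellSizeMeters);
        int cy = (int) Math.floor(yMeters / cellSizeMeters);
        cells.computeIfAbsent(key(cx, cy), k -> new ArrayList<>()).add(item);
    }

    /** Return every item in the queried cell and its 8 neighbors (a 3x3 block). */
    List<T> queryNeighborhood(double xMeters, double yMeters) {
        int cx = (int) Math.floor(xMeters / cellSizeMeters);
        int cy = (int) Math.floor(yMeters / cellSizeMeters);
        List<T> result = new ArrayList<>();
        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                List<T> cell = cells.get(key(cx + dx, cy + dy));
                if (cell != null) {
                    result.addAll(cell);                  // each candidate still needs a distance check
                }
            }
        }
        return result;
    }
}
```

A finer grid shrinks that 3x3 block, and increasing the search radius only once (rather than iteratively) keeps the number of queries bounded, which is the combination suggested above.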

abyrd commented 8 years ago

Also relevant is issue #2170

buma commented 8 years ago

@abyrd: I meant 2*8, since the origin and destination searches each take 8 ms. I would like to try this on @clinct's tests too, but I don't know which data the tests are run against, so I can't check whether this contributes to the slowdowns. It's probably not just that.

abyrd commented 8 years ago

Actually I just heard from @clinct, and in his latest run he no longer sees the difference. While the iterative link-radius increase is still a good idea, I'm going to put it off until 1.1, since it's just a moderate efficiency improvement on large graphs.

abyrd commented 8 years ago

Before we released 1.0, @clinct confirmed that the slowdown was no longer observable. Apparently the problem was caused by something else.