Closed abyrd closed 4 years ago
This ticket is a priority -- we need to do some testing to determine if this is a viable way to provide good long-distance search times on large graphs (ny511).
It's easy to ban every trip in a trip sequence, but it's a little more challenging to ban a trip sequence. For example, if on the first iteration the best route uses trips AB, then on the second pass the paths B, or CB, or BC, or whatever, would all be allowed.
If the intent is to ban the sequence and not the individual trips, then a sort of complicated comparison would have to be made upon boarding. But I guess that's not so hard. Is this what we want?
Also, if you want to ban the trip or route sequence explicitly, I think it would be better to ban the route sequence than the trip sequence. First, if you ban the trip sequence in a lot of cases the second-best itinerary will be the same route a few minutes later. Second, the infrastructure is in place to ban route sequences, but not trip sequences, because the State keeps route sequences around.
If you're finding the lowest-cost solution at each retry and you've already found sequence AB, any solution with sequence ABX can only be higher-cost. A solution that has more rides and a higher cost is probably undesirable. So rather than banning a specific sequence, we'd be banning a sequence prefix (this is the reason for the existing methods that check for prefixes or subsets).
If you've found AB and eliminate that prefix, then yes XB would still be an option. That seems correct to me -- you can change one of the elements in the sequence to get a higher cost solution, but not append (or prepend?) elements.
Agreed that route sequence banning also makes sense, not just trip sequence banning.
Really, we want to move toward a two-phase exploration of travel options, first providing the user with a set of route combinations, then in a second step letting them browse the specific instances of each route combination in time. The second step is very fast to calculate and doesn't even involve searching. This is related to what we're doing in the "profile routing" TDM work (Arlington project).
And as in profile routing, maybe ideally what we want to do is ban a specific sequence of trip patterns, rather than trips or routes. But that gets messy, because there are often groups of trip patterns that have common trunks. The profile routing code cares about those details, so in the normal routing code maybe we should stick with route sequences.
Retrying with route-sequence banning is implemented in https://github.com/opentripplanner/OpenTripPlanner/tree/longdistanceretry. Here's a profile run I ran on it.
Experimental setup: I made a graph using MTA subway, MTA bus, LIRR, MetroNorth, New Jersey Bus, and New Jersey Rail pulled directly from their various servers on Nov 14, 2014. For OSM, I used an extract from https://mapzen.com/metro-extracts/.
Using OTPProfiler (commit ade402e32b38a53b4e51c3160428437540d61f51), I generated 20 endpoints in New York, and from those generated 380 routing requests. I ran all these requests against an OTP server running on my laptop with a 20G heap. I tested one commit hard-wired to run in short-range mode, and another in the experimental branch with retrying long distance mode. The mean avg_time (that is, the time elapsed per itinerary within each returned request) was 2.0607s for 'short distance mode' (01e37f846f7d26503b6328ac1a455fe731870d1f) and 2.0211s for retrying long distance mode (01e37f846f7d26503b6328ac1a455fe731870d1f).
I could share the requests and results file, but I'm not sure where's the best place to stick them. Ideas?
Discussed this with Brandon -- to get good results from retrying we need to use a much simpler heuristic that does not allow states with different route sequences to coexist at the same vertex. Additional problem: if the path service is seeking N paths by retrying, in the SPTService (GenericAStar) we should not be seeking N paths but only 1.
I made these two changes on the longdistanceretry branch, but something strange is going on. I'm seeing state carried between requests. Somehow searches are getting progressively slower on a running server. if you search between two endpoints that are close enough together to get a walk-only result, the router becomes incapable of using transit on any subsequent requests. Even with the trivial heuristic I see this behavior.
Just a note that this has been implemented and produces good results according to feedback from users. This is on its way to being the main or only way searches are conducted on transit.
The ban-and-retry approach to result variety has been superseded in OTP2 by generating a lot of Pareto-optimal paths. Closing.
The original retrying mechanism banned one trip or route at a time based on already-found paths. This was clearly wrong and we have moved toward eliminating it (in addition it was associated with walk limiting which gave it a bad rap). However, the alternative of branching a lot and attempting to find multiple paths at once during a search is very inefficient in large graphs.
Perhaps we should be retrying, but banning entire route or trip sequences in the retries. OTP is already tracking the route sequence in each state. The long distance mode uses a computation-heavy heuristic that interleaves a backward search from the destination to find lower bounds on travel time. This heuristic remains valid during retries -- it is still "optimistic" with some routes banned. So we should absolutely retain the heuristic object with some progress made between retries.