opentripplanner / OpenTripPlanner

An open source multi-modal trip planner
http://www.opentripplanner.org

Remove support for multiple routers #2760

Closed: t2gran closed this issue 4 years ago

t2gran commented 5 years ago

OTP1 supports loading multiple graphs (routers), and the router ID is part of the URLs. In most deployments only one router is used (the default router).

This makes the code complicated and hence increases the cost of maintenance.

In a service-oriented architecture, loading multiple graphs into one process is not ideal. The savings in resources like memory and CPU are minimal, while having one process gives less control. It might be convenient for manual setups and environments with a fixed number of OTP instances, but restarting a server affects all graphs, even if the restart was caused by a small change in just one graph.

THIS NEEDS APPROVAL FROM THE PLC.

Please comment on this issue. We would like to know if this feature is important to you, so we can find a solution or even keep the support for multiple routers.

t2gran commented 5 years ago

This issue is related to #2625 (Modular design) and #2696 (Remove Router API)

optionsome commented 5 years ago

We do use multiple routers in one OTP instance in our more experimental routing API at HSL, and we use named routers for all other instances as well. However, the benefits of using multiple routers/named routers are not that great, and we can change our system to use the default router without having to change any external API endpoints.

t2gran commented 5 years ago

@optionsome Would it make a difference to you if the base URL were configurable? I guess it does not help much, since two instances on the same machine would need to run on different network ports.

drewda commented 5 years ago

We do not use multiple routers at @interline-io. Instead, we wrap each OTP in a Docker container and use that to deploy multiple routers alongside each other.

We would be comfortable if OTPv2 only supported a single router per process/deployment.

t2gran commented 5 years ago

Do we still need two configuration files (build-config.json and router-config.json)? What do we do with shared configuration? The Sandbox model introduces "feature toggling", and a feature may be specific to either the build or the routing phase, so I guess we could enable features in both the build config and the routing config. Thinking about this a second time, I lean towards keeping the two configuration files.

(This came up in a discussion between @abyrd and me today, and I wanted to make a note of it here, so we can get back to this later).

marcusyoung commented 5 years ago

I do use multiple routers on a regular basis. I, along with many others, am using OTP for academic research purposes. Please keep us in mind when considering these sort of changes. I often have different routers for a before and after situation - e.g. to compare the effect of a new transit service or to compare accessibility in different years. It is much easier to have them all running on the same OTP instance.

t2gran commented 5 years ago

I have added this to the next PLC meeting agenda for discussion and approval. The meeting is on June 6th at 8AM US Pacific / 17:00 CET. Anyone interested in participating in the PLC meeting is welcome to do so.

@marcusyoung Thank you for the feedback. It is impossible for us to know how OTP is used everywhere, so this kind of feedback is important. Over the years a lot of "convenient" functionality has been added, but unfortunately we must prioritize our resources. You are of course welcome to attend the PLC meeting; if you cannot, we will still consider your input.

optionsome commented 5 years ago

How much extra headache does maintaining this functionality produce? Does the possibility of multiple routers have to be accounted for in many places in the code, or just in the part of the code that loads the graphs and in the separate API endpoints?

t2gran commented 5 years ago

It affects more than just endpoints and graph loading. It applies to documentation, updaters, configuration, and any future extensions. To enable OTP to become more flexible and pluggable, we have to clean this code up, spending more time on it (weeks of development time). Keep in mind that adding a feature to a piece of software increases the complexity exponentially. This feature gives us nothing with regard to OTP's main purpose of providing trip search: there is nothing a multi-router OTP can do that multiple OTP instances cannot, except maybe saving some money on memory and CPU.

We have improved, and plan to further improve, memory consumption, CPU usage, and startup time, which should reduce the need for supporting multiple routers. The improvements in CPU usage and startup time are huge; the memory improvements are smaller.

I would also like to mention that switching from OTP1 to OTP2 is NOT just a drop-in replacement. You should expect to change your configuration, tune the server and travel search, possibly make small updates to your client(s), and do TESTING.

I am really sorry for "bringing the bad news to town", but we are struggling to maintain OTP and deliver new functionality - which I would like to do, instead of doing maintenance.

abyrd commented 5 years ago

Thanks everyone for your input. My sense from talking to people deploying OTP for journey planning is that due to wide adoption of container-based infrastructure, the multi-router features are used less and less. Yet I hear regular mentions of the confusion or complexity these features can create.

I agree with everything @t2gran stated above. If this feature had no real impact on maintenance and complexity, there might be an argument for keeping it. Though multi-router functionality does not reach into the core router code, it creates a significant amount of complexity elsewhere, not just in code, but in the overall OTP community. One obvious example of this is that the multi-router system has never been fully documented and there exist no proper instructions on setting it up. When OTP was single-router we could fit a fairly complete explanation of setup in a single page; starting from the time the multi-router approach appeared and evolved, the documentation has never kept pace. Supporting and explaining both the common single-router and relatively rare multi-router layouts creates extra work and confusion. The code mechanisms handling the dynamic loading and unloading of graphs, reloading them from disk, the initial load process etc. have become quite complicated, and several of the people most familiar with OTP have observed that they were spending significant amounts of time just trying to get a basic understanding of how this system worked, and how it was intended to behave. Finally, on a more superficial level, this feature makes every URL and every API call more complex and opaque even for the bulk of users who don't need it.

To ensure the long term sustainability of this project, given the very limited number of developers working on it in a professional capacity, we need to be very careful about legibility of code, clarity of architecture, maintenance effort, and coding productivity. Multiple router functionality is almost certain to disappear from the OTP2 branch, which is expected to diverge significantly from OTP1 and concentrate entirely on multi-modal+transit trip planning applications.

@marcusyoung performing comparisons should be no more difficult with a single-router OTP if you follow the same pattern used in other deployments (minus the containers). In imaginary OTP2 pseudo-commands:

otp --graph /home/me/graphs/baseline --server --port 8081 &
otp --graph /home/me/graphs/scenario1 --server --port 8082 &
otp --graph /home/me/graphs/scenario2 --server --port 8083 &

Then hit URLs:

curl http://localhost:8081/plan?...
curl http://localhost:8082/plan?...
curl http://localhost:8083/plan?...
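The same pattern is straightforward to script from a client. Below is a minimal Python sketch, assuming the hypothetical ports from the pseudo-commands above and the /plan path from the curl examples. The scenario names are hypothetical, fromPlace/toPlace are OTP1 plan parameter names, and the coordinates are placeholders, so check all of these against the OTP2 API you actually run. The only change from a multi-router setup is that the port, rather than a router ID in the path, selects the graph.

```python
from urllib.parse import urlencode

# Hypothetical mapping from scenario name to the port of the OTP instance
# serving that graph (ports taken from the imaginary pseudo-commands above).
SCENARIO_PORTS = {"baseline": 8081, "scenario1": 8082, "scenario2": 8083}


def plan_url(scenario: str, params: dict) -> str:
    """Build the plan URL for one scenario; the port replaces the router ID."""
    port = SCENARIO_PORTS[scenario]
    return f"http://localhost:{port}/plan?{urlencode(params)}"


if __name__ == "__main__":
    # Send the same (placeholder) query to every scenario for comparison.
    query = {"fromPlace": "51.50,-0.12", "toPlace": "51.52,-0.10"}
    for name in SCENARIO_PORTS:
        print(name, plan_url(name, query))
```

Because every instance answers the same query shape, comparing scenarios is just a loop over the port mapping.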

Another point I'd like to make, though, is that the analysis capabilities built into OTP are essentially a prototype that was built some 6-8 years ago, and have not been maintained or used by the original authors since then. It is debatable whether they are really appropriate for research or planning work, and the resource consumption of the routing algorithms in use makes them impractical for large-scale analysis. This analysis functionality is slated for removal from OTP, as those features in their final form have been factored out and/or reimplemented in other libraries.

marcusyoung commented 5 years ago

@abyrd I don't use the analysis functionality; I query the OTP API directly from R for the work I do.

abyrd commented 5 years ago

@marcusyoung wrote:

@abyrd I don't use the analysis functionality; I query the OTP API directly from R for the work I do.

For single point-to-point queries, should you wish to continue doing them with OTP2 the approach I outlined above should work perfectly. Basically you'd just use port numbers instead of router IDs.

The analysis functionality I was just mentioning incidentally since it sounded like there was a possibility you were using those endpoints as well.

optionsome commented 5 years ago

Thanks for the clarifications. I'm for removing this feature. Just remember to document somewhere all the features that are removed in 2.x, even if their existence was not widely known or documented in the first place.

marcusyoung commented 5 years ago

@abyrd Yes clearly it's still doable with separate OTP instances, just not as neat as launching a single instance with multiple routers ready to go. But my use case is probably niche, so fair enough.

However, my use of OTP for research/planning is certainly not niche, and I'd hazard a guess that there are more users of OTP for that than there are running production trip planners, based on the research activity I am aware of and the people using my OTP tutorial.

Could you perhaps clarify further what you mean by:

which is expected to diverge significantly from OTP1 and concentrate entirely on multi-modal+transit trip planning applications.

t2gran commented 5 years ago

@optionsome Yes, we need to document what we remove. We try to keep the change log up to date. I will create an issue to review what we have done so far and to make sure everything is mentioned in the change log.

@marcusyoung Thanks for explaining more about your work; it makes it easier for us to take your use cases into account.

t2gran commented 5 years ago

I added 2 bullets to #2757 as a reminder to update the docs, especially Changelog.md.

abyrd commented 5 years ago

@marcusyoung wrote:

@abyrd Yes clearly it's still doable with separate OTP instances, just not as neat as launching a single instance with multiple routers ready to go. But my use case is probably niche, so fair enough.

However, my use of OTP for research/planning is certainly not niche, and I'd hazard a guess that there are more users of OTP for that than there are running production trip planners, based on the research activity I am aware of and the people using my OTP tutorial.

I originally began working on open source trip planning software in the context of my graduate research, and a decade later my primary use case is still urban planning / spatial analysis applications. So I fully understand that research and analytics is a major use case for open source routing software.

However, after early prototyping within OTP, our planning and analysis tools have been developed separately outside OTP. This has been the case for 5-6 years now. In my opinion OTP is not very suitable for computing accessibility metrics because it does not take into account fluctuations in travel time, wait time, and transfer time over the course of the day, and its throughput is too low (i.e. the routing method is too slow) to be very useful in large-scale processing.

Could you perhaps clarify further what you mean by:

which is expected to diverge significantly from OTP1 and concentrate entirely on multi-modal+transit trip planning applications.

There is a lot of prototype one-to-many analysis code within OTP which hasn't been maintained, and which we won't have the time to maintain in the future. We do not plan to re-create all this functionality in OTP2, although since OTP2 is now using very similar routing techniques to our planning and analysis systems, there is some possibility that the two will converge.

If you're just interested in running a large number of normal OTP point-to-point queries in a batch, OTP2 should be quite suitable for that, as it will be significantly faster than OTP1. Nothing would prevent you from sending requests to OTP2's APIs from whatever system you like, including research and analysis applications.

However, this kind of analysis often requires computing an NxM matrix of trips. The routing algorithms we use are able to find paths to all M different destinations at once from a single origin. Rather than computing NxM separate trips one at a time, it is immensely more efficient to just build N trees. If OTP2 is focused entirely on passenger information, it would have no API for such batch searches.
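To make the arithmetic concrete: with N origins and M destinations, point-to-point routing takes NxM searches, while one-to-many routing takes only N. The sketch below uses plain Dijkstra on a hypothetical toy graph as a stand-in; it is not the transit routing algorithm OTP2 uses, and it ignores time-of-day effects. It only illustrates how one search per origin (one shortest-path tree) fills an entire row of the matrix.

```python
import heapq
from typing import Dict, List, Tuple

# Directed graph: node -> list of (neighbor, travel_time). Toy data only.
Graph = Dict[str, List[Tuple[str, float]]]


def one_to_all(graph: Graph, origin: str) -> Dict[str, float]:
    """Dijkstra: shortest travel time from origin to every reachable node."""
    dist = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return dist


def travel_time_matrix(graph: Graph, origins: List[str], destinations: List[str]):
    """Build the NxM matrix with one tree per origin (N searches, not NxM)."""
    matrix = []
    for origin in origins:
        tree = one_to_all(graph, origin)  # one search covers all M destinations
        matrix.append([tree.get(dest, float("inf")) for dest in destinations])
    return matrix, len(origins)


if __name__ == "__main__":
    toy = {"A": [("B", 5), ("C", 10)], "B": [("C", 3), ("D", 9)], "C": [("D", 2)], "D": []}
    matrix, searches = travel_time_matrix(toy, ["A", "B"], ["C", "D"])
    print(matrix, searches)  # [[8.0, 10.0], [3.0, 5.0]] 2
```

With 2 origins and 2 destinations the saving is only 2 searches versus 4, but the ratio is M, so for thousands of destinations the one-tree-per-origin approach dominates.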

landonreed commented 5 years ago

We use multiple routers on a regular basis so that a single server can support multiple, small routers, primarily for testing out routing for single small feeds. This is convenient for our purposes because it bypasses the need for a lot of server/devops configuration in order to provide this functionality to users who are building GTFS feeds and need to quickly understand how they would perform in a trip planner and share this with others they're collaborating with. This is also built into otp.js and would break some of the functionality in that UI library.

abyrd commented 5 years ago

@landonreed thanks for the comment. Can you explain how this works? Does the target OTP server get restarted when someone wants to try out a new feed? Or is the feed added to a continuously running OTP server? This does seem like an important use case. When people are working on a small GTFS feed it could be very helpful to instantly preview what routing on that feed looks like.

landonreed commented 5 years ago

@abyrd, sorry for the delayed response - I don't think my notifications are set up correctly. The test server is continuously running and new feeds are added to the server as new routers whenever they are requested. As you said, it is extremely useful for getting immediate results on small feeds/areas.

abyrd commented 5 years ago

Thanks @landonreed. Would it be reasonable to keep using a legacy OTP build, e.g. an upcoming 1.4 release, for this special use case even if OTP2 doesn't have multiple routers?

tuukka commented 5 years ago

Just a note: this issue (whether there's a single OTP process or one for each graph) wouldn't matter if there were an official solution for spawning as many containers as required and automatically reverse proxying them using e.g. Traefik (https://traefik.io/). Some additional work might be required to make it possible to upload new graphs at runtime, though. (Actually, I'm thinking you'd add a new entry to docker-compose.yml and rerun docker-compose up.)

landonreed commented 5 years ago

@abyrd, yes, I suppose we don't anticipate using OTPv2 until/unless many of the new features we've been working on for v1.x are added to v2. Based on my limited understanding of v2 changes, I don't see that being for quite a long time (during which we could prepare for the change). Sorry to delay progress on this thread/issue -- I don't think I fully realized it was only pertinent to v2.

abyrd commented 5 years ago

@landonreed it will certainly take some time for features to be ported over to OTP2. I don't know how long, that will probably depend largely on whether any organizations assign staff to carry out that work. We're using issue labels to indicate which issues are expected to affect OTP1 or OTP2, this one is only labeled OTP2 for now.

markstos commented 5 years ago

RideAmigos uses roughly a dozen large routers and has tried both approaches. Long ago we switched to one router per server. With this design, we can take one router down completely for maintenance, stop/start it, and not affect the other routers. Also, if one router is hogging memory, it is clear which one.

We run a single Nginx proxy in front of all the routers so from an end user's perspective they are grouped under a single domain as before.
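For readers unfamiliar with this setup, the proxy's job reduces to a lookup from a public path prefix to a backend OTP instance, which is also how existing router-style URLs can be kept stable. A minimal Python sketch of that rewrite, assuming hypothetical router names, ports, and a /<router>/... URL scheme (this is not RideAmigos' actual Nginx configuration):

```python
from urllib.parse import urlsplit

# Hypothetical mapping: router ID (first path segment) -> backend OTP instance.
BACKENDS = {
    "router-a": "http://127.0.0.1:8081",
    "router-b": "http://127.0.0.1:8082",
}


def upstream_url(public_url: str) -> str:
    """Rewrite /<router>/<rest>?<query> to <backend>/<rest>?<query>."""
    parts = urlsplit(public_url)
    segments = parts.path.lstrip("/").split("/", 1)
    router = segments[0]
    rest = segments[1] if len(segments) > 1 else ""
    if router not in BACKENDS:
        raise KeyError(f"unknown router: {router}")
    url = f"{BACKENDS[router]}/{rest}"
    if parts.query:
        url += "?" + parts.query  # pass the query string through unchanged
    return url


if __name__ == "__main__":
    print(upstream_url("https://otp.example.com/router-b/plan?fromPlace=1,2"))
```

Each backend can then be restarted independently while clients keep using a single domain.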

I previously posted a note on the -users list about how we managed the cluster, but there wasn't much interest at the time: https://groups.google.com/forum/#!searchin/opentripplanner-users/Mark$20Stosberg%7Csort:date/opentripplanner-users/wdUn3VXh2so/d_cC6ImuAwAJ

t2gran commented 5 years ago

@markstos Thank you for your input. I would like to point out that memory usage and startup time in OTP2 have improved a lot, which will lower the hardware cost of running multiple OTP instances. At Entur we are running a nationwide service covering all operators (agencies) with one router. We have added support for whitelisting operators, so an operator who only wants to show their own routes in a travel search can do that; we also support "preferred" operator(s). This allows an operator to show alternatives where their own coverage is poor. This is not fully merged into OTP2 yet, and it probably needs some discussion before it is. If you are hosting OTP for multiple competing agencies, "one OTP instance per agency" is probably the best approach, but if their services overlap and are public, one OTP instance using filtering would give better results.

Performance update OTP1 / OTP2: we get better performance running OTP2 for all the Nordic countries than we get on OTP1 with just Norway. The Nordic countries are roughly 4 times bigger than Norway.

markstos commented 5 years ago

@t2gran RideAmigos is interested in stress-testing OTP2 against our 16-router setup. Can we just check out the dev-2.0 branch and treat it like OTP1 for the most part, or is there more we need to know about building and testing OTP2? We are looking at renting servers with almost a terabyte of RAM dedicated to OTP, so memory savings are very interesting to us.

t2gran commented 5 years ago

@markstos It is a bit premature, but feel free to check out the dev-2.x branch. The plan is to make some kind of beta release that we will test at Entur. We have a long list of things that need to be cleaned up, like #2757 and more, plus some TODO OTP2 comments in the code. Our goal is to get OTP2-beta into production at Entur without fixing anything we don't need for some limited test cases, and then clean it up. One of the bigger issues that will be fixed before the OTP2 final release is the new GraphQL APIs; for the beta release we will use our proprietary Entur API, which needs cleanup.

t2gran commented 4 years ago

The support for multiple routers is removed from OTP 2.