miraculixx / otp-dockerdeploy

OTP Docker deployment

Design scalable deployment architecture #4

Closed miraculixx closed 9 years ago

miraculixx commented 9 years ago

Expected behaviour

  1. Achieve a response time of t < 2 seconds (excluding network latency) for a /plan query under loads of up to 100 concurrent clients
  2. Achieve sub-linear or near-linear scalability with a scaling factor <= 1 (i.e. adding 2x the clients requires at most 2x the infrastructure)

Tasks

  3. Assess ways to achieve this goal -- is it possible at all, and at what cost (CPU and memory requirements, number of OTP instances, etc.)?
  4. Design an appropriate deployment architecture (in general using Docker for easy deployment; if that fails, propose an alternative approach).
  5. Estimate the effort to create the respective Docker image(s) and deployment scripts. These shall support adding/removing instances at run time (e.g. otpdeploy scale +5 to add 5 instances).
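The scale interface proposed in task 5 could be sketched as a thin wrapper around plain docker commands. This is purely a hypothetical sketch: the image name, container naming scheme, and the otpdeploy name itself are assumptions, and a real implementation would likely delegate to an orchestrator instead.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed "otpdeploy scale" command.
# IMAGE and the naming scheme are assumptions, not the real setup.
# Set DRY_RUN=1 to print the docker commands instead of executing them.

IMAGE="miraculixx/otp"
DRY_RUN="${DRY_RUN:-0}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"            # dry run: show the command only
    else
        "$@"                 # real run: execute it
    fi
}

scale() {
    delta="$1"
    case "$delta" in
        +*) n="${delta#+}"   # "+5" -> start 5 new containers
            i=0
            while [ "$i" -lt "$n" ]; do
                run docker run -d --name "otp-$(date +%s)-$i" "$IMAGE"
                i=$((i + 1))
            done ;;
        -*) n="${delta#-}"   # "-5" -> remove 5 running containers (sketch)
            for c in $(docker ps -q --filter "ancestor=$IMAGE" | head -n "$n"); do
                run docker rm -f "$c"
            done ;;
        *)  echo "usage: otpdeploy scale +N|-N" >&2; return 1 ;;
    esac
}

if [ "$#" -gt 0 ]; then
    scale "$@"
fi
```

For example, DRY_RUN=1 otpdeploy scale +5 would print the five docker run commands without executing them.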

Implementation notes

godlike64 commented 9 years ago

The JVMPerformance link suggests the following Java parameters:

-Xmx1500m -d32 -server

Right now in the Dockerfile we have:

ENTRYPOINT [ "java", "-Xmx6G", "-Xverify:none", "-jar", "/var/otp/otp.jar" ]

It is possible that allocating that much memory taxes the app startup. However, this is tied to the map size (when building the graph via the script, specifying anything lower than 6G tended to cause an out-of-memory exception). I will run some tests, changing the entrypoint and checking whether the /plan query we currently have responds faster.
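A variant ENTRYPOINT to test against, closer to the JVMPerformance suggestion -- 2G is an arbitrary starting point for the experiments, not a verified value, and -d32 is left out since it only works where a 32-bit JVM is available:

```dockerfile
# Sketch of a tuned ENTRYPOINT to experiment with. The exact heap size
# must be found by testing: 6G was needed for graph *building*, but
# serving a prebuilt graph may well need less.
ENTRYPOINT [ "java", "-Xmx2G", "-server", "-Xverify:none", "-jar", "/var/otp/otp.jar" ]
```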

Right now, most of the time spent deploying the docker image goes into building the graph, and another bit into starting up the server (assuming the image is ready and docker does not need to perform any updates on it, i.e. we are starting from a prepared image with no graph built on it). Separating the graph building into another container and leaving the graph in a shared volume will help greatly; the only remaining issue would be app startup (which relates to #1).
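That split could be sketched roughly as follows, assuming docker-compose; the image names and paths are assumptions, not the actual setup:

```yaml
# Sketch (docker-compose v1 syntax): the graph is built once by a
# dedicated container and then served from a shared host directory.
# Image names and paths are assumptions.
graphbuilder:
  image: otp-graphbuilder          # one-off: builds the graph, then exits
  volumes:
    - ./graphs:/var/otp/graphs

otpserver:
  image: otp-server                # serves requests against the prebuilt graph
  volumes:
    - ./graphs:/var/otp/graphs
  ports:
    - "8080:8080"
```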

Which GTFS would you like me to use for these tests? When we built the script we used the complete GTFS, but from the aforementioned issue it appears we can use a smaller one to improve startup time. However, I do not know which one will ideally be used in the end (come to think of it, if the GTFS locations on the web are static, this could be parametrized in the startup/graph-building script).

miraculixx commented 9 years ago

Which GTFS would you like me to use to perform these tests?

Update: sorry, misread, so scratch that: please use the same as before -- IMO startup times don't really matter (it's the query performance that counts). Use this example

if GTFS locations on the web are static

We cannot assume locations to be static: we will receive GTFS updates with changed locations all the time.

miraculixx commented 9 years ago

It could be possible allocating that much memory is taxing the app startup.

Right now we use an 8GB machine for testing, but if we can reduce that to, say, 2 or 4 GB, that's great. Also please note that we will have much smaller GTFS files than originally planned -- you may use this example as a reference. Our actual production size will be at most on the order of 100 times this size. In comparison, the full Swiss GTFS feed was 63MB (zipped).

Right now most of the time that takes deploying the docker image is spent in building the graph

I think we can reduce build time by using the GTFS data only, i.e. without map data. See the options used in my attempt to load the above GTFS file.
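A GTFS-only build step could look roughly like this; the --build flag usage and the paths are assumptions to verify against the OTP version in use, the key point being that with no OSM extract in the input directory, OTP skips the slow street-network processing:

```dockerfile
# Sketch of a graph-builder step using GTFS data only: only the GTFS
# zip is placed in the input directory, so no street network is built.
# Paths and flag usage are assumptions for the OTP version in use.
ADD feed.gtfs.zip /var/otp/graphs/default/
RUN java -Xmx2G -jar /var/otp/otp.jar --build /var/otp/graphs/default
```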

miraculixx commented 9 years ago

Regarding the tasks, in particular 2. and 3., please consider using Shipyard. I'm guessing this will make scaling a lot easier, including across multiple machines.

pablokbs commented 9 years ago

Ok, I've been trying Shipyard. It looks good: the scale feature works fine and we should test it in production. Still, we are going to need a server in front of these backend servers to balance the load between them.

Where can I install it?

miraculixx commented 9 years ago

Please see Skype message.

Still we are going to need a server in front of these backend servers to balance the load between them.

Sure -- I prefer nginx as the front-end server, as we already use it in other installations. We need this automated, so that we can basically set up a new server from the command line. Not sure what is the fastest/simplest way to do this.
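A minimal sketch of what that nginx front end could look like; the backend addresses and ports are placeholders that the deployment scripts would rewrite as instances are added or removed:

```nginx
# Sketch: nginx balancing across OTP backends. Addresses are
# placeholders to be generated by the deployment scripts.
upstream otp_backends {
    least_conn;                  # send each request to the least busy instance
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;
    location /otp/ {
        proxy_pass http://otp_backends;
    }
}
```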

pablokbs commented 9 years ago

Tell me what you think about https://github.com/miraculixx/otp-dockerdeploy/commit/282aada7f767f3baf2b26fa967c9eb565aed7822

Also check your Skype regarding Shipyard.