moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Apache License 2.0

Router Mesh Session Stickiness #1077

Open adamjk-dev opened 8 years ago

adamjk-dev commented 8 years ago

I asked a few employees at DockerCon 16 about session stickiness in the built-in router mesh that comes with Docker 1.12. Are there any plans to provide capabilities for session stickiness (cookie-based, etc.) in this router mesh? As awesome as it would be, not all apps are stateless, and we need to route users to the proper container in certain cases.

I need to dig more into alternatives, as I heard mention of Interlock and nginx etc., but our use case would be to somehow detect the "events" for all containers to get IP addresses and ports, and update a pool in F5 devices.

But, it might be nice if session stickiness was provided by the routing mesh, and we could just send traffic to the mesh to be handled.

aaronlehmann commented 8 years ago

ping @mrjana

dperny commented 8 years ago

This is a pretty highly requested feature. I had a great number of people ask me about it at DockerCon.

mrjana commented 8 years ago

Although no promises can be made for the 1.12 release, I do agree that this would be a pretty useful knob to add for a whole set of applications.

But since we do load balancing at L3/L4, it cannot be based on things like a session cookie. The best that can be done is source-IP-based stickiness. Would that satisfy your use case, @adamjk-dev?

dongluochen commented 8 years ago

I think source IP stickiness is a good solution here.

adamjk-dev commented 8 years ago

That wouldn't work for our case. We would have an upstream load balancer (F5) which would make traffic appear to come from a single IP, the "SNAT pool" IP on the F5 since it is a full proxy. Effectively, Source IP based stickiness would cause all requests to go to one container since all the source IPs would come from the same address.

stevvooe commented 8 years ago

@adamjk-dev I think we actually spoke.

The main issue with adding "session stickiness" is that there are a hundred ways to do it. It is also an L7 feature, whereas our load balancing operates at L3/L4.

There are two high-level paths here:

  1. Monitor events coming from the Docker API to modify F5 state to route directly to task slots.
  2. Integrate with libnetwork and have the load balancer operate as an L7 LB would if it were running directly in the swarm.

@mrjana Probably has some commentary here.

adamjk-dev commented 8 years ago

@stevvooe Yes, we sure did.

I just thought I would create an issue to open the discussion. Yes, for our use cases we would need L7 visibility.

So:

  1. Would this entail polling the API for events (endpoint) at regular intervals? In a current product we use, we have the ability to write "extensions/plugins" that receive hook points from the underlying allocator. In essence, we have the ability to write a plugin to receive allocation map changes (when a container starts up, moves, gets removed, etc.) in another PaaS product. It would be awesome to have a similar means of getting this information in Docker (i.e. a way for us to get called when this information updates, rather than us polling the API every 5 seconds to check, or something like that).
  2. I am trying to ingest loads of information at this point, but I do recall a few talks about Interlock; GE mentioned using it with nginx as a solution (https://github.com/ehazlett/interlock). I am not sure if there are any other alternatives at this point (I know GE said they were comparing Interlock and nginx to something else, but I don't recall exactly what).

stevvooe commented 8 years ago

@adamjk-dev Docker has an events API, which we will extend for use with services. You may have to write a shim to interface with the load balancer, but no polling would be involved. Interlock has some infrastructure to make this easier, especially when working with haproxy/nginx. Swarmkit support is not fully fleshed out, and there might be a better way.
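
A minimal sketch of consuming that event stream, assuming access to the Docker CLI and the Engine API socket; the filter values and endpoints shown are illustrative:

# Stream container lifecycle events over a long-lived connection (no polling).
docker events --filter 'type=container' --filter 'event=start' --filter 'event=die'

# The same stream is available from the Engine API, which a shim could read and
# translate into external load balancer (e.g. F5) pool updates; the filter below
# is the URL-encoded form of {"type":["container"]}.
curl --no-buffer --unix-socket /var/run/docker.sock \
  'http://localhost/events?filters=%7B%22type%22%3A%5B%22container%22%5D%7D'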

It sounds like the requirements are the following:

  1. Route requests to a running container based on L7 application data.
  2. Update the router to reflect container location and liveness.
  3. Have an externally managed router operate directly on swarmkit networks.

A few questions:

  1. If a container goes down, how is session migration handled?
  2. What are the consistency requirements for container location?

adamjk-dev commented 8 years ago

Yeah, I fully expect to have to transform/shim the data I get from the events API for my needs and to make calls to an F5 REST API or something in our case. I just wasn't sure how that interface worked to get notifications from the events API (rather than just calling it regularly).

Requirements:

  1. Effectively, for our apps, we would need L7 inspection for proper routing, which can be done in an external F5 device or something. I just figured that if the routing mesh handled it, then high availability would be handled for us, etc. I imagine we will always have an F5 doing routing to the swarm for us per application, with L7 inspection.
  2. Yes, we typically just update pool members as containers come up, go down, move, etc. In some cases we may also create new pools on an F5 device. This is just what we do today; it doesn't mean things can't change in the future.
  3. Yes, we would typically have an Apache layer configured for HA, which hits an F5 layer configured for HA; then I presume the F5 would just have a VIP that points to a pool which contains the containers for a given application.

Questions:

  1. In our current PaaS configuration, I believe it is just lost to be honest. I would have to ask some coworkers on this. But, I think if a container goes down in our PaaS, active connections are severed and they would have to route in from the top-down again. I could be wrong on this one.
  2. What do you mean? I don't really care where the containers run as long as I can route to them and we successfully update the pool members accordingly with the container IP Addresses and ports. If they move, I would expect to get an event for this and update the pool members accordingly. I presume we would split swarms based on software development lifecycle (SDLC) environments. Or, very basically, have some dev swarm(s), pre-prod, and prod swarms. I might not have caught the essence of your question though.

Really, I just need a nice way to get notified of events (container up, down, etc.) so we can update external F5 devices with that information so we always route to the right place and have L7 inspection.

stevvooe commented 8 years ago

@adamjk-dev Thanks for the clarification. I am not sure 1.12 will quite be able to handle this use case. I'll leave that determination to @mrjana.

What do you mean? I don't really care where the containers run as long as I can route to them and we successfully update the pool members accordingly with the container IP Addresses and ports. If they move, I would expect to get an event for this and update the pool members accordingly. I presume we would split swarms based on software development lifecycle (SDLC) environments. Or, very basically, have some dev swarm(s), pre-prod, and prod swarms. I might not have caught the essence of your question though.

If a container goes down, there may be a slight time delay between the point when the container goes down and the external balancer is notified. If you have the requirement to always route to a running container, things can become quite complicated, requiring connection state sync and other measures. Application-level mitigation generally handles this in most systems, but certain designs require this reliability at the connection level.

Most systems don't require this level of guarantee, as it costs performance and complexity.

For the most part, it sounds like we need an events API and container-bound routes for this use case.

adamjk-dev commented 8 years ago

@stevvooe Sure thing. I kind of figured, just wanted to open the discussion here, as I mentioned.

Yeah, I mean in a perfect world that would all work, right? But, I know in our PaaS solution today, we have potential for a "container" to move and a slight delay between updating the F5 pool members. But, I believe this is typically taken care of by health checks on the pool in the F5 device (external load balancer). So, we have "application-specific" health checks in the dream scenario, and if something wasn't quite up to date, ideally the health check would fail and that pool member would not be passed traffic. So, I think that is something that can be left to customers on their own (managing that gap in time for containers moving, going up/down, etc.).

Indeed, I really would just need a way to be notified (pushed to me, rather than me pulling/polling for info) when containers come up, go down, or move (which is typically a down followed by an up), and get their IP addresses and ports so the external load balancer can be updated with the appropriate pool members, so to speak.

stevvooe commented 8 years ago

@adamjk-dev Great! Sounds like we're on the same page.

Seems like #491 plus some changes to allow direct container routing are sufficient here.

adamjk-dev commented 8 years ago

@stevvooe Sounds like it. I would have to know more, but the per-type watches sounded similar to what I was suggesting. Any way we can get hook points into actions/allocations or get notified of events would be sufficient (letting us provide some mechanism to get called when this information updates).

Apologies, but I am not sure what you mean about changes to allow direct container routing. Does that mean, as it stands today, I can't point to 2 of my containers in the swarm by IP address and port and route to them myself?

stevvooe commented 8 years ago

Does that mean, as it stands today, I can't point to 2 of my containers in the swarm by IP address and port and route to them myself?

This is a question for @mrjana, but, as far as I understand, these containers are only available on an overlay network. Each container has its own IP on the overlay, but this overlay may not be accessible directly from the outside.

We'd need to either expose these containers on the host networks or have a method of routing directly to these IPs.

adamjk-dev commented 8 years ago

Ah, gotcha. Yeah, I would be very interested to hear the answer on that one. We would definitely need a way to route directly to containers within an overlay network or wherever they reside in a swarm.

mrjana commented 8 years ago

@adamjk-dev For the requests to be routed directly to the container, you need to connect your load balancer to the overlay network to which the container belongs. If that is possible, then this is possible.
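
A rough sketch of that setup, assuming a Docker version with attachable overlay networks; the network, service, and image names here are placeholders:

# Create an overlay network that standalone containers may join.
docker network create --driver overlay --attachable app-net

# Run the application as a service on that network.
docker service create --name app --network app-net --replicas 3 nginx

# Attach your own L7 load balancer to the same overlay so it can reach task IPs
# directly; "my-haproxy" stands in for whatever LB image you use.
docker run -d --name edge-lb --network app-net -p 443:443 my-haproxy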

aaronlehmann commented 8 years ago

@mrjana: I think we should support a way to have containers expose ports on the host (like docker run -p ...) rather than force all incoming connections to go through the routing mesh.

adamjk-dev commented 8 years ago

Yeah, I mean suppose I want to bypass the router mesh, since it does RR load-balancing and I may have stateful apps. I need a way to direct a user with a session to the same container as they went to before.

mrjana commented 8 years ago

@adamjk-dev There is probably a way to provide session stickiness at the HTTP level in the routing mesh, which I am going to experiment with. It probably won't happen for 1.12, but maybe something can be done natively to support this in the next release.

@aaronlehmann I am not against the idea of exposing ports on the host, but it would have to be only dynamic ports, and only on the nodes where the tasks are running, which probably makes the external load balancer configuration very dynamic. Still, we could do this as a configuration option, but only with dynamic host ports.
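
A sketch of what dynamic host ports could look like, using the publish syntax from later Docker releases; the service name is a placeholder:

# Publish each task's port 80 on an ephemeral host port on the node where the
# task runs (no published= value, so the host port is chosen dynamically).
docker service create --name app --publish mode=host,target=80 nginx

# See which host port each local task received, e.g. to feed an external load
# balancer's pool members.
docker ps --filter 'label=com.docker.swarm.service.name=app' \
  --format '{{.Names}} {{.Ports}}'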

adamjk-dev commented 8 years ago

@mrjana I like it; I'm interested to see where you can take the session stickiness. Yes, I would love to be able to dynamically update my own external load balancer with the locations of containers on whatever hosts they are running on in a swarm, so I can direct traffic to them. What you guys have built in is cool, but it doesn't meet everyone's needs, in my opinion. Giving us the power to do whatever we want in front of the swarm for routing makes it usable by anyone. As I said, if I have a way to get that information (which container IP and port belongs to which "task"), then I can do what I want with it, like updating an external load balancer's pool members. It would also be awesome if we had a way to get notified of any changes to this information (like hook points into the events API for container_start, container_stop, etc., so we can act on startup, shutdown, and scaling events of a service).

aluzzardi commented 8 years ago

@adamjk-dev @mrjana There's also another way to do this that wouldn't require the events API and can be done today.

The pattern would look like

F5 -> [your own custom routing mesh] -> Tasks

Creating your own custom routing mesh is actually pretty simple. You would need to just use haproxy, nginx, or whichever session stickiness solution you want.

Then, from your own routing mesh to the tasks, rather than using VIP load balancing, you could enable DNSRR. Basically, when you do a DNS query for my-service from your container, you get a list of IPs for all tasks.

And that's pretty much it!

Long story short: you can build your own routing mesh by just having a small program periodically updating nginx/haproxy configuration file.
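
A sketch of the DNSRR side of that pattern; the names are placeholders, and the network is made attachable so a standalone test container can join it:

# Run the backend in DNSRR mode so resolving the service name returns every
# task IP instead of a single VIP.
docker network create --driver overlay --attachable mesh-net
docker service create --name my-service --network mesh-net \
  --endpoint-mode dnsrr --replicas 3 nginx

# From a container on the same overlay (e.g. your nginx/haproxy mesh),
# enumerate the backends via DNS.
docker run --rm --network mesh-net alpine nslookup my-service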

adamjk-dev commented 8 years ago

@aluzzardi Yeah, that is effectively what we do in our current PaaS offering (not Docker) to update F5 pool members. The product we use has a built-in mechanism to give us hook points to what sounds like an equivalent of "events" (so we know when containers start, stop, move, etc.). We just wrote a "shim" to be invoked on certain available hook points (container_start, container_stop, etc.), and when we get this information, we update the proper external load balancer with the appropriate members (container IP and port).

What I was hoping for, was an easy, convenient way with Docker to get this information. We could technically "poll" the events API to watch for new information, but I dislike polling needlessly. It would be awesome if we could subscribe to various events and get invoked as they happen. So, we could just write a piece of code that implements some hook points and bypasses the internal routing mesh. Small difference, but it would be much more efficient for us to be invoked as information updates in the swarm, rather than us polling the events API.

chiradeep commented 7 years ago

The discussion so far has been about ingress to the swarm cluster. What if I want to do something fairly sophisticated (L7) between containers (intra-cluster) as well? Is the IPVS LB going to be "batteries included but replaceable"?

stevvooe commented 7 years ago

@chiradeep This discussion is about Ingress load balancer integration at L7 external to the cluster.

If you want to handle all aspects of load balancing and not use IPVS, you can disable it by running services in DNSRR mode. You can then run any load balancer inside the swarm, bypassing the service VIP and populating its backends from the DNSRR entries.

Let's keep this conversation focused, so if you have further questions, please open another issue.

alexanderkjeldaas commented 7 years ago

I don't see the problem here.

Sticky sessions could be handled at the layer after IPVS. If you have a set of HAProxy or nginx instances behind IPVS, you can still have sticky sessions, while letting HAProxy or nginx float in the swarm.
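
A minimal cookie-stickiness sketch for such an HAProxy layer; the backend addresses stand in for task IPs on the overlay and would normally be generated from service discovery:

# Write a small HAProxy config that pins clients to a backend via a cookie.
cat > haproxy.cfg <<'EOF'
defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s
frontend fe
    bind *:80
    default_backend app
backend app
    cookie SRV insert indirect nocache
    server app1 10.0.1.11:80 check cookie app1
    server app2 10.0.1.12:80 check cookie app2
EOF

# Run it with the official haproxy image, mounting the config read-only.
docker run -d --name sticky-lb -p 8080:80 \
  -v "$PWD/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro" haproxy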

stevvooe commented 7 years ago

@alexanderkjeldaas Indeed, that is a viable solution.

I think the main ask here is to bypass IPVS and route to containers directly, which may be required for some applications.

m4r10k commented 7 years ago

Yes, we are currently facing the same "problem". Sometimes it is required that an external load balancer balance the backends running in Swarm. If you have five hosts and ten containers, you can only connect to the published service port of the hosts. These ports go through the routing mesh, and without stickiness, stateful sessions (TCP (L4) or HTTP (L7)) will fail. Therefore a direct option for stickiness would be helpful. I know I can start a load-balancing container within the same overlay network and let it balance the traffic from the external load balancer, but it is not as straightforward as it could be.

friism commented 7 years ago

@kleinsasserm since you already have an external loadbalancer, another option is to run the service with ports published directly on the individual hosts:

docker service create --mode global --publish mode=host,target=80,published=80 your-container

(--mode global ensures that the service runs a task, and therefore publishes the port, on every node in the swarm)

m4r10k commented 7 years ago

Mode global is pretty clear; I used it that way. Thanks to your answer, I was able to find the documentation for mode=host -> https://docs.docker.com/engine/swarm/services/#publish-ports. It's a little bit hidden. But as the docs say, it is only useful with --mode global. The benefit of Swarm, in my opinion, is that I can run a service with 1 to n replicas and not have to care about where (on which host) it is published. Therefore a combination of the routing mesh with TCP stickiness would be a nice feature. But yes, I am sure that with mode=host in the publish option there is a valid workaround.

stevvooe commented 7 years ago

It's a little bit hidden. But as the docs say, it is only useful with --mode global.

This is not true. The limitation is that you can only run one replica per host that uses a specific port. If you don't specify a published port, this limitation is lifted, as an ephemeral port will be allocated. The ports in use can be introspected via the API.

The documentation is correct:

Note: If you publish a service’s ports directly on the swarm node using mode=host and also set published= this creates an implicit limitation that you can only run one task for that service on a given swarm node. In addition, if you use mode=host and you do not use the --mode=global flag on docker service create, it will be difficult to know which nodes are running the service in order to route work to them.

Although, this portion is a little incorrect:

it will be difficult to know which nodes are running the service in order to route work to them.

The ports in use will be available via the API and can be added to the load balancer in that manner.
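
One way to read those ports, assuming a manager node's API socket, a hypothetical service named "web", and jq for formatting (host-mode published ports appear under the task's Status.PortStatus):

# List the service's tasks with the host ports they were assigned; the filter
# is the URL-encoded form of {"service":["web"]}.
curl --unix-socket /var/run/docker.sock \
  'http://localhost/tasks?filters=%7B%22service%22%3A%5B%22web%22%5D%7D' \
  | jq '.[] | {NodeID, Ports: .Status.PortStatus.Ports}'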

feketegy commented 6 years ago

So what's the status on sticky sessions?

JohanSpannare commented 6 years ago

Tried the HTTP routing mesh with sticky sessions yesterday, and it still doesn't work. HAProxy registers for HTTP, but not for SNI HTTPS, and it doesn't work in either case.

So when can I expect it to work?