Open adamjk-dev opened 8 years ago
ping @mrjana
This is a pretty highly requested feature. I had a great number of people ask me about it at DockerCon.
Although no promises can be made for 1.12 release I do agree that this would be a pretty useful knob to add for a whole set of applications.
But since we do load balancing at L3/L4, it cannot be based on things like session cookies. The best that can be done is to have source-IP-based stickiness. Would that satisfy your use case @adamjk-dev?
I think source IP stickiness is a good solution here.
That wouldn't work for our case. We would have an upstream load balancer (F5) which would make traffic appear to come from a single IP, the "SNAT pool" IP on the F5 since it is a full proxy. Effectively, Source IP based stickiness would cause all requests to go to one container since all the source IPs would come from the same address.
@adamjk-dev I think we actually spoke.
The main issue with adding "session stickiness" is that there are a hundred ways to do it. It is also an L7 feature, whereas our load balancing operates at L3/L4.
There are two high-level paths here:
@mrjana Probably has some commentary here.
@stevvooe Yes, we sure did.
I just thought I would create an issue to open the discussion. Yes, for our use cases we would need L7 visibility.
So:
@adamjk-dev Docker has an events API, which we will extend for use with services. You may have to write a shim to interface with the load balancer, but no polling would be involved. Interlock has some infrastructure to make this easier, especially when working with haproxy/nginx. Swarmkit support is not fully fleshed out, and there might be a better way.
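To make that concrete, a shim along these lines could stream container events and push each change out to the balancer. This is only a rough sketch; update-f5-pool.sh is a placeholder for whatever script actually talks to the F5 REST API.

```
# Rough sketch of an event-driven shim: stream container start/die events and
# push each change to the external balancer. update-f5-pool.sh is a placeholder
# for whatever talks to the F5 REST API.
docker events \
  --filter 'type=container' \
  --filter 'event=start' \
  --filter 'event=die' \
  --format '{{.Action}} {{.ID}}' |
while read -r action cid; do
  ip=$(docker inspect --format \
    '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' "$cid")
  ./update-f5-pool.sh "$action" "$cid" $ip   # placeholder hook
done
```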
It sounds like the requirements are the following:
A few questions:
Yeah, I fully expect to have to transform/shim the data I get from the events API for my needs and to make calls to an F5 REST API or something in our case. I just wasn't sure how that interface worked to get notifications from the events API (rather than just calling it regularly).
Really, I just need a nice way to get notified of events (container up, down, etc.) so we can update external F5 devices with that information so we always route to the right place and have L7 inspection.
@adamjk-dev Thanks for the clarification. I am not sure 1.12 will quite be able to handle this use case. I'll leave that determination to @mrjana.
What do you mean? I don't really care where the containers run as long as I can route to them and we successfully update the pool members accordingly with the container IP Addresses and ports. If they move, I would expect to get an event for this and update the pool members accordingly. I presume we would split swarms based on software development lifecycle (SDLC) environments. Or, very basically, have some dev swarm(s), pre-prod, and prod swarms. I might not have caught the essence of your question though.
If a container goes down, there may be a slight delay before the external load balancer is notified. If you have the requirement to always route to a running container, things can become quite complicated, requiring connection state sync and other measures. Application-level mitigation generally handles this in most systems, but certain designs require this reliability at the connection level.
Most systems don't require this level of guarantee, as it costs performance and complexity.
For the most part, it sounds like we need an events API and container-bound routes for this use case.
@stevvooe Sure thing. I kind of figured, just wanted to open the discussion here, as I mentioned.
Yeah, I mean in a perfect world that would all work, right? But, I know in our PaaS solution today, we have potential for a "container" to move and a slight delay between updating the F5 pool members. But, I believe this is typically taken care of by health checks on the pool in the F5 device (external load balancer). So, we have "application-specific" health checks in the dream scenario, and if something wasn't quite up to date, ideally the health check would fail and that pool member would not be passed traffic. So, I think that is something that can be left to customers on their own (managing that gap in time for containers moving, going up/down, etc.).
Indeed, I really would just need a way to be notified (push to me, rather than me pull/poll for info) of when containers come up, go down, or move (which is typically a down followed by an up), and get their IP addresses and ports so the external load balancer can be updated with the appropriate pool members, so to speak.
@adamjk-dev Great! Sounds like we're on the same page.
Seems like #491 plus some changes to allow direct container routing are sufficient here.
@stevvooe Sounds like it. I would have to know more, but the per-type watches sounded similar to what I was suggesting. Any way we can get hook points into actions/allocations or get notified of events would be sufficient (letting us provide some mechanism to get called when this information updates).
Apologies, but I am not sure what you mean about changes to allow direct container routing. Does that mean, as it stands today, I can't point to 2 of my containers in the swarm by IP address and port and route to them myself?
Does that mean, as it stands today, I can't point to 2 of my containers in the swarm by IP address and port and route to them myself?
This is a question for @mrjana, but, as far as I understand, these containers are only available on an overlay network. Each container has its own IP on the overlay, but this overlay may not be accessible directly from the outside.
We'd need to either expose these containers on the host networks or have a method of routing directly to these IPs.
Ah, gotcha. Yeah, I would be very interested to hear the answer on that one. We would definitely need a way to route directly to containers within an overlay network or wherever they reside in a swarm.
@adamjk-dev For requests to be routed directly to the container, you need to connect your load balancer to the overlay network to which the container belongs. If that is possible, then it is possible.
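For illustration only: on releases after 1.12 that support attachable overlays, wiring a load balancer container onto the service's network could look roughly like this (network, service, and image names here are made up).

```
# Hypothetical sketch: attach an external-facing load balancer to the same
# overlay network as the service so it can reach task IPs directly.
# Attachable overlays landed after 1.12; names are placeholders.
docker network create --driver overlay --attachable app-net
docker service create --name web --network app-net nginx
docker run -d --name edge-lb --network app-net -p 80:80 my-lb-image
```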
@mrjana: I think we should support a way to have containers expose ports on the host (like docker run -p ...) rather than force all incoming connections to go through the routing mesh.
Yeah, I mean suppose I want to bypass the router mesh, since it does RR load-balancing and I may have stateful apps. I need a way to direct a user with a session to the same container as they went to before.
@adamjk-dev There is probably a way to provide session stickiness at the HTTP level in the routing mesh, which I am going to experiment with. It is probably not going to happen for 1.12, but maybe something can be done natively to support this in the next release.
@aaronlehmann I am not against the idea of exposing ports on the host, but it would have to be only dynamic ports, and only on the nodes where the tasks are running, which probably makes the external load balancer configuration very dynamic. Still, we could do this as a configuration option, but only with dynamic host ports.
@mrjana I like it, interested to see where you can take the session stickiness. Yes, I would love to be able to dynamically update my own external load balancer with locations of how I can direct traffic to containers on whatever hosts they are running on in a swarm. What you guys have built-in is cool, but doesn't meet everyone's needs in my opinion. Giving us the power to do whatever we want in front of the swarm for routing makes it usable by anyone. As I said, if I have a way to get that information (about which container IP and port is for which "task") then I can do what I want with that information, like update an external load balancer's pool members. It would also be awesome if we had a way to get notified of any changes to this information (like hook points into the events API, for container_start, container_stop, etc. so we can act on startup, shutdown, and scaling events of a service).
@adamjk-dev @mrjana There's also another way to do this that wouldn't require the events API and can be done today.
The pattern would look like
F5 -> [your own custom routing mesh] -> Tasks
Creating your own custom routing mesh is actually pretty simple. You would need to just use haproxy, nginx, or whichever session stickiness solution you want.
Then, from your own routing mesh to the tasks, rather than using VIP load balancing you could enable DNSRR. Basically, when you do a DNS query for my-service from your container, you would get a list of IPs for all tasks.
And that's pretty much it!
Long story short: you can build your own routing mesh just by having a small program periodically update the nginx/haproxy configuration file.
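As a rough example, assuming the service runs in DNSRR mode and the script runs in a container on the same overlay network (the service name, port, and paths are placeholders), the updater could be as small as this:

```
#!/bin/sh
# Sketch of a "poor man's routing mesh": resolve the task IPs of a DNSRR
# service, regenerate an nginx upstream block, and reload nginx on change.
while true; do
  {
    echo "upstream my-service {"
    echo "    ip_hash;   # source-IP stickiness; swap in your own scheme"
    for ip in $(dig +short my-service); do
      echo "    server $ip:8080;"
    done
    echo "}"
  } > /etc/nginx/conf.d/upstream.conf.new
  if ! cmp -s /etc/nginx/conf.d/upstream.conf.new /etc/nginx/conf.d/upstream.conf; then
    mv /etc/nginx/conf.d/upstream.conf.new /etc/nginx/conf.d/upstream.conf
    nginx -s reload
  fi
  sleep 5
done
```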
@aluzzardi Yeah, that is effectively what we do in our current PaaS offering (not Docker) to update F5 pool members. The product we use has a built-in mechanism to give us hook points to what sounds like an equivalent of "events" (so we know when containers start, stop, move, etc.). We just wrote a "shim" to be invoked on certain available hook points (container_start, container_stop, etc.), and when we get this information, we update the proper external load balancer with the appropriate members (container IP and port).
What I was hoping for, was an easy, convenient way with Docker to get this information. We could technically "poll" the events API to watch for new information, but I dislike polling needlessly. It would be awesome if we could subscribe to various events and get invoked as they happen. So, we could just write a piece of code that implements some hook points and bypasses the internal routing mesh. Small difference, but it would be much more efficient for us to be invoked as information updates in the swarm, rather than us polling the events API.
The discussion so far has been regarding ingress to the swarm cluster. What if I want to do something fairly sophisticated (L7) between containers (intra-cluster) as well? Is the IPVS LB going to be "batteries included but replaceable"?
@chiradeep This discussion is about Ingress load balancer integration at L7 external to the cluster.
If you want to handle all aspects of load balancing and not use IPVS, you can disable it by running services in DNSRR mode. You can run any load balancer inside of swarm to do the load balancing, bypassing the service VIP and populating the backends with the DNSRR entries.
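For example (the service, network, and image names here are placeholders):

```
# Create the service without a VIP; in DNSRR mode the service name resolves to
# the IPs of all running tasks, which your own load balancer can consume.
docker service create --name my-service --endpoint-mode dnsrr \
  --network app-net my-image
```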
Let's keep this conversation focused, so if you have further questions, please open another issue.
I don't see the problem here.
Sticky sessions could be done at the level after IPVS. So if you have a set of HAProxy or nginx after the IPVS, you can still have your sticky sessions, while letting HAProxy as well as nginx float in the swarm.
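For instance, cookie-based stickiness in an HAProxy tier behind the VIP might look roughly like this; the task addresses are placeholders that would normally be filled in from DNSRR lookups.

```
# Rough sketch only: write a minimal HAProxy config with cookie-based
# stickiness; the server entries stand in for the real task IPs.
cat > /usr/local/etc/haproxy/haproxy.cfg <<'EOF'
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http-in
    bind *:80
    default_backend my_service

backend my_service
    balance roundrobin
    cookie SRV insert indirect nocache
    server task1 10.0.1.3:8080 check cookie task1
    server task2 10.0.1.4:8080 check cookie task2
EOF
```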
@alexanderkjeldaas Indeed, that is a viable solution.
I think the main ask here is to bypass IPVS and route to containers directly, which may be required for some applications.
Yes, we are currently facing the same "problem". Sometimes it is required that an external load balancer balance the backends which are running in Swarm. If you have five hosts and ten containers, you can only connect to the published service port on the hosts. These ports go through the routing mesh, and without stickiness, stateful sessions (TCP (L4) or HTTP (L7)) will fail. Therefore a direct option for stickiness would be helpful. I know I can start a load-balancing container within the same overlay network and let it load balance the traffic from the external load balancer, but it is not as straightforward as it could be.
@kleinsasserm since you already have an external loadbalancer, another option is to run the service with ports published directly on the individual hosts:
docker service create --mode global --publish mode=host,target=80,published=80 your-container
(--mode global ensures that tasks for the service are published on all nodes in the swarm)
Mode global is pretty clear, I used it in that way. Thanks to your answer, I was able to find the source of mode=host -> https://docs.docker.com/engine/swarm/services/#publish-ports. It's a little bit hidden. But as the docs say, it is only useful with --mode global. The benefit of swarm, in my opinion, is that I can run a service with 1 to n replicas and not have to care about where (on which host) it is published. Therefore a combination of the routing mesh with TCP stickiness would be a nice feature. But yes, I am sure that with mode=host in the publish option there is a valid workaround.
It's a little bit hidden. But as the docs say, it is only useful with --mode global.
This is not true. The limitation is that you can only run 1 replica per host that uses a specific port. If you don't specify a publish port, this limitation is lifted, as an ephemeral port will be allocated. The in use ports can be introspected via the API.
The documentation is correct:
Note: If you publish a service’s ports directly on the swarm node using mode=host and also set published=, this creates an implicit limitation that you can only run one task for that service on a given swarm node. In addition, if you use mode=host and you do not use the --mode=global flag on docker service create, it will be difficult to know which nodes are running the service in order to route work to them.
Although, this portion is a little incorrect:
it will be difficult to know which nodes are running the service in order to route work to them.
The ports in use will be available via the API and can be added to the load balancer in that manner.
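A hedged sketch of that introspection (run on a manager node; the exact field paths can vary between API versions, and my-service is a placeholder):

```
# List the dynamically allocated host ports for each running task of a service
# published with mode=host, so an external balancer can be updated.
for task in $(docker service ps -q --filter desired-state=running my-service); do
  docker inspect --format \
    '{{.NodeID}}: {{range .Status.PortStatus.Ports}}{{.PublishedPort}}->{{.TargetPort}} {{end}}' \
    "$task"
done
```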
So what's the status on sticky sessions?
I tried the HTTP routing mesh with sticky sessions yesterday, and it still doesn't work. HAProxy registers for HTTP, but not for SNI HTTPS. It doesn't work in either case.
So when can I expect it to work?
I asked a few employees at dockercon16 about session stickiness in the built-in router mesh that comes with docker 1.12. Are there any plans to provide capabilities for session stickiness (cookie-based etc.) in this router mesh? As awesome as it would be, not all apps are stateless, and we need to route users to the proper container in certain cases.
I need to dig more into alternatives, as I heard mention of Interlock and nginx etc., but our use case would be to somehow detect the "events" for all containers to get IP addresses and ports, and update a pool in F5 devices.
But, it might be nice if session stickiness was provided by the routing mesh, and we could just send traffic to the mesh to be handled.