nginxinc / kubernetes-ingress

NGINX and NGINX Plus Ingress Controllers for Kubernetes
https://docs.nginx.com/nginx-ingress-controller
Apache License 2.0

Enter drain mode when a pod is terminating and sticky-sessions are enabled #267

Closed dcowden closed 4 months ago

dcowden commented 6 years ago

We have a legacy application (tomcat/java) which needs sticky sessions. When we deploy new versions of our application, we need to stop sending new connections to a server while continuing to send bound sessions to the old server. Please note: this is not about in-flight requests; we need the active tomcat sessions to expire, which normally takes a few hours.

This is possible using the nginx drain feature, which keeps sending bound connections to the old server but sends new ones elsewhere. But in kubernetes, calling a command on the ingress controller is not part of the deployment flow. To do it with current tools, we would need to add a preStop hook to our application. In that hook, we'd need to access the ingress controller and ask it to drain with an API call. We'd rather not introduce the ability for applications to call APIs on the ingress controller.
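For reference, the API call in question is roughly the following (a sketch against the NGINX Plus API; the host, API version, upstream name, and server ID are placeholders for whatever the controller actually exposes):

# Put one upstream server into drain mode via the NGINX Plus API.
# Host, API version, upstream name, and server ID are placeholders.
curl -s -X PATCH \
  -H 'Content-Type: application/json' \
  -d '{"drain": true}' \
  http://nginx-plus.example.com:8080/api/9/http/upstreams/tea-svc/servers/3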

When kubernetes terminates a pod, it enters the TERMINATING status. In nearly all cases, when sticky sessions are enabled, the desired functionality is probably to put the associated pod into drain mode. Is this possible with the nginx-plus ingress controller?

We currently use the kubernetes-maintained nginx ingress controller. This feature would make it worth the money to use nginx-plus

Aha! Link: https://nginx.aha.io/features/IC-110

pleshakov commented 6 years ago

@dcowden Maybe the following approach can work for you?

If you want to drain particular pods, you change the corresponding Ingress resource by adding an annotation that specifies which pods to drain using a label query. For example:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: cafe-ingress
  annotations:
     kubernetes.io/ingress.class: "nginx"
     nginx.com/drain: "version=0.1"
spec:
  rules:
  - host: "cafe.example.com"
    http:
      paths:
      - path: /tea
        backend:
          serviceName: tea-svc
          servicePort: 80

In this case, the Ingress controller will drain all the pods corresponding to the tea-svc with the label version=0.1. This will allow you to specify which pods to drain during an application upgrade.

Please note that this is not available yet, but we can add it.

dcowden commented 6 years ago

Hmm, that's an interesting approach, but I don't think it would work well for us.

Today we use fairly conventional deployments, in which the deployment controller scales pods up and down. Under the hood it does this with ReplicaSets, I think. We do not re-publish our ingresses as part of deployments, and this approach would require doing that.

Given our current flow, it would be much more seamless if a pod in TERMINATING status were automatically drained. This would cover several situations.

In reality, if you are running sticky sessions, I can't think of any case where you wouldn't want to drain a pod when it is terminating rather than immediately removing it from service.

In practice there is still other work needed to make this function, because kubernetes has to have a way to know when it's OK to actually kill the pod. This is accomplished by registering a preStop hook, which runs and waits for all of the active sessions to be gone. When the hook finishes, or the pod's termination grace period expires, kubernetes kills the pod, which will make it fail the health checks and be removed from nginx.
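For concreteness, the wiring on our side looks roughly like this (a sketch; the deployment name, image, script path, and timeout values are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: legacy-app
  template:
    metadata:
      labels:
        app: legacy-app
    spec:
      # allow enough time for sticky sessions to bleed off before the pod is killed
      terminationGracePeriodSeconds: 14400
      containers:
      - name: tomcat
        image: tomcat:9
        ports:
        - containerPort: 8080
        lifecycle:
          preStop:
            exec:
              # blocks until the app reports zero active sessions or the script gives up
              command: ["/bin/sh", "-c", "/opt/scripts/drain.sh"]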

pleshakov commented 6 years ago

@dcowden thanks for providing more details.

It looks like it is possible to accomplish session draining through the Ingress controller.

Unfortunately, once a pod enters the terminating state, its endpoint is removed from Kubernetes, which makes the Ingress controller remove that endpoint from the NGINX configuration. Thus, in order to retain that endpoint in the NGINX configuration when it is being removed, we must make an additional query to the Kubernetes API to check whether the corresponding pod is in the terminating state. If that is the case, we need to drain it instead of removing it. Also, once the pod is successfully removed, we need to make sure that it is also removed from the NGINX configuration.
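To illustrate, the check amounts to something like this (shown with kubectl for clarity; the actual controller would query the Kubernetes API directly, and the pod name below is just a placeholder):

# A pod that is shutting down has .metadata.deletionTimestamp set.
POD_NAME=tea-7d9f8-abcde   # placeholder
NAMESPACE=default

DELETION_TS=$(kubectl get pod "$POD_NAME" -n "$NAMESPACE" \
  -o jsonpath='{.metadata.deletionTimestamp}' 2>/dev/null)

if [ -n "$DELETION_TS" ]; then
  echo "pod is terminating: put its upstream server into drain mode"
else
  echo "pod is gone or healthy: remove or keep the upstream server as usual"
fi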

Do you think the logic above will cover your use case?

I can prepare a PR which implements that logic and share it with you, if you'd like to test it.

dcowden commented 6 years ago

Yes, I think that's the logic... at least as near as I can tell without actually implementing it. I'd be happy to test it. I'm also open to alternate ways of working if they can accomplish the objective with less work.

As a side note, this use case once again validates the decision NOT to use the k8s service abstraction, because it's pretty clear that the endpoint would become inaccessible. IIRC, there's a 'use service=true' flag, which would be incompatible with using this functionality.

victor-frag commented 5 years ago

Hello,

I am facing the same scenario as @dcowden, with a java app using tomcat with sticky sessions. Has this been implemented already?

tkgregory commented 5 years ago

I also have this requirement for Tomcat instances that require session affinity. During deployment, existing bound sessions should still be routed to the same instances, with new sessions being routed to the new instances.

Could we get an update on this please?

dcowden commented 5 years ago

@tkgregory @victor-frag we are using https://github.com/jcmoraisjr/haproxy-ingress, which implements this functionality. We've been using it in production for a while -- it's been stable, well supported, and actively updated.

irizzant commented 5 years ago

I'd like to understand this as well. We have a JBoss AS hosting our web application, which has exactly the same problem. We switched to haproxy and currently handle rolling updates just fine.

As @pleshakov suggested, it should be possible using the same approach taken for haproxy:

Thus, in order to retain that endpoint in the NGINX configuration when it is being removed, we must make an additional query to the Kubernetes API to check whether the corresponding pod is in the terminating state. If that is the case, we need to drain it instead of removing it. Also, once the pod is successfully removed, we need to make sure that it is also removed from the NGINX configuration.

I'd also add that the above should happen only if session affinity is enabled, and there should be no need for an additional query to the Kubernetes API, since ingress controllers should be automatically notified when pods enter the termination phase.

amodolo commented 5 years ago

@tkgregory @victor-frag we are using https://github.com/jcmoraisjr/haproxy-ingress, which implements this functionality. We've been using it in production for a while -- it's been stable, well supported, and actively updated.

@dcowden, I'm figuring out how to use haproxy to keep the application alive until the sessions terminate. But at the moment, when I deploy a new application version, the old pods are terminated and the new ones are started, no matter whether there are active sessions. Can you explain how you solved this? What ingress/haproxy configuration did you use?

Thx

dcowden commented 5 years ago

Hi @amodolo, we set up our tomcat container with a preStop hook that doesn't return until there are no active sessions left, or until a timeout that's long enough that we're comfortable the remaining sessions aren't real users.

In our case, we wrote a small servlet, deployed with the app, that returns the number of sessions left using JMX. There are other ways for sure, but it works OK for us until we can do better.

amodolo commented 5 years ago

I've just implemented your solution and it seems to work like a charm.

Thx a lot (also for the super fast response 😄)

dcowden commented 5 years ago

@amodolo glad it worked for you! FWIW, we have been using this solution in production for about a year now. We run a 24x7 platform-- but humans do not work 24x7. We simply wait for sessions to die, or 12 hours, whichever comes first. When we execute a build, we'll have extra pods out there serving the old workloads for 1/2 day till they die. It works pretty well.

The main negative (and why it's not THE solution) is that it limits your iteration velocity on new code in production to once a day, which is a bit of a limitation.

amodolo commented 5 years ago

The main negative aspect of this solution is this: suppose you have one server with 2 active sessions and you are rolling out a new application version. The old pod will enter drain mode until the sessions die (or the grace period is over). Suppose also that the sticky session is based on the cookie generated by HAProxy. In this configuration, if one of the two users logs out of the application, the haproxy cookie is not removed until the user closes the browser (because it is a session cookie); so if that user logs out and then logs in again (without closing the browser), they will be balanced to the same old pod. A better approach could be to configure the ingress to use the JSESSIONID cookie generated by the server. In this case, if your application removes the session cookie on logout, the user will be immediately balanced to one of the new pods after the logout. I hope I've explained myself clearly. What do you think?

dcowden commented 5 years ago

Yes, we use haproxy in rewrite cookie mode, and use a separate cookie. I think using JSESSIONID would work too. Another requirement is that when a particular user on an old pod logs out and logs back in, we want to be guaranteed that they switch to a new pod. That ends up being important sometimes.
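For anyone looking for a starting point, the affinity setup looks roughly like this (a sketch only; double-check the annotation names and values against the haproxy-ingress documentation, and the host, cookie, and service names are placeholders):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: legacy-app-ingress
  annotations:
    kubernetes.io/ingress.class: "haproxy"
    # cookie-based session affinity; "rewrite" reuses/rewrites an existing cookie
    ingress.kubernetes.io/affinity: "cookie"
    ingress.kubernetes.io/session-cookie-name: "SERVERID"
    ingress.kubernetes.io/session-cookie-strategy: "rewrite"
spec:
  rules:
  - host: "app.example.com"
    http:
      paths:
      - path: /
        backend:
          serviceName: legacy-app-svc
          servicePort: 8080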

miclefebvre commented 4 years ago

@dcowden @amodolo

We set up our tomcat container with a preStop hook that doesn't return until there are no active sessions left, or until a timeout that's long enough that we're comfortable the remaining sessions aren't real users.

In our case, we wrote a small servlet, deployed with the app, that returns the number of sessions left using JMX. There are other ways for sure, but it works OK for us until we can do better.

Do you mind explaining more about how you wrote this preStop hook? If I do an exec, I would have to have a script in the same docker image as my tomcat; otherwise, if I do an HTTP call that doesn't return, the call will time out.

How did you do it? Did you add a script in your tomcat container that calls tomcat? Could it be done with another container in the pod? Thanks

dcowden commented 4 years ago

Hi @miclefebvre

How did you do it? Did you add a script in your tomcat container that calls tomcat?

Yes, our script is in the same container as tomcat, and we hook a drain script to a pre-stop. This script calls a URL provided by tomcat that responds with the number of user sessions remaining. When there are no more sessions, or when we have reached our timeout, we finish draining.

Here's the important bit of our drain.sh script:

# Note: DRAIN_URL, DRAIN_URL_TIMEOUT_SECS, DRAIN_URL_INTERVAL_SECS,
# RESULT_FILENAME, STILL_DRAINING and the debug_msg/info_msg helpers
# are defined earlier in the full script.
#
# notional logic:
# keep waiting for ptplace to drain as long as:
#   the drain endpoint returns a 2XX within 10 seconds AND
#   the drain response contains the word "DRAINING"
# exit 0 if we stopped because a 2xx response DIDN'T include DRAINING
# exit 1 if the drain endpoint failed or stopped answering
# (the overall drain timeout is enforced outside this excerpt)

while true
do
    debug_msg "Checking ${DRAIN_URL}, timeout=${DRAIN_URL_TIMEOUT_SECS}s"
    echo "" > "$RESULT_FILENAME"
    curl -s -f --retry 2 --max-time "$DRAIN_URL_TIMEOUT_SECS" -o "$RESULT_FILENAME" --no-buffer "${DRAIN_URL}"

    STATUS_RESULT=$?
    debug_msg "curl result code: $STATUS_RESULT"

    if [ $STATUS_RESULT -ne 0 ]; then
      info_msg "Drain returned non 2xx. Terminating"
      exit 1
    fi

    info_msg "Received Result::"
    cat "$RESULT_FILENAME"

    # grep exits 0 only if the "still draining" marker is present in the response
    grep -i -c "$STILL_DRAINING" "$RESULT_FILENAME"
    if [ $? -ne 0 ]; then
        info_msg "Draining Complete. Terminating."
        exit 0
    else
        debug_msg "Still Draining. Waiting $DRAIN_URL_INTERVAL_SECS seconds.."
        sleep $DRAIN_URL_INTERVAL_SECS
    fi

done

It's worth noting that we terminate if we receive a non-2XX from tomcat -- that's there in case tomcat has become unresponsive while we're draining, which happened once in production. If tomcat's already hosed, then the user sessions there don't matter. That might seem unlikely, but in our case sessions last a LONG time (our users are typically active for nearly an entire business day).

Could it be done with another container in the pod ? Thanks

Our pod has only one container, but I suppose it would work with multiple containers, because it's based on making an HTTP call into tomcat.

miclefebvre commented 4 years ago

Thanks a lot @dcowden,

I will give it a try. But we are using Jib as a base image, and I'm not sure it includes curl or anything like that. I'll see if we can do this in another container or if I should change my base image.

deepakkhetwal commented 3 years ago

Hi @dcowden, would you like to share your sticky session configuration for HAProxy ingress?

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

alessandroargentieri commented 3 years ago

Hello, does anyone know if this feature has been added in any NGINX ingress controller implementation, like HAProxy has?

brianehlert commented 10 months ago

I think this is valuable to keep around as a general-purpose behavior with no dependency on sticky sessions, since how the back-end/upstream pod shuts down is up to the application developer or operator, and the ingress controller should simply behave consistently whether the upstream takes 2 minutes, 2 hours, or 2 days to bleed off.

I believe the current behavior is to remove the upstream when it is not in the ready state, which is different from the drain behavior of NGINX.

To update this for the current state of the API, we would need to set an upstream to drain when the pod is terminating according to EndpointSlices: https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/#conditions
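For reference, a terminating endpoint shows up in an EndpointSlice roughly like this (names and addresses are illustrative):

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: tea-svc-abc12
  labels:
    kubernetes.io/service-name: tea-svc
addressType: IPv4
ports:
- name: http
  port: 80
  protocol: TCP
endpoints:
- addresses:
  - "10.1.2.3"
  conditions:
    ready: false        # terminating endpoints are reported as not ready
    serving: true       # but may still serve bound/sticky sessions
    terminating: true   # the signal that would map to NGINX "drain"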

This way, any preStop hooks or other flows can be executed as outlined here: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination

https://nginx.org/en/docs/http/ngx_http_upstream_module.html#server (note: the drain parameter is not available for stream)