projectcontour / contour

Contour is a Kubernetes ingress controller using Envoy proxy.
https://projectcontour.io
Apache License 2.0

Add support for an Application-specified cookie for load balancing policy. #2856

Open prateekjainaa opened 4 years ago

prateekjainaa commented 4 years ago

Hi All,

Does Contour have a feature equivalent to HAProxy's persistent stickiness? Or is there some way of achieving it in Contour?

Q. What do we mean by persistent stickiness?
Ans. Client/browser requests are routed to the same server/pod, even after a browser restart. For more on this, you can refer to the section "The difference between persistence and affinity" here.

Regards, Prateek

stevesloka commented 4 years ago

I think what you're looking for @prateekjainaa is what envoy calls session-affinity.

Have a look at the following docs: https://projectcontour.io/docs/v1.8.0/httpproxy/#session-affinity

prateekjainaa commented 4 years ago

@stevesloka, apologies from my side. I should have put more stress on the "persistent" nature of the session affinity. As far as I understand, session affinity works only as long as the browser has a session with the site. If the browser is restarted, session affinity is gone (the request can land on a new server). HAProxy supports a feature where session affinity survives browser restarts (persistent stickiness).

Let me know if I am missing something here.

jpeach commented 4 years ago

The contour Cookie load balancer strategy uses a session cookie. IIUC, the HAProxy "persistent" session affinity does cookie-balancing but uses an application-specified cookie to do so (which means that the persistence of the balancing is proportional to the persistence of the application cookie value).

That seems like a reasonable feature request.

prateekjainaa commented 4 years ago

@jpeach IIUC, stickiness survives contour or envoy restarts; because it is based upon cookies. Let me know if I am missing something here.

jpeach commented 4 years ago

> @jpeach IIUC, stickiness survives contour or envoy restarts; because it is based upon cookies. Let me know if I am missing something here.

Contour is not in the data path, so restarting it has no effect. Cookie balancing survives an Envoy restart but not a browser restart (it's a session cookie).
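To make the session-cookie point concrete, here is a minimal sketch of how a browser treats the two kinds of cookie. `BrowserJar`, the cookie names, and the 8-hour value are all hypothetical and only model the behaviour described above; this is not a real browser or Envoy API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Cookie:
    value: str
    # None models a session cookie (no Expires/Max-Age attribute was sent).
    expires_at: Optional[float] = None


@dataclass
class BrowserJar:
    """Toy model of a browser cookie store (hypothetical, for illustration)."""
    cookies: Dict[str, Cookie] = field(default_factory=dict)

    def store(self, name: str, value: str, now: float,
              max_age: Optional[int] = None) -> None:
        expires_at = None if max_age is None else now + max_age
        self.cookies[name] = Cookie(value, expires_at)

    def restart(self) -> None:
        # Closing the browser discards every session cookie.
        self.cookies = {
            name: c for name, c in self.cookies.items()
            if c.expires_at is not None
        }

    def get(self, name: str, now: float) -> Optional[str]:
        cookie = self.cookies.get(name)
        if cookie is None:
            return None
        if cookie.expires_at is not None and now >= cookie.expires_at:
            return None  # persistent cookie has expired
        return cookie.value


jar = BrowserJar()
jar.store("affinity", "abc123", now=0.0)                     # session cookie
jar.store("affinity-ttl", "def456", now=0.0, max_age=28800)  # persistent, 8 h
jar.restart()

print(jar.get("affinity", now=60.0))      # session cookie is gone
print(jar.get("affinity-ttl", now=60.0))  # persistent cookie is still sent
```

After the restart only the cookie with a lifetime is still available to send, so only that one can keep steering requests to the same backend.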

prateekjainaa commented 4 years ago

So the solution to survive browser restart would be to make it support stickiness based on application cookie? That's the feature you talked about?


tsaarni commented 4 years ago

I chatted with @prateekjainaa and heard about this issue.

I checked design/session-affinity.md and it seems that cookies set by application are not working well with Envoy (though RING_HASH LB policy does not seem to be used anymore)

> Using the session id or key as the ring hash suffered from the bootstrapping problem of the first request being routed to backend A, which sets a session cookie, however that session cookie will not hash to backend A and cause subsequent requests to arrive at the wrong server, possibly repeating this behavior.

It would be possible to set time-to-live for the cookie that Envoy sets. If value is non-zero, the cookie would be persisted by browser accordingly.

Setting TTL was also considered in the design doc (chapter "Cookie design") but with a conclusion that any time-to-live is equally wrong due to fragility of session affinity in general. Therefore this option is not given to user. Could this option be reconsidered?

youngnick commented 4 years ago

Currently, Contour's session persistence works like this: If you enable it, Contour will tell Envoy to generate a browser session cookie and hash backends based on that cookie. We achieve this by explicitly setting a TTL of 0 on the cookie.

This was chosen because there is no way to choose a default value for the TTL that makes sense - it needs to be relatively close to the lifetime of a backend pod, or you will either have unnecessary churn, or sessions will accumulate at the longest-living pod.

This was done to minimise required configuration, and to not push complexity back to our users. The Kubernetes environment is different in the amount of expected change to standalone hosts, and so may have different behavior when you use the same mechanisms you used to.

That said, if we want to add this functionality, we have to allow specification of some fields, and explain how they interact to produce various outcomes.

(The following details were taken from the envoy docs.)

The two key fields are the name and the ttl. The name field is required, and the TTL determines the type of stickiness.

| TTL | Type of stickiness |
| --- | --- |
| Absent | Passive. Envoy will only do persistence if there is a value in the specified name field. |
| 0 | Browser session cookie will be generated. |
| Specified | Cookie with given TTL will be generated. |

We can use the 'Absent' case for application cookies: Envoy will do no cookie generation, and requests will not be sticky until the application sets the correct cookie.

Both a TTL of 0 and a non-zero TTL will work, but each will need documentation explaining what it does.
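The three TTL cases can be sketched as a small decision function. This is only an illustration of the behaviour summarised above: `affinity_set_cookie` and the cookie name are hypothetical, and the generated value and attributes are not Envoy's actual output.

```python
import uuid
from typing import Dict, Optional


def affinity_set_cookie(name: str, ttl_seconds: Optional[int],
                        request_cookies: Dict[str, str]) -> Optional[str]:
    """Return the Set-Cookie value the proxy would add, or None if it adds nothing."""
    if name in request_cookies:
        return None  # cookie already present: just hash on its value
    if ttl_seconds is None:
        return None  # TTL absent (passive): never generate, wait for the app
    value = uuid.uuid4().hex  # illustrative value only
    if ttl_seconds == 0:
        return f"{name}={value}"  # no Max-Age, so it is a browser session cookie
    return f"{name}={value}; Max-Age={ttl_seconds}"  # survives browser restarts


print(affinity_set_cookie("affinity", None, {}))
print(affinity_set_cookie("affinity", 0, {}))
print(affinity_set_cookie("affinity", 28800, {}))
```

Only the last case produces a cookie that the browser keeps across restarts, which is why exposing the TTL is the piece that matters for persistent stickiness.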

In order to implement this, we'll need some answers to these questions:

tsaarni commented 3 years ago

I respond with my understanding but @prateekjainaa can fill in.

I need to leave the last bullet for later (i.e. the hardest question: what would the API change look like) but I wanted to add an example use case and ask you a few questions.

> We can use the 'Absent' case for application cookies, Envoy will do no cookie generation, and requests will not be sticky until the application sets the correct cookie.

While reading design/session-affinity.md (chapter "Bootstrapping issues") I got an impression that application provided cookies do not work too well: Upstream application would likely assume, that the application instance that creates the "session" cookie (with application-specified TTL), will be the same instance that the session sticks to. Since the upstream service does not know the hashing algorithm that Envoy uses, it cannot possibly create a cookie that would lead to Envoy selecting that particular instance for the next request. Therefore the next request may be likely to be forwarded to another instance, which might then generate yet another cookie causing the problem to be repeated.

I did not find more information from Envoy documentation or elsewhere, but the explanation in design doc sounded right to me, which raises a question: in which scenario one would use Absent/Passive mode in Envoy? Maybe there is something I missed or misunderstood?
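The bootstrapping problem can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the pod names, the SHA-256 modulo hash standing in for the proxy's real hashing, and the round-robin choice for cookie-less requests.

```python
import hashlib
import itertools

BACKENDS = ["pod-a", "pod-b", "pod-c"]  # hypothetical replica names


def pick_by_cookie(cookie_value: str) -> str:
    """Stand-in for the proxy's hash: deterministic, but unknown to the app."""
    digest = hashlib.sha256(cookie_value.encode()).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]


def simulate(logins: int) -> int:
    """Count logins whose app-issued cookie hashes back to the issuing pod."""
    first_hop = itertools.cycle(BACKENDS)  # cookie-less requests can land anywhere
    landed_on_issuer = 0
    for i in range(logins):
        issuer = next(first_hop)
        # The app picks the session id without knowing the proxy's hash.
        # A real app would use a random id; a counter keeps the run repeatable.
        session_id = f"session-{i}"
        if pick_by_cookie(session_id) == issuer:
            landed_on_issuer += 1
    return landed_on_issuer


total = 300
landed_on_issuer = simulate(total)
print(f"{landed_on_issuer}/{total} sessions returned to the pod that created them")
```

With three replicas one would expect only about a third of the sessions to return to the pod that holds their state. The rest are hashed to a different pod, which is the mismatch described above.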

> What we want to achieve with this feature. What types of stickiness are in scope?

Example use case scenario:

Legacy stateful application is migrated to Kubernetes and then scaled up - each replica of the service is independent stateful instance of the application.

Problem description:

The application has a concept of login which starts a user session of a known duration, let's say 8 hours for a typical session lasting a workday. The application instance keeps the user session data in memory for that period. The user closes the browser when leaving for lunch (or the browser closes for some other reason) and the session cookies are lost. When the user is back from lunch they expect the session to still be active, but since the cookie was lost they get forwarded to another application instance. The user needs to start from an empty state, and the previously active session is left hanging in the previous application instance for the rest of the day.

Wanted behavior:

Configure session stickiness to match with the session length defined by the application.

> How do the available Envoy options meet this scope?

Having an option of specifying TTL for Envoy-generated cookie would likely allow the use case but I guess it is yet to be tested and proven.

If it works, do you think this could be added and should the API change proposal then cover also the application generated cookie (Absent/Passive case)?

youngnick commented 3 years ago

> Since the upstream service does not know the hashing algorithm that Envoy uses, it cannot possibly create a cookie that would lead to Envoy selecting that particular instance for the next request. Therefore the next request may be likely to be forwarded to another instance, which might then generate yet another cookie causing the problem to be repeated.

That's my understanding as well. As I think I said above, other proxies mutate the cookie a little bit to indicate which backend the request should be routed to once the application has generated it, but Envoy does not support this. The only two options are "Envoy doesn't modify the cookie at all" or "Envoy generates the cookie itself", from what I can see.

I'll be honest, I don't see how this would achieve the outcome of having requests with no cookie go to any backend, and requests with a cookie be sent back to the same server that issued that cookie.

What Envoy does seem to be able to do is this: a request without a cookie comes in, the backend issues a cookie, and on the next request Envoy picks a backend based on that cookie and keeps sending that traffic to it. This would require the application itself to propagate which session belongs to which backend before responding to the request; otherwise requests could race around the backend set without ever settling, as you said @tsaarni. This does not seem like a good idea.

Allowing the setting of a TTL makes sense though. I can see that potentially being added, assuming the API change can be done in a way that makes sense.

sunjayBhatia commented 3 years ago

Now that we support request hash based load balancing, it sounds like one could hash on the Cookie header sent by a client (with the value generated by a single application instance) to implement this? There would be some quirks if you have multiple cookies in a single header, or multiple Cookie headers entirely.

```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: example
spec:
  virtualhost:
    fqdn: example.projectcontour.io
  routes:
  - services:
    - name: example-app
      port: 80
    loadBalancerPolicy:
      strategy: RequestHash
      requestHashPolicies:
      - headerHashOptions:
          headerName: Cookie
```
we could also move to supporting passive or Envoy generated cookie hashing in the RequestHash load balancer strategy and deprecate the existing Cookie load balancer strategy

the same bootstrapping problem still exists it seems even with this, so not sure how helpful this is
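A rough sketch of the multiple-cookie quirk when hashing the raw Cookie header. The helper names and cookie names are hypothetical, and SHA-256 is only a stand-in for whatever hash the proxy applies.

```python
import hashlib
from http.cookies import SimpleCookie
from typing import Optional


def hash_key(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def whole_header_key(cookie_header: str) -> str:
    """Hash the raw Cookie header, as a header-based hash policy would see it."""
    return hash_key(cookie_header)


def named_cookie_key(cookie_header: str, name: str) -> Optional[str]:
    """Hash only one named cookie, ignoring everything else in the header."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(name)
    return None if morsel is None else hash_key(morsel.value)


# Same session cookie, but an unrelated cookie changed and the order differs.
before = "session=abc123; theme=dark"
after = "theme=light; session=abc123"

print(whole_header_key(before) == whole_header_key(after))
print(named_cookie_key(before, "session") == named_cookie_key(after, "session"))
```

Hashing the whole header changes the key whenever any cookie changes or is reordered, so the client can be moved to a different backend even though its session cookie is unchanged. Hashing a single named cookie avoids that, but the bootstrapping problem remains either way.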

sunjayBhatia commented 3 years ago

could do something hacky here to fix the bootstrap issue, which would look like:

this could interoperate with the upcoming work in cookie rewriting as well to add more attributes to the relevant cookie, or we could make these configurable and rewrite in lua ourselves

it's not really much different than Envoy generating a cookie tbh, it just forces the first request in the passive cookie load balancing flow to be routed to somewhere that can be used again, but unless the app knows about it, it's basically what we already have

just thought of this w/o trying it out, might be missing something, this is also probably not valid http cookie semantics/usage, since the cookie is coming from thin air, rather than the server telling a client to save a cookie and send it

sunjayBhatia commented 3 years ago

again, this doesn't really solve the issue since the app isn't generating the cookie, so again not sure how useful any of this is

youngnick commented 3 years ago

I seem to recall that other load balancers allow specifying the backend by prepending it to the cookie text or something? Without something that allows the generating service to tell Envoy which backend to send it to (which will require the backing server to know the generated cluster name), I don't see how this can ever work.
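For reference, the "prepend the backend to the cookie" idea could look roughly like the sketch below. This is not something Envoy or Contour does today. The separator, the backend names, and both function names are hypothetical, and real proxies that offer this may encode the backend differently.

```python
from typing import Optional, Tuple

SEPARATOR = "~"                # hypothetical delimiter between backend and value
BACKENDS = {"pod-a", "pod-b"}  # hypothetical backend identifiers


def tag_response_cookie(app_value: str, backend: str) -> str:
    """On the response path: prefix the app's cookie with the issuing backend."""
    return f"{backend}{SEPARATOR}{app_value}"


def route_request_cookie(cookie_value: str) -> Tuple[Optional[str], str]:
    """On the request path: return (backend or None, value to pass to the app)."""
    backend, sep, app_value = cookie_value.partition(SEPARATOR)
    if sep and backend in BACKENDS:
        return backend, app_value  # strip the prefix before forwarding
    return None, cookie_value      # untagged or unknown backend: balance normally


tagged = tag_response_cookie("sess42", "pod-a")
print(tagged)
print(route_request_cookie(tagged))
```

Because the proxy writes the backend into the cookie on the way out, the application never needs to know the hashing algorithm or the cluster name, which is the piece missing from the passive mode.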

This seems like something we need to ask upstream about, and see if anyone else has solved this with Envoy.

github-actions[bot] commented 8 months ago

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

You can:

Please send feedback to the #contour channel in the Kubernetes Slack

erikschul commented 8 months ago

Not stale. I'm experiencing this issue as well. My use case is a web app. During rollout, multiple versions may be running in parallel, that are using different assets, e.g. logo1.png or logo2.png. If requests are randomly distributed, some will appear to randomly fail. Therefore, a sticky session is required. This is supported by Contour. But during rollout, pods are added/removed, and horizontal scaling could also cause issues. It's therefore imperative that the session is pinned to the specific server (or application version, but that's more complicated to implement). I realize that the problem is that Envoy doesn't support this, but that may mean that I have to use a different ingress provider.

AFAICT it also seems that other Envoy-based solutions haven't solved this problem either: Emissary, Gloo.

It seems that ingress-nginx supports it: https://kubernetes.github.io/ingress-nginx/examples/affinity/cookie/

... the response contains a Set-Cookie header with the settings we have defined [...] it contains a randomly generated key corresponding to the upstream used for that request [...]. If a client sends a cookie that doesn't correspond to an upstream, NGINX selects an upstream and creates a corresponding cookie. If the backend pool grows NGINX will keep sending the requests through the same server of the first request, even if it's overloaded. When the backend server is removed, the requests are re-routed to another upstream server. This does not require the cookie to be updated [...]
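The behaviour quoted above can be approximated with a lookup table from cookie key to upstream. This is a simplified sketch under my own assumptions, not how ingress-nginx is actually implemented; `StickyTable` and the pod names are hypothetical.

```python
import secrets
from typing import Dict, List, Optional, Tuple


class StickyTable:
    """Toy cookie-key to upstream mapping (hypothetical, for illustration)."""

    def __init__(self, upstreams: List[str]) -> None:
        self.upstreams = list(upstreams)
        self.key_to_upstream: Dict[str, str] = {}
        self._next = 0

    def _pick(self) -> str:
        # Simple round-robin choice among the current upstreams.
        upstream = self.upstreams[self._next % len(self.upstreams)]
        self._next += 1
        return upstream

    def route(self, cookie_key: Optional[str]) -> Tuple[str, Optional[str]]:
        """Return (upstream, new cookie key to set, or None if none is needed)."""
        if cookie_key is not None:
            upstream = self.key_to_upstream.get(cookie_key)
            if upstream in self.upstreams:
                return upstream, None  # stay pinned, even if the pool grew
            if upstream is not None:
                # Known key whose upstream was removed: re-route without
                # changing the cookie the client holds.
                replacement = self._pick()
                self.key_to_upstream[cookie_key] = replacement
                return replacement, None
        # No cookie, or a key that matches no upstream: pick one, issue a key.
        new_key = secrets.token_hex(8)
        upstream = self._pick()
        self.key_to_upstream[new_key] = upstream
        return upstream, new_key


table = StickyTable(["pod-a", "pod-b"])
first, key = table.route(None)           # first request, no cookie
table.upstreams.append("pod-c")          # pool grows: the session stays put
still_pinned, _ = table.route(key)
table.upstreams.remove(first)            # pinned upstream goes away
moved, reissued = table.route(key)       # re-routed, cookie unchanged
print(first, still_pinned, moved, reissued)
```

The key difference from hashing an application cookie is that the proxy issues the key itself, at the moment it already knows which upstream served the request.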