haproxytech / kubernetes-ingress

HAProxy Kubernetes Ingress Controller
https://www.haproxy.com/documentation/kubernetes/
Apache License 2.0

stick table peering support #32

Open thomasjungblut opened 5 years ago

thomasjungblut commented 5 years ago

Does this controller already support stick tables, ideally with peer syncing? It looks like the models are there, but I'm not sure what the peering configuration would look like.

bedis commented 5 years ago

Hi Thomas. Currently peers are not enabled, and stick tables are used for rate limiting.

What would you like to do with your stick table? What's your use case?

thomasjungblut commented 5 years ago

Thanks for the fast reply! We have a somewhat stateful backend, and we need to route to a specific pod based on a URL path fragment. In a vanilla haproxy.cfg that looks like this:

    balance roundrobin
    stick-table type string size 100m
    stick on path,word(3,/)

And on top of that we would need to peer several HAProxy instances to ensure fault tolerance. Currently in k8s that's a bit painful, because peering only works with static IPs.
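
For reference, the peering part of a plain haproxy.cfg looks roughly like the sketch below (the backend name, peer names and IPs are made up for illustration). Every peer address has to be known up front, which is exactly what is painful with ephemeral pod IPs:

    # Illustrative peers setup (hypothetical names/IPs): each HAProxy instance
    # must know every peer's address in advance, hence the static-IP problem.
    peers mypeers
        peer haproxy-1 10.0.0.11:10000
        peer haproxy-2 10.0.0.12:10000

    backend stateful-be
        balance roundrobin
        # attaching the table to the peers section replicates its entries
        stick-table type string size 100m peers mypeers
        stick on path,word(3,/)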

bedis commented 5 years ago

Another issue you'll have to fix is the order of the servers across different HAProxy instances. In k8s, there is no guarantee that two ingress controllers will get the list of endpoints in the same order (especially if they use DNS). So the server IDs would differ and persistence would be broken between two HAProxys. That said, in HAProxy 2.0 we can now rely on the server name.

@oktalz how are the server names computed? My guess is pure random :) If they are purely random, this means that the only way to persist would be on server IP and port, which is not doable with HAProxy for now.

oktalz commented 5 years ago

@bedis yes, they are purely random.

thomasjungblut commented 5 years ago

Yeah, we already have tons of ordering issues with the DNS provider, and also with consistent hashing in some scenarios.

Our alternative would be to fall back to consistent hashing, but then again we need to ensure the ordering somehow (maybe through a StatefulSet?). It would be enough if we could use the pod ID and its natural string ordering, so at least it would stay consistent across container restarts.

I would be curious whether others have similar issues/requirements with HAProxy.

bedis commented 5 years ago

Usually, workloads in Kubernetes are supposed to be stateless :)

@oktalz is there a chance to make the server names less random, i.e. a hash based on the backend name + position in the backend? I am asking because stick tables will be able to support the server name as an identifier soon (if not already available in 2.0, I will check this point). Two or more HAProxy instances would then generate the same server names, and stick tables would work in such a situation.
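
To illustrate the idea with a purely hypothetical naming scheme (backend name plus the endpoint's position, assuming both controllers sort the endpoints the same way), two controllers would render identical server names, and stick-table entries keyed on the server name could then be matched across instances:

    # Hypothetical deterministic names instead of random SRV_xxxxx suffixes;
    # backend name, addresses and ports are illustrative only.
    backend default-my-svc-8080
        server default-my-svc-8080_1 10.244.0.12:8080 check
        server default-my-svc-8080_2 10.244.1.7:8080 check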

thomasjungblut commented 5 years ago

If we can run HAProxy as a StatefulSet, we would get stable network identifiers for each instance. Does the peering support DNS names nowadays?

bedis commented 5 years ago

It won't support DNS names, but the server name in the configuration. That said, I can't see anything about it in the documentation, so I am currently checking and will come back to you ASAP.

Can we scale out a StatefulSet? (Not very familiar with them, sorry.)

thomasjungblut commented 5 years ago

> It won't support DNS names, but the server name in the configuration.

Then we might have a problem :)

> Can we scale out a StatefulSet? (Not very familiar with them, sorry.)

Yep, it's possible.

bedis commented 5 years ago

I think I misunderstood your question about peering and DNS names. I thought you meant the DNS names of the servers in the HAProxy backend. Now I realize you may mean "DNS names to resolve the other HAProxy peers". Can you confirm? If the latter, the answer is still "no" :) That said, I think we could do it the following way for now:

If you deploy your ingress controller as a DaemonSet for north/south traffic, this should be pretty static.

thomasjungblut commented 5 years ago

> Now I realize you may mean "DNS names to resolve the other HAProxy peers". Can you confirm?

Correct, that's what has been preventing us from running multiple HAProxies with stick tables in k8s so far.

> I thought you meant the DNS names of the servers in the HAProxy backend.

That works with the DNS provider and a server-template, so there isn't much that would need to be done there.

> If you deploy your ingress controller as a DaemonSet for north/south traffic, this should be pretty static.

I would propose another solution: we make it a StatefulSet with stable DNS names for each pod, and the peering should support them (a rough sketch follows below). The other k8s-native solution would be via label/annotation selectors and the k8s APIs.

I understand this is a bit of a stretch, but it would be cool to finally have a k8s-native solution for this case.
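
A rough sketch of the StatefulSet idea, assuming a hypothetical StatefulSet named haproxy with a headless Service haproxy-peers in an ingress namespace (all names made up): each pod gets a stable DNS name that a peers section could reference. Note that, as discussed above, peer addresses could not be resolved via DNS at the time, which is exactly the blocker:

    # Hypothetical peers section using StatefulSet stable DNS names
    peers ingress-peers
        peer haproxy-0 haproxy-0.haproxy-peers.ingress.svc.cluster.local:10000
        peer haproxy-1 haproxy-1.haproxy-peers.ingress.svc.cluster.local:10000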

Ivaka commented 5 years ago

:+1: for this, it would be great to have stick tables & peering available.

@thomasjungblut How did you manage to configure the HAProxy ingress controller with stick tables? I cannot find anything in the documentation about this.

thomasjungblut commented 5 years ago

@Ivaka we didn't; we have a single HAProxy pod running the normal Docker container, so the configuration is pretty much a normal haproxy.cfg 👍

In case that's of interest: we use Ambassador as the ingress controller, and on top of that we use HAProxy for some stateful pod affinity.

bedis commented 5 years ago

> I would propose another solution: we make it a StatefulSet with stable DNS names for each pod, and the peering should support them.

I am not a fan of that, because it requires persistent volumes, which are not necessary from my point of view in the case of an ingress controller / LB / HAProxy.

> The other k8s-native solution would be via label/annotation selectors and the k8s APIs.

I would prefer this option: it is more dynamic and does not impose any limitations in terms of deployment. It's like the ConfigMap I proposed, but done in a different manner. The idea is that you could provide a "cluster id" as an argument to the ingress controller, and this cluster id could be used as a selector to find the "peers" (i.e. the HAProxy instances forwarding the same traffic).

> I understand this is a bit of a stretch, but it would be cool to finally have a k8s-native solution for this case.

Well, the selector solution is more elegant than the ConfigMap one. With a ConfigMap you must find a way to allow each ingress controller to register/deregister itself, and this could be painful. As you said, selectors are more "k8s native".
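
To make the idea concrete (everything below is hypothetical, nothing like this exists in the controller yet): each controller pod would receive a cluster id, carry it as a pod label, watch the pods sharing that label, and render their IPs into a peers section along these lines:

    # Hypothetical output: a peers section rendered from the pods that match
    # the shared cluster-id label; pod names and IPs are illustrative only.
    peers cluster-edge
        peer haproxy-ingress-7d4f9 10.244.0.15:10000
        peer haproxy-ingress-b21c3 10.244.1.22:10000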

bedis commented 5 years ago

To come back to peers: since HAProxy 2.0, I got confirmation that stick tables will use the server name automatically. Server IP:port will be available much later, maybe in 2.1 or even later. @oktalz do you think it's possible to make the server name generation less random and more predictable when multiple HAProxy instances run in the same cluster? If you can fix this behavior, then peering could be enabled.

oktalz commented 5 years ago

Server names can be SRV_00001... or some similarly predictable names, but I'm not totally for it right now; however, I'm open to discussing it further :)

bedis commented 5 years ago

What's your objection to it?

oktalz commented 5 years ago

The current SRV_v8KyB does not imply anything: it's not the first or the oldest, and it's not linked to any particular pod, so I can disable it and reuse it for another pod without any problem, or even delete it completely when scaling. If I have SRV_0001, at some point someone will ask why it is 0001 when it's the youngest pod (in theory).

I'm not saying it's not possible, but it would require some coding and logic. With enough requests/wishes, that feature will move up on the roadmap.

thomasjungblut commented 5 years ago

Sorry to hijack my own issue here, but this server name discussion makes me curious: most of the issues we observe come from one of the HAProxy pods having a different view/ordering of the backend servers, which makes the forwarding inconsistent (e.g. with consistent hashing via the DNS provider). Do you have a solution for this kind of problem?

tezet commented 3 years ago

Is this supported now? If not, is there any other way of achieving HAProxy ingress controller redundancy when using stick tables for routing?

YanaKay23 commented 2 years ago

Is there any implementation of a way to ensure that the stick table data propagates among all HAProxy pods?

dbarvitsky commented 1 year ago

Very much interested in peering support. Sharing our journey FYI, in case someone finds it helpful or inspiring.

Our goal was to cluster a third-party service that we run in our Kubernetes cluster. The service in question is stateful and keeps a lot of data in memory. We wanted to be able to scale it horizontally and decided to try HAProxy. Our idea was to use a custom header computed on the caller side and route requests to the actual pods based on a consistent hash of that header. We hoped that consistent hashing would produce identical stick tables on every HAProxy pod. We expected that re-scaling would cause a bunch of intermittent misrouting, but assumed it would eventually stabilize.

We have this in production at the moment:

    kind: Service
    metadata:
      ...
      labels:
        ...
      annotations:
        haproxy-ingress.github.io/config-backend: |
          # Backend works in HTTP mode, i.e. parses HTTP requests.
          mode http
          balance hdr(X-CUSTOM-HEADER-WHATEVER)
          hash-type consistent
          stick on hdr(X-CUSTOM-HEADER-WHATEVER)
          stick-table type string size 30M

Note that we are running it with the --sort-backends option.

Initially it looked like it worked: all our bench tests confirmed we were sticking based on the header, and the whole contraption tolerated deployments, restarts, and scaling the backend pods up and down. We tested it with two HAProxy pods under synthetic load.

However, when we moved to production, we started seeing a lot of misses, continuously. We are talking ~10% easily, and it does not go down over time. The stick tables clearly diverge somehow and stay out of sync, so we had to scale HAProxy down to a single pod and call it a partial success. It looks like our synthetic load was not random enough, or not long enough, or our deployments were not erratic enough to make the stick tables diverge under testing. We just got (un)lucky with our experiments.

At this point we don't really have a good way of scaling HAProxy itself. It is a single pod with limited resources, with all the performance and reliability risks that come with that. It looks like peering would be the only reasonable option. I will be very grateful for workaround proposals. I may also arrange a reasonable bounty for solving the problem. Thanks!

ivanmatmati commented 1 year ago

Hi @dbarvitsky, I guess you're referring to that product, not ours, but you're welcome to try. The product you're currently using is another project based on our open source project.

dbarvitsky commented 1 year ago

@ivanmatmati you are right, I somehow confused the two. Apologies. I will take a look. Thank you.

YanaKay23 commented 1 year ago

Hi @dbarvitsky, I believe we are using the same architecture in our project. We are deploying our cluster on AWS EKS. We have multiple HAProxy pods deployed that stick to our application pods by extracting a custom header from the responses and storing it in stick tables. Since the stick table entries are not shared between the HAProxy pods, we needed a way to stick the user's connection to the same HAProxy pod.

We did that by adding an AWS ALB in front of the HAProxy pods, so the ALB load-balances the HAProxy pods. By enabling client affinity on the ALB, we ensure that the user (browser session) always sticks to the same HAProxy pod, which holds the relevant stick table entries. Of course, when an HAProxy pod is terminated due to automated scaling, some users will lose their session, which is bad... but so far it was the only convenient approach. Hope this helps.
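
For anyone curious how such an ALB with client affinity could be wired up, here is a minimal sketch assuming the AWS Load Balancer Controller provisions the ALB from an Ingress; the names, port and cookie duration are illustrative only:

    # Hypothetical Ingress fronting the HAProxy pods with an ALB and enabling
    # target-group stickiness (client affinity).
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: haproxy-front
      annotations:
        alb.ingress.kubernetes.io/target-type: ip
        alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=true,stickiness.type=lb_cookie,stickiness.lb_cookie.duration_seconds=86400
    spec:
      ingressClassName: alb
      rules:
        - http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: haproxy-ingress
                    port:
                      number: 80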

jvidalg commented 1 year ago

@YanaKay23 would you please elaborate on what you mean by an ALB in front of the HAProxy pods? Are you creating a LoadBalancer Service using annotations to make it an ALB? Thanks

dmitrysheshukov commented 1 year ago

Hi experts! Do you have any ideas on how to ensure backend endpoint consistency between ingress HAProxy instances? Currently it looks like each instance has its own backend server list, without even sorting it (by IP/port, for example), and this causes ~90% misrouting for stateful requests. Moreover, I suspect that even the local state table is ignored after an HAProxy reload, due to startup errors like "[WARNING] (config) : Can't get version of the global server state file '/var/state/haproxy/global'". Do you have any suggestions or workarounds while HAProxy peers are not supported in the ingress controller? Thanks.

ArielLahiany commented 1 year ago

Hello. I would like to share stick tables between the pods of a multi-pod Kubernetes deployment. Is there any way to achieve that today? Is there maybe a way to sync the data with a third-party in-memory database like Redis? Thank you.

damdo commented 1 year ago

Some years ago, during a hackathon, I developed a POC for peer syncing as an HAProxy sidecar in k8s. I'll share it here in case it is useful for others. Please consider that this is just a proof of concept and not high-quality, production-grade code, so please use it only as an example:

https://github.com/damdo/k8s-haproxy-peer-sync

matthisholleville commented 9 months ago

Any news?

benjamin-bergia commented 6 months ago

Hi, I'm also interested in this. I currently use stick tables for rate limiting. Without synchronization between the pods, it's hard to get proper rate limiting across the DaemonSet. Ideally, I would pass a Service or headless Service name, the operator would watch for changes to that Service, and it would keep all of its endpoints as peers in the config (see the sketch below). No idea whether this is supported by the current APIs, though.
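
As a sketch of what I mean (all names here are hypothetical): a headless Service selects the controller pods, and the operator could watch its endpoints and render each one as a peer:

    # Hypothetical headless Service selecting the ingress controller pods;
    # its endpoints would become the peer list.
    apiVersion: v1
    kind: Service
    metadata:
      name: haproxy-peers
    spec:
      clusterIP: None
      selector:
        app: haproxy-ingress
      ports:
        - name: peers
          port: 10000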

m00lecule commented 3 months ago

@oktalz @bedis This would be a great feature. Advanced stick tables are what distinguishes HAProxy from other API gateways, which often implement only cookie-based sticky sessions.

tuxillo commented 1 month ago

Has there been any progress on this? Any recommendations on how to achieve it?