Hi, thanks for evaluating and using haproxy ingress and thanks also for sharing numbers of such a big deployment!
Regarding the config parsing: I'm really impressed with your processing time. We have a somewhat big environment as well: about 4,500 ingresses, 3,000 services, 1,500 namespaces, but all the processing takes up to 5s -- five seconds! Would you mind sharing a few more numbers - or some of them? How many namespaces and services do you have? Cloud or bare metal? Cloud or self-made control plane (apiserver et al)? Mean size of the nodes? How many of them?
Regarding the Data Plane API: this would be a good step forward, reusing code from HAProxyTech. You have strong arguments and I agree with all of them. However, even if unsupported features like custom configs could be added, my main concern here is the dependency on an external component in the core of the controller. I had some issues in the past which made me rewrite most of the controller and stop new features for about 1 year. That said, a config option which points to the Data Plane API is a very welcome feature, provided that its use is optional and the integration starts decoupled from the core. Perhaps in the future we'll say this was the right step and that we are currently a bit late, and that such a step should have been made a while ago.
Regarding optimizations: v0.8 was the rewrite to a cleaner code base and a much more optimized configuration; v0.9 was the h2/grpc and acme features; v0.10 was the metrics; v0.11 will be a new step forward regarding optimizations. There are still some pending performance improvements in the config parser (controller being fast), in the dynamic update analyzer (controller being smart), and also in the config builder (haproxy being fast). You should see better numbers in this new version. Note also that haproxy has some limitations regarding what you can do via api/socket without running a full reload: the config from k8s still needs to be parsed, config files still need to be written to disk, and haproxy still needs to be reloaded. The Data Plane API cannot skip this if the diff cannot be applied to a running process.
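As an aside, here's a minimal sketch of the kind of change HAProxy does accept at runtime over the admin socket without a reload: updating the address of a server that already exists in the running config. The socket path and the be_app/srv001 names are assumptions for illustration only; anything that adds or removes frontends/backends still goes through files on disk plus a reload.

```go
// Illustration only: push a runtime-API command to HAProxy's admin socket.
// Socket path and backend/server names (be_app/srv001) are assumptions.
package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	conn, err := net.Dial("unix", "/var/run/haproxy.sock")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Works only for servers already present in the running config;
	// adding or removing backends/frontends still requires a reload.
	fmt.Fprint(conn, "set server be_app/srv001 addr 10.0.0.5 port 8080\n")

	reply, _ := bufio.NewReader(conn).ReadString('\n')
	fmt.Println(reply)
}
```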
Would you mind sharing a few more numbers - or some of them?
Sure, actually we got:
All of these resources are living in a single Namespace (it's a sharded cluster just serving apps for the customers).
This cluster is made of 300 worker nodes, each powered by 6 vCPUs and 32 GB of RAM. Most of them run ~100 pods. All nodes are provisioned by OpenStack, no cloud provider involved.
We're running v0.9-beta.1 with the following args:
- --configmap=ingress/haproxy-ingress-1-controller
- --ingress-class=REDACTED
- --tcp-services-configmap=ingress/haproxy-ingress-1-controller-tcp
- --default-backend-service=ingress/haproxy-ingress-1-default-backend
- --default-ssl-certificate=REDACTED
- --rate-limit-update=0.1
- --verify-hostname=false
and the following ConfigMap data:
backend-server-slots-increment: "42"
config-global: |
  cpu-map auto:1/1-4 4-7
  tune.pattern.cache-size 0
health-check-interval: ""
healthz-port: "10253"
max-connections: "20000"
nbthread: "4"
ssl-engine: rdrand
ssl-options: no-tlsv10 no-tlsv11 no-tls-tickets
stats-port: "1936"
syslog-endpoint: localhost:514
timeout-client-fin: 5s
timeout-server-fin: 5s
use-htx: "true"
No resource limits or requests at all.
my main concern here is the dependency on an external component in the core of the controller
You're right, this should be added to the Cons list. Despite its validity, it means relying on an open-source project created and maintained directly by HAProxy itself: we should evaluate the pay-off and get more visibility into the maturity of the APIs.
That said, a config option which points to the Data Plane API is a very welcome feature, provided that its use is optional and the integration starts decoupled from the core
My main concern is that it could be tricky and require a lot of effort: my statements could be wrong since I've only been looking at the source code for a while, but it means we have to declare a sort of Interface to update the configuration, and that means decoupling the actual code, moving it into a new struct implementing that interface.
Pretty sure the first step should go in this direction: defining an Interface to ease the strategy selection (TemplateConfiguration or DataPlaneConfiguration).
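Just to make the idea concrete, here is a rough sketch of what such an interface could look like; all names below are hypothetical and nothing here comes from the current codebase:

```go
// Hypothetical sketch only: none of these names exist in haproxy-ingress today.
package config

// Config is a stand-in for the controller's internal model built from
// the watched Kubernetes resources.
type Config struct{}

// InstanceUpdater abstracts "apply the desired HAProxy state" so the
// controller core doesn't care how the state is applied.
type InstanceUpdater interface {
	// Update applies the model and reports whether a full reload happened.
	Update(model *Config) (reloaded bool, err error)
}

// TemplateConfiguration renders haproxy.cfg from the template and reloads.
type TemplateConfiguration struct {
	// template engine, config file paths, reload command, ...
}

func (t *TemplateConfiguration) Update(model *Config) (bool, error) {
	// render template -> write haproxy.cfg -> reload haproxy
	return true, nil
}

// DataPlaneConfiguration pushes the diff through the Data Plane API instead.
type DataPlaneConfiguration struct {
	// API endpoint, credentials, HTTP client, ...
}

func (d *DataPlaneConfiguration) Update(model *Config) (bool, error) {
	// compute diff -> call Data Plane API -> reload only when strictly needed
	return false, nil
}
```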
Regarding optimizations (...)
The main issue we're facing is the constant regeneration of the template in the following scenario: every time an Ingress hostname is added or changed (e.g. customer-temporary-domain.namecheap.tld, my.fancy.domain.tld), the whole configuration is rendered again. The same applies when a Pod (and so one of the underlying endpoints) is created or deleted.
According to our benchmarks, everything was fine until 5k Ingress resources: after that, reload time started increasing rapidly.
If I can provide you further details, don't hesitate to ping me directly: glad to help, and that's my current duty here at Namecheap :)
Despite its validity, it means relying on an open-source project created and maintained directly by HAProxy itself
This is exactly what I did at the beginning of the project: I relied on the so-called "Kubernetes Ingress", a component used to build ingress controllers. Long story short: what was a component is now ingress-nginx, the reusable part is tightly coupled with nginx and closed (a golang internal package), and I needed to write my own - currently missing only listers and certs. I'm aware the Data Plane API is another story: its main purpose is to be a REST API used to drive haproxy, and it's exactly this feature haproxy ingress would use. Note however that most of the functionalities need to touch haproxy in one way or another, and I'm a bit afraid of finding unsupported and hard-to-implement features in the Data Plane API which would stall the development. Because of that I'd like to see one step at a time in that direction, evolving both strategies until it matures.
My main concern is that it could be tricky and require a lot of effort
Yep, no doubt about this. You said:
- define an Interface to ease the strategy selection (TemplateConfiguration or DataPlaneConfiguration)
and I'd also say:
- stress haproxy via the Data Plane API using a config as big as yours
Regarding next steps: I'll start v0.11 shortly. In fact I'll promote v0.9 to stable and v0.10 to beta within a few days, and will start to think about performance improvements shortly, which is the main focus of v0.11. Perhaps you can have a look at the Data Plane API performance? I'll design and run a proof of concept of the dynamic in-memory model and of splitting the config into more than one file; these features should save some IO and CPU cycles.
If I can provide you further details, don't hesitate to ping me directly
Thank you, I'll probably do it. Thanks also for such a nice use case.
I'd like to see one step at a time in that direction, evolving both strategies until it matures
LGTM.
Stress haproxy via data plane api using a config as big as yours (...) Perhaps you can have a look at the data plane api performance?
I guess that's my first task for the upcoming week :) We're lucky because I can benchmark the Ingress Controller directly with production traffic.
Hi, I started the draft of a new functionality which is simple to implement and should help to distribute the controller load. It was already planned for v0.11 and I'm going to start now due to its simplicity. This doesn't invalidate the Data Plane API evaluation, the dynamic model and other possible future optimizations. How does it sound for your deployment? Or, almost the same question in other words: why don't you use ingress-class?
draft
tl;dr: it sounds more like a sort of Service Mesh than an Ingress Controller, and we all know that Service Mesh is a hard topic.
From an Ingress Controller (and I guess that's the end-user target), I'm expecting something really easy to set up: just deploy your Deployment/DS resource, get a LoadBalancer or NodePort to be exposed on the Internet, and you're OK. A distributed architecture like this, despite its validity, could increase the number of points of failure, as well as the network overhead, since we're going to hop master to worker (YMMV) to service rather than master (aka Ingress Controller) to service.
By the way, what you described is something similar to what we were thinking of at the beginning here at Namecheap, even if it was way more extreme than this, and we decided to go for Ingress resources sharding instead (tl;dr: we got several Ingress Classes and split the load across them). Kubernetes also provides really nice Dynamic Admission Control resources like MutatingAdmissionWebhooks that can handle that kind of business logic instead of leaving it up to end-users: patching the kubernetes.io/ingress.class annotation on the fly by counting the number of Ingresses each Ingress Controller is handling, picking the less loaded one, and so on and so forth.
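For illustration only, a stripped-down sketch of what such a mutating webhook handler could look like. The handler shape follows k8s.io/api/admission/v1; the least-loaded-class logic is just a placeholder, and the JSON patch assumes the incoming Ingress already has an annotations map.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
)

// pickLeastLoadedClass is a placeholder for "count Ingresses per class and
// return the least loaded one".
func pickLeastLoadedClass() string { return "foo" }

func mutate(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// JSON patch setting the ingress.class annotation on the incoming object;
	// assumes metadata.annotations already exists ("/" is escaped as "~1").
	patch := fmt.Sprintf(
		`[{"op":"add","path":"/metadata/annotations/kubernetes.io~1ingress.class","value":%q}]`,
		pickLeastLoadedClass())
	patchType := admissionv1.PatchTypeJSONPatch

	review.Response = &admissionv1.AdmissionResponse{
		UID:       review.Request.UID,
		Allowed:   true,
		Patch:     []byte(patch),
		PatchType: &patchType,
	}
	json.NewEncoder(w).Encode(review)
}

func main() {
	http.HandleFunc("/mutate", mutate)
	// Admission webhooks must be served over TLS; cert paths are assumptions.
	http.ListenAndServeTLS(":8443", "/tls/tls.crt", "/tls/tls.key", nil)
}
```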
Probably we have to follow the KISS principle and try to further enhance the performance of the template-based configuration, as well as the Data Plane API approach if feasible and doable.
No, the proposal isn't a sort of service mesh because it continues to live in the same place of the cluster instead of changing the input and output routes of every single pod. Besides that, I agree with most of your arguments and, although this should improve and save a few proxies on our own deployment, it is also far from a nice design. Back to the drawing board.
Out of curiosity - you said you were thinking about splitting the load across several ingress classes, therefore several controller instances. How'd you expose them?
Out of curiosity - you said you were thinking about splitting the load across several ingress classes, therefore several controller instances. How'd you expose them?
We got two or more Ingress Controllers, each with its own class (e.g. foo, bar), and when creating an app we select the less loaded one and put its value in the annotation. It means an app living on foo will be caught by the DNS record *.ingress-foo.domain.tld, and so on and so forth with the other ingress classes. If the customer wishes to switch to a custom domain, they just need to point a CNAME/ALIAS to that value and they're ready to go! :)
This issue got stale and will be closed in 7 days.
First of all, thanks for the great effort for this Ingress Controller: we're using it in production for some of our platforms! :)
Our scenario is quite extreme since, despite sharding, we're using a single Ingress class to serve more than 13k Ingresses: we noticed a slowdown from a performance point of view due to the huge amount of resources the Ingress Controller has to watch and render into the configuration file, causing some headaches to customers as well.
Our idea is to take advantage of the HAProxy Data Plane APIs, with the following pros and cons; please feel free to add more items to the following lists.
Pros
Cons
(…) ingress.kubernetes.io/config-backend allows.
From a Kubernetes point of view, the Pod would consist of multiple containers:
- haproxy: simply running the HAProxy binary, sharing the socket on an emptyDir so the following container can use it
- dataplane-api: running the Data Plane API binary serving on loopback (127.0.0.1), connecting to the HAProxy instance via the socket shared on the said emptyDir
- ingress-controller: running the Ingress Controller codebase, getting events from the Shared Informer to update the HAProxy configuration through the Data Plane API.
We could also use a single container but, you know, one container per process: let's talk!
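If it helps to visualize the layout, here is a rough sketch of that Pod expressed with client-go types, just to make the container split concrete; image names, tags and the mount path are placeholders, not a proposal for the actual manifest:

```go
// Hypothetical sketch of the Pod layout described above; images and paths
// are assumptions.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func ingressPod() *corev1.Pod {
	// emptyDir shared between haproxy and the Data Plane API for the socket.
	socket := corev1.VolumeMount{Name: "haproxy-socket", MountPath: "/var/run/haproxy"}
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "haproxy-ingress"},
		Spec: corev1.PodSpec{
			Volumes: []corev1.Volume{{
				Name:         "haproxy-socket",
				VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
			}},
			Containers: []corev1.Container{
				{
					// plain HAProxy exposing its admin socket on the shared emptyDir
					Name:         "haproxy",
					Image:        "haproxy:2.0",
					VolumeMounts: []corev1.VolumeMount{socket},
				},
				{
					// Data Plane API bound to 127.0.0.1, driving HAProxy via the socket
					Name:         "dataplane-api",
					Image:        "haproxytech/dataplaneapi:latest",
					VolumeMounts: []corev1.VolumeMount{socket},
				},
				{
					// controller watching the apiserver and calling the Data Plane API
					Name:  "ingress-controller",
					Image: "quay.io/jcmoraisjr/haproxy-ingress:latest",
				},
			},
		},
	}
}
```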
As a member of the Namecheap Cloud Team organization, I would be happy to discuss and implement this feature; I'm also available to provide a simple PoC as scaffolding.