Closed ParagPatil96 closed 1 year ago
Sounds reasonable from a security point of view. So let's wait and see if any data, like steps to reproduce at will, becomes available (besides developer time).
Unfortunately I can't reproduce this with a basic local setup. What I tried:

1. Get a Docker Desktop cluster running; it installs K8s v1.22.5.
2. Deploy an ingress with modsec on and a metrics server via Terraform:
```hcl
module "modsec_ingress_controllers" {
  source              = "github.com/ministryofjustice/cloud-platform-terraform-ingress-controller?ref=1.0.14"
  replica_count       = "2"
  controller_name     = "modsec"
  cluster_domain_name = "dummy"
  is_live_cluster     = false
  live1_cert_dns_name = "dummy"
  enable_modsec       = true
  enable_owasp        = true
  enable_latest_tls   = true
}

resource "helm_release" "metrics_server" {
  name       = "metrics-server"
  chart      = "metrics-server"
  repository = "https://charts.bitnami.com/bitnami"
  namespace  = "kube-system"
  version    = "5.11.0"

  set {
    name  = "extraArgs.kubelet-insecure-tls"
    value = "true"
  }
  set {
    name  = "extraArgs.kubelet-preferred-address-types"
    value = "InternalIP"
  }
  set {
    name  = "hostNetwork"
    value = "true"
  }
  set {
    name  = "apiService.create"
    value = "true"
  }
}

variable "prometheus_operator_crd_version" {
  default     = "v0.53.1"
  description = "The version of the prometheus operator CRDs matching the prometheus chart that is installed in the monitoring module"
}

locals {
  prometheus_crd_yamls = {
    alertmanager_configs = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagerconfigs.yaml"
    alertmanagers        = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml"
    podmonitors          = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml"
    probes               = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_probes.yaml"
    prometheuses         = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml"
    prometheusrules      = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml"
    servicemonitors      = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml"
    thanosrulers         = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml"
  }
}

data "http" "prometheus_crd_yamls" {
  for_each = local.prometheus_crd_yamls
  url      = each.value
}

resource "kubectl_manifest" "prometheus_operator_crds" {
  server_side_apply = true
  for_each          = data.http.prometheus_crd_yamls
  yaml_body         = each.value["body"]
}

terraform {
  required_providers {
    helm = {
      source = "hashicorp/helm"
    }
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
    kubectl = {
      source = "gavinbunney/kubectl"
    }
  }
  required_version = ">= 0.14"
}
```
CRDs are a dependency, so there is no need for a full Prometheus. The ingress has a default backend, so there is no need to create any app.
3. Get a snapshot at the start:

```
NAME                                                    CPU(cores)   MEMORY(bytes)
nginx-ingress-modsec-controller-57f78445d8-jl2ml        5m           124Mi
nginx-ingress-modsec-controller-57f78445d8-kjkkn        7m           128Mi
nginx-ingress-modsec-default-backend-64cf5986b9-mk2nr   3m           17Mi
```
4. Start hammering it with something like:

```
$ ab -k -c 100 "https://localhost/?exec=/bin/bash"
```

(the query parameter is chosen to trigger a ModSecurity rule and return a 403)
5. Let it run for about one hour:

```
NAME                                                    CPU(cores)   MEMORY(bytes)
nginx-ingress-modsec-controller-57f78445d8-jl2ml        1085m        126Mi
nginx-ingress-modsec-controller-57f78445d8-kjkkn        808m         128Mi
nginx-ingress-modsec-default-backend-64cf5986b9-mk2nr   2m           17Mi
```
Nothing useful for the test; memory only blips slightly higher.
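To make runs like this easier to compare over time, pod memory can be sampled on an interval rather than snapshotted by hand. A minimal sketch, assuming `kubectl` access; the namespace, interval, and helper names are mine:

```shell
# Sample controller pod memory on an interval during a load test.
# NAMESPACE/INTERVAL defaults are examples; override via environment.
NAMESPACE="${NAMESPACE:-kube-system}"
INTERVAL="${INTERVAL:-60}"

# Keep only "pod memory" pairs from `kubectl top po` output (drops the header).
mem_column() {
  awk 'NR > 1 { print $1, $3 }'
}

# Emit a timestamped memory sample every $INTERVAL seconds.
sample_memory() {
  while true; do
    date -u +%FT%TZ
    kubectl top po -n "$NAMESPACE" | mem_column
    sleep "$INTERVAL"
  done
}
```

Redirecting `sample_memory` to a file gives a crude time series to compare against the snapshots above.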
While I can't reproduce live traffic exactly, I will retry with an empty EKS cluster to get closer to the prod setup.
Hello, on my side I can clearly link the issue to the number of ingresses with ModSecurity enabled.

Context:
- image: ingress-nginx/controller:v1.2.1
- number of rules: 1838
- number of config reloads: 20/day

args:

Regards, Jeff
Helm Chart 4.3.0 Nginx Ingress Controller v1.4.0
Good day,
We have been successfully running the Nginx ingress controller in our clusters for some time now. To increase security, we have enabled Modsecurity according to these instructions https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#modsecurity
Shortly after that, RAM consumption increased from 2 GB to 24 GB, until the controllers could not start at all and were in an endless crash loop. Our Grafana metrics looked much like those already posted here by other users.

To better understand the problem, we rebuilt the Docker images with the same parameters and were able to track the problem down to ModSecurity.

As soon as we started nginx under Valgrind and did a few `nginx -s reload`s, Valgrind issued corresponding reports.
==365==
==365== 11,472,779 (5,168 direct, 11,467,611 indirect) bytes in 38 blocks are definitely lost in loss record 559 of 559
==365== at 0x48A2E63: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==365== by 0x50FB2CB: yy::seclang_parser::parse() (seclang-parser.yy:1032)
==365== by 0x514BC2B: modsecurity::Parser::Driver::parse(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (driver.cc:145)
==365== by 0x514BF99: modsecurity::Parser::Driver::parseFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (driver.cc:189)
==365== by 0x516B410: modsecurity::RulesSet::loadFromUri(char const*) (rules_set.cc:53)
==365== by 0x516B5C2: msc_rules_add_file (rules_set.cc:296)
==365== by 0x502CD0A: ngx_conf_set_rules_file (ngx_http_modsecurity_module.c:363)
==365== by 0x14519B: ngx_conf_handler (ngx_conf_file.c:463)
==365== by 0x14519B: ngx_conf_parse (ngx_conf_file.c:319)
==365== by 0x17149D: ngx_http_core_server (ngx_http_core_module.c:2892)
==365== by 0x14519B: ngx_conf_handler (ngx_conf_file.c:463)
==365== by 0x14519B: ngx_conf_parse (ngx_conf_file.c:319)
==365== by 0x16D595: ngx_http_block (ngx_http.c:239)
==365== by 0x16D595: ngx_http_block (ngx_http.c:122)
==365== by 0x14519B: ngx_conf_handler (ngx_conf_file.c:463)
==365== by 0x14519B: ngx_conf_parse (ngx_conf_file.c:319)
==365==
==365== LEAK SUMMARY:
==365== definitely lost: 237,390 bytes in 4,103 blocks
==365== indirectly lost: 14,458,393 bytes in 228,222 blocks
==365== possibly lost: 1,792,163 bytes in 33,594 blocks
==365== still reachable: 1,290,291 bytes in 19,520 blocks
==365== suppressed: 0 bytes in 0 blocks
==365== Reachable blocks (those to which a pointer was found) are not shown.
==365== To see them, rerun with: --leak-check=full --show-leak-kinds=all
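When repeating this experiment, the headline number can be pulled out of memcheck logs mechanically. A sketch; the Valgrind invocation in the comments is an assumption about how nginx was run, and the helper name is mine:

```shell
# Sketch: run nginx in the foreground under Valgrind, reload a few times,
# then summarize the log. Exact nginx flags/paths are assumptions.
#   valgrind --leak-check=full --log-file=vg.log nginx -g 'daemon off;'
#   nginx -s reload    # from another shell; repeat to grow the leak

# Print the "definitely lost" byte count from a memcheck LEAK SUMMARY.
definitely_lost_bytes() {
  awk '/definitely lost:/ { gsub(",", "", $4); print $4; exit }'
}
```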
With the new information, we took a closer look at ModSecurity and the Nginx connector and found what we were looking for. After the ModSecurity 3.0.8 release there was a commit fixing a memory leak, see https://github.com/SpiderLabs/ModSecurity/commit/e9a7ba4a60be48f761e0328c6dfcc668d70e35a0

The same applies to the Nginx connector project, see https://github.com/SpiderLabs/ModSecurity-nginx/pull/277

So we successfully compiled a new version of the Nginx image. The memory leak is not fully fixed yet, but it has been reduced from several MB of RAM to a few KB. Our changes: https://github.com/procinger/ingress-nginx/commit/b51b754a9d6ce7217b9fabfba2445c26545c4267

From the Valgrind report we know that the memory leak is triggered every time the `modsecurity_rules_file` directive is processed from nginx.conf. So we changed how ModSecurity is activated: instead of using the ingress annotations, we changed the Helm configuration to:
```yaml
releases:
  - name: nginx-ingress
    namespace: nginx-ingress
    chart: ingress-nginx/ingress-nginx
    version: 4.3.0
    waitForJobs: true
    values:
      - controller:
          image:
            registry: <custom-repository>
            image: <self-compiled-image>
            tag: "<some-tag>"
            digest: <sha256-of-self-compiled-image>
          # ...snip...
          config:
            main-snippet: |
              load_module /etc/nginx/modules/ngx_http_modsecurity_module.so;
            http-snippet: |
              modsecurity on;
              modsecurity_rules '
                SecRuleEngine On
                SecAuditEngine On
                SecAuditLogRelevantStatus "^(?:5|4(?!04))"
                SecAuditLog /var/log/waf-logs/sec_audit.log
                SecAuditLogParts ABFHZ
                SecRule REMOTE_ADDR "@ipMatch 127.0.0.1" "id:87,phase:1,pass,nolog,ctl:ruleEngine=Off"
              ';
              modsecurity_rules_file /etc/nginx/owasp-modsecurity-crs/nginx-modsecurity.conf;
```
With all these changes, we have been successfully running a test cluster with 100-120 ingresses and about 35 config reloads per day for about 3 days. RAM consumption remains constant, under 1 GB per ingress controller. With the original images and configurations we had several crashes per day.

I think a good start here would be to update the affected ModSecurity projects, as the currently compiled version is not usable:
- ModSecurity-nginx to 1.0.3
- ModSecurity to v3/master head

This will not completely solve the problem, as a small leak is still present, but it will make it tolerable.
The ModSecurity developers also know about the problem, see https://github.com/SpiderLabs/ModSecurity/issues/2822 https://github.com/SpiderLabs/ModSecurity/issues/2817
We have created a small playground here to better understand the problem https://gist.github.com/procinger/6cae5cdd7de197009d8d9306cebbddfa
The self-compiled controller image with the latest libraries is available here https://hub.docker.com/r/procinger/ingress-nginx
Should I test anything, or if you have any questions, feel free to ping me.
Thanks for the detailed leak report. Since it's a bug that's present in ModSecurity/NGINX, we'll have to wait for this to be patched upstream before Ingress NGINX is free of the issue.
well, i'm not alone (:
Good day @Volatus,
thank you for your reply. As you mentioned, the projects have to provide the patches for you to use.
Since there are already patches upstream (though in the case of ModSecurity not yet tagged), I thought the easiest thing to do would be to simply update the affected projects. We did not assume it needs to be tagged, because of examples like https://github.com/kubernetes/ingress-nginx/blob/main/images/nginx/rootfs/build.sh#L36.

We have now been running self-compiled ingress controllers for a few days and are fully happy with the result. This is how our Grafana metrics currently look.
Without these patches, our metrics look something like @odinsy.
I'm happy to create a PR here if desired.
@procinger would your proposed changes (patches) be opt-in, or would they apply to all users?
@longwuyuan the changes/patches refer only to the ModSecurity dependencies we have changed in our example https://github.com/procinger/ingress-nginx/commit/b51b754a9d6ce7217b9fabfba2445c26545c4267
We would use the commit hash instead of v3/master. This would affect all users with ModSecurity enabled.
@procinger It's possible that using ModSecurity with near-future releases of the ingress-nginx controller will require more work, both in code and documentation. But until those changes are released, the details of your proposal will provide insight to make decisions. I think you can provide details of your proposal here. Ideally a PR would be appropriate, but I am suggesting that the basic tenets of your proposal be discussed here to make the process efficient.

Since multiple users reported this, it does seem like an improvement from the graphic that you posted. It's just that the devil is in the details, so I hope my suggestion is reasonable.
@procinger A PR would be a great place for more conversation. Thank you for all the investigation.
@procinger in my case the memory leak happens when traffic is over 20k RPS (which should not be a heavy workload for a node with 16 vCPU / 32 GB RAM). I've found two things:

I found a similar issue, but it was on an old version of the controller: https://github.com/kubernetes/ingress-nginx/issues/7438

I tried the last three versions of the controller (1.5.1, 1.4.0, 1.3.1) and had this problem on each. And I don't use ModSecurity or other security features.
The information posted in this issue is not very productive or helpful in the context of the problem description. For example, one way to produce this problem is with a for loop: if you send a JSON payload, for whatever objective, in a loop without even a second of sleep, then the CPU/memory usage needed to handle that kind of train of events is expected.

The developers are already aware of one scenario where the volume of change is unusually high, like thousands of ingress objects. This leads to performance issues during an event of reloading the nginx.conf.

So in this issue, it helps to track that some users have a performance problem, but progress is going to be really slow because precise information is lacking. A clear description of the problem to be solved and some sort of reasonable reproduction procedure is a much-needed aspect here.
Hello @longwuyuan ,
We're experiencing the same memory issue. I saw the same oom-kill message when running `dmesg` in the nginx-ingress-controller pod.

There are 1453 ingress objects in our Kubernetes clusters (v1.22), and we are using ingress-nginx (chart v4.1.0, app version v1.2.0).
You mentioned:

> The developers are already aware of one scenario where the volume of change is unusually high, like thousands of ingress objects. This leads to performance issues during an event of reloading the nginx.conf.

Just wondering if there is any existing GitHub issue to track it? Is there any workaround for this problem before a fix is available? Thanks a lot.
Best regards, Bo
There are issues. I have to search for the numbers and find out if they are open or closed.
On one side, there is hope that nginx upstream is working on the reload problem. In this project, there is work in progress to split the control-plane and data-plane. After that is done, there will be an opportunity to approach the reload problem differently.
> There are issues. I have to search for the numbers and find out if they are open or closed.

Maybe this one - https://github.com/kubernetes/ingress-nginx/issues/7234#issuecomment-922601188?
The current strategy is to render the whole configuration file, so if there are many ingress resources the configuration becomes very large, and frequent changes can cause excessive consumption.
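One way to see how this scales on a running controller is to measure the rendered config and count its server blocks. A sketch; the pod name is a placeholder, and the namespace/config path are the controller defaults, which may differ in your install:

```shell
# Count nginx `server {` blocks on stdin (roughly one per rendered virtual host).
count_servers() {
  grep -c "server {"
}

# Usage (assumptions: default namespace and config path):
#   kubectl exec -n ingress-nginx <pod> -- cat /etc/nginx/nginx.conf | count_servers
#   kubectl exec -n ingress-nginx <pod> -- wc -c /etc/nginx/nginx.conf
```

Comparing the byte count before and after adding ingresses gives a feel for how much work each reload has to redo.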
We are also seeing some very high memory usage with large spikes when the controllers reload.
We have these `Ingress` resources in our cluster:
There is almost no traffic as this is not a live environment, less than 30 RPS.
The controller pods consume about 2-3 GB of memory normally, which seems high as it is, but when an `Ingress` resource is updated and a controller reload is triggered, memory usage goes well over 10 GB and then we get OOMKills.

We are using the bitnami Helm chart with docker image `docker.io/bitnami/nginx-ingress-controller:1.6.0-debian-11-r11`.
Are these memory usage figures normal? Or are we seeing the issue described here?
/remove-kind bug /close
@longwuyuan: Those labels are not set on the issue: kind/bug
@longwuyuan: Closing this issue.
@longwuyuan

> This project does not test the bitnami image or the bitnami chart in the CI, so it's not practical to discuss that here

100% understandable 💙 I will try to test it with the main chart and image.

> Reloading large configs is a known upstream nginx problem

Can you give me a link for this known issue?

> It would be interesting to know if you had multiple ingress objects with the same host field value but different path values

There's a few like this, about 10.
/re-open
I will have to search for the issues. But I want to avoid hard quoting. The reason is that the details differ, which makes it difficult to have conversations that are practical. The gist, though, is that if you had a vanilla nginx server without Kubernetes, and a relatively large config file (many megabytes), then reloading that config file, with multiple factors involved (containerization, platforming of some sort, live traffic, established connections, cpu/memory availability for bursts, etc.), you would observe similar delays in reload. And then if you factor in the state maintenance of a platform like Kubernetes, you add another oddity.
At least one issue we have dealt with in the past, was related to a bug outside kubernetes. That was worked on by the developer of the related project. Can't recall if the buggy component was lua or nginx or something else. A search of issues will take effort but yield that info.
Hence all such performance issues make for 2 aspect of diligence. One is to triage until the smallest possible detail is clear on root-cause, like a bug or a resource crunch or something else. On the other hand some work is being done on splitting control-plane and data-plane, that will bring in design changes to the workflow and hence, improve things for sure, at least from instrumentation angle.
/reopen
I was talking about 33 ingress resources and 88 ingress resources where you have multiple paths in one ingress. That would make for a large data structure to be re-read even if only one path was changed. That event will be different from one small ingress with one small path being re-read.
@longwuyuan: Reopened this issue.
@longwuyuan
Thank you very much for taking the time to reply, it's very much appreciated. I know supporting such a popular project must be a lot of work 🙂

> But I want to avoid hard quoting

What is "hard quoting"?

> and a config file that was relatively large like huge Megs

Ours is 64 MB. Is that huge?

> you would observe similar delays in reload

Delays are fine. There is free CPU for the controllers, many of them, but each controller pod is not consuming more than 1 CPU core during reload.

> And then if you factor in state maintenance of platform like kubernetes, you add another oddity.

Sure. We are just wanting perhaps some steps for how we can debug it further ourselves.

> On the other hand some work is being done on splitting control-plane and data-plane

Do you mean https://github.com/kubernetes/ingress-nginx/issues/8034?

> I was talking about 33 ingress resources and 88 ingress resources where you have multiple paths in one ingress. That would make for a large data-structure to be re-read even if one path was changed

OK, well, perhaps some basic debugging we can do is just delete half of them and see if the behaviour changes.
I've disabled metrics, but still have memory leaks at RPS > 20k. We have only 4 Ingress resources. Our config:
@odinsy @max-rocket-internet @procinger @bo-tu @moraesjeremias @ramanNarasimhan77 @rmathagiarun @bmv126 @venkatmarella
This issue should largely be resolved in the next release of Ingress NGINX as #9330 made it so that the bundled version of ModSecurity points to a fixed commit which has the memory leak fix but has not been released yet.
The release has been delayed due to a breaking change, but should be rolling out not too late from now. Hope that helps.
Switched to v1.6.4, looks better
@odinsy thanks for update. Seems an important update.
`v1.6.4` should largely fix this issue. Closing.
/close
@Volatus: Closing this issue.
Just to add some more details for those finding this thread in the future: in our case (see what @max-rocket-internet posted above), the memory issues weren't from modsecurity, as we don't have that enabled. In our metrics we observed correlations between the memory spikes and the number of worker processes. Since nginx reloads will leave old processes hanging in graceful shutdown for a little while this is to be expected. The usage spikes got especially bad when multiple reloads were triggered in short succession of each other (e.g. through deploying multiple ingress changes in short intervals).
We ended up changing two settings which ended up making memory use much less spiky, which have been running smoothly for about a month now. They are:
1. We explicitly configured `config.worker-processes` as `8`, as some of our nodes would spawn an (for our purposes) overkill amount of processes due to their high CPU core counts.
2. We tuned `extraArgs.sync-rate-limit` to leave a larger gap between syncs, to give the old processes some time to bleed out. We are currently running `0.008`, which should allow 1 reload per 125 seconds. The meaning of this value is explained at https://pkg.go.dev/golang.org/x/time/rate#Limit. For us, the delay in applying new paths/configs if they are fired off in short succession was much more preferable than having crashing controller pods.
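For reference, the two settings above map onto Helm values roughly like this. A sketch for the ingress-nginx chart, using the key names quoted in the comment; verify against your chart version:

```yaml
controller:
  config:
    # Cap nginx worker processes instead of the `auto` (one per CPU core) default.
    worker-processes: "8"
  extraArgs:
    # rate.Limit value: 0.008 reloads/second, i.e. at most one reload per 125 s.
    sync-rate-limit: "0.008"
```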
```
worker_processes 96;

$ kubectl top po -n ingress-nginx
NAME                             CPU(cores)   MEMORY(bytes)
ingress-nginx-controller-k7nts   11m          1438Mi
```
In the same boat; tried 8 workers just like you:

```
worker_processes 8;

$ kubectl top po -n ingress-nginx
NAME                             CPU(cores)   MEMORY(bytes)
ingress-nginx-controller-k7nts   3m           167Mi
```
Oh lala! 😏
We had changes to our environment, so the physical nodes were being rolled with more CPU cores, and no explicit configuration of `worker_processes` was defined, defaulting us to the `auto` value, which assigns based on CPU cores. All of a sudden we saw ingress-nginx consuming more than double the memory for the same load. This explanation helped us understand and resolve the issue, thanks! :pray:
For people that find this in the future: we tried updating `sync-rate-limit` to reduce syncs and reduce memory usage, but ran into issues where frequently restarting pod deployments would result in 5xx errors during the pod restarts until the sync finished. It's not just on ingress changes (which aren't frequent for us), but pod restarts as well. We think we narrowed it down to these locations:

The dynamic Lua is maybe also not updating backends frequently enough with this setting?
> For people that find this in the future: We tried updating sync-rate-limit to reduce syncs and reduce memory usage, but ran into issues where frequently restarting pod deployments would result in 5xx errors during the pod restarts until the sync finishes.
Ingress-nginx watches Kubernetes Endpoints to get the real Pod IPs to use as upstream servers, instead of using the Service ClusterIP, for several reasons.

If you are using only basic ingress-nginx features, I think you can avoid those 5xx errors by telling ingress-nginx to use the Service ClusterIP as the upstream instead of watching every Endpoints object: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#service-upstream
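For reference, that ConfigMap option can be sketched like this. The key name follows the linked docs; the ConfigMap name/namespace are examples and must match what your controller reads. Note the trade-off: with this on, load balancing across pods is left to kube-proxy rather than nginx:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Name/namespace are examples; use the ConfigMap your controller deployment reads.
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Proxy to the Service ClusterIP instead of individual pod Endpoints.
  service-upstream: "true"
```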
NGINX Ingress controller version:

```
Release:       v1.1.1
Build:         a17181e43ec85534a6fea968d95d019c5a4bc8cf
Repository:    https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.9
```
Kubernetes version
Environment:
Cloud provider or hardware configuration: GCP
OS : Container-Optimized OS from Google
Kernel : 5.4.129+
How was the ingress-nginx-controller installed:
We used Helm to install the NGINX Ingress Controller; the following are the values we provided:
Current State of the controller:
kubectl describe ingressclasses
Current state of ingress object, if applicable:
kubectl -n <appnamespace> describe ing <ingressname>
What happened: Our ingress controller is serving ~1500 RPS, but over time the controller's memory continuously increases and never goes down; when it crosses the node limit (~15 GB), the pods get evicted.

What you expected to happen: We expect memory usage to stabilize at some point.
profiling heap export:
high_mem.txt
For now, we are manually restarting the worker processes in order to release the memory.