kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Nginx Ingress Pods Consuming Too Much Memory #8166

Closed ParagPatil96 closed 1 year ago

ParagPatil96 commented 2 years ago

NGINX Ingress controller version

Release: v1.1.1
Build: a17181e43ec85534a6fea968d95d019c5a4bc8cf
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.9


Kubernetes version

version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.14-gke.1900", GitCommit:"abc4e63ae76afef74b341d2dba1892471781604f", GitTreeState:"clean", BuildDate:"2021-09-07T09:21:04Z", GoVersion:"go1.15.15b5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

What happened: Our Ingress controller is serving ~1500 RPS, but over time the controller's memory continuously increases and never goes down; when it crosses the node limit of ~15GB, the pods get evicted.

What you expected to happen: We expect memory usage to stabilize at some point.

profiling heap export:
high_mem.txt

For now we are manually restarting the worker processes in order to release the memory.
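
For reference, a rough sketch of that kind of manual mitigation (namespace and label selector are assumptions for a typical ingress-nginx install; nginx -s reload replaces the worker processes gracefully):

# pick one controller pod (label/namespace assumed from a default ingress-nginx install)
POD=$(kubectl -n ingress-nginx get pods \
  -l app.kubernetes.io/name=ingress-nginx -o name | head -n 1)

# ask the nginx master process to spawn fresh workers and retire the old ones
kubectl -n ingress-nginx exec "$POD" -- nginx -s reload

# check whether memory actually dropped afterwards
kubectl -n ingress-nginx top pod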

longwuyuan commented 1 year ago

Sounds reasonable from a security point of view. So let's wait and see if any data, such as steps to reproduce at will, becomes available (besides developer time).

razvan-moj commented 1 year ago

Unfortunately I can't reproduce this with a basic local setup. What I tried:

1. Get a Docker Desktop cluster running; it installs K8s v1.22.5.

2. Deploy an ingress with modsec on and a metrics server via terraform:

    
module "modsec_ingress_controllers" {
  source = "github.com/ministryofjustice/cloud-platform-terraform-ingress-controller?ref=1.0.14"

  replica_count       = "2"
  controller_name     = "modsec"
  cluster_domain_name = "dummy"
  is_live_cluster     = false
  live1_cert_dns_name = "dummy"
  enable_modsec       = true
  enable_owasp        = true
  enable_latest_tls   = true
}

resource "helm_release" "metrics_server" {
  name       = "metrics-server"
  chart      = "metrics-server"
  repository = "https://charts.bitnami.com/bitnami"
  namespace  = "kube-system"
  version    = "5.11.0"

  set {
    name  = "extraArgs.kubelet-insecure-tls"
    value = "true"
  }

  set {
    name  = "extraArgs.kubelet-preferred-address-types"
    value = "InternalIP"
  }

  set {
    name  = "hostNetwork"
    value = "true"
  }

  set {
    name  = "apiService.create"
    value = "true"
  }
}

variable "prometheus_operator_crd_version" {
  default     = "v0.53.1"
  description = "The version of the prometheus operator crds matching the prometheus chart that is installed in monitoring module"
}

locals {
  prometheus_crd_yamls = {
    alertmanager_configs = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagerconfigs.yaml"
    alertmanagers        = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml"
    podmonitors          = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml"
    probes               = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_probes.yaml"
    prometheuses         = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml"
    prometheusrules      = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml"
    servicemonitors      = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml"
    thanosrulers         = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/${var.prometheus_operator_crd_version}/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml"
  }
}

data "http" "prometheus_crd_yamls" {
  for_each = local.prometheus_crd_yamls
  url      = each.value
}

resource "kubectl_manifest" "prometheus_operator_crds" {
  server_side_apply = true
  for_each          = data.http.prometheus_crd_yamls
  yaml_body         = each.value["body"]
}

terraform {
  required_providers {
    helm = {
      source = "hashicorp/helm"
    }
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
    kubectl = {
      source = "gavinbunney/kubectl"
    }
  }
  required_version = ">= 0.14"
}

The CRDs are a dependency, so there is no need for a full prometheus. The ingress has a default backend, so no need to create any app.
3. Get a snapshot at start:

kubectl -n ingress-controllers top pod

NAME                                                    CPU(cores)   MEMORY(bytes)
nginx-ingress-modsec-controller-57f78445d8-jl2ml        5m           124Mi
nginx-ingress-modsec-controller-57f78445d8-kjkkn        7m           128Mi
nginx-ingress-modsec-default-backend-64cf5986b9-mk2nr   3m           17Mi

4. Start hammering it with something like:

$ ab -k -c 100 "https://localhost/?exec=/bin/bash"

(the parameter is chosen to trigger a modsec rule and a 403)
5. Let it run for about one hour:

NAME                                                    CPU(cores)   MEMORY(bytes)
nginx-ingress-modsec-controller-57f78445d8-jl2ml        1085m        126Mi
nginx-ingress-modsec-controller-57f78445d8-kjkkn        808m         128Mi
nginx-ingress-modsec-default-backend-64cf5986b9-mk2nr   2m           17Mi


Nothing useful from this test; memory only blips slightly higher.
While I can't reproduce live traffic exactly, I will retry with an empty EKS cluster to get closer to the prod setup.
jfpucheu commented 1 year ago

Hello, on my side I can clearly link the issue to the number of ingresses with modsecurity enabled:

[graph: ingress_modsec]

Context:
- image: ingress-nginx/controller:v1.2.1
- number of rules: 1838
- number of config reloads: 20/day

args:

Regards Jeff

procinger commented 1 year ago

Helm Chart: 4.3.0
Nginx Ingress Controller: v1.4.0

Good day,

We have been successfully running the Nginx ingress controller in our clusters for some time now. To increase security, we have enabled Modsecurity according to these instructions https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#modsecurity
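
For context, per-Ingress enablement via the annotations from the linked guide looks roughly like this (resource names and host are placeholders, not our actual config):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app                       # placeholder
  annotations:
    # enable the ModSecurity module for this Ingress
    nginx.ingress.kubernetes.io/enable-modsecurity: "true"
    # also load the OWASP Core Rule Set bundled with the controller image
    nginx.ingress.kubernetes.io/enable-owasp-core-rules: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: example.internal              # placeholder
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app         # placeholder
                port:
                  number: 80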

Shortly after that, RAM consumption increased from 2 to 24 GB, until the controllers could not start at all and were stuck in an endless crash loop. We saw Grafana metrics similar to those already posted here by other users.

To better understand the problem, we rebuilt the Docker images with the same parameters and were able to track the problem down to ModSecurity.

As soon as we started nginx under Valgrind and did a few nginx -s reload runs, Valgrind produced the corresponding reports.

==365== 
==365== 11,472,779 (5,168 direct, 11,467,611 indirect) bytes in 38 blocks are definitely lost in loss record 559 of 559
==365==    at 0x48A2E63: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==365==    by 0x50FB2CB: yy::seclang_parser::parse() (seclang-parser.yy:1032)
==365==    by 0x514BC2B: modsecurity::Parser::Driver::parse(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (driver.cc:145)
==365==    by 0x514BF99: modsecurity::Parser::Driver::parseFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (driver.cc:189)
==365==    by 0x516B410: modsecurity::RulesSet::loadFromUri(char const*) (rules_set.cc:53)
==365==    by 0x516B5C2: msc_rules_add_file (rules_set.cc:296)
==365==    by 0x502CD0A: ngx_conf_set_rules_file (ngx_http_modsecurity_module.c:363)
==365==    by 0x14519B: ngx_conf_handler (ngx_conf_file.c:463)
==365==    by 0x14519B: ngx_conf_parse (ngx_conf_file.c:319)
==365==    by 0x17149D: ngx_http_core_server (ngx_http_core_module.c:2892)
==365==    by 0x14519B: ngx_conf_handler (ngx_conf_file.c:463)
==365==    by 0x14519B: ngx_conf_parse (ngx_conf_file.c:319)
==365==    by 0x16D595: ngx_http_block (ngx_http.c:239)
==365==    by 0x16D595: ngx_http_block (ngx_http.c:122)
==365==    by 0x14519B: ngx_conf_handler (ngx_conf_file.c:463)
==365==    by 0x14519B: ngx_conf_parse (ngx_conf_file.c:319)
==365== 
==365== LEAK SUMMARY:
==365==    definitely lost: 237,390 bytes in 4,103 blocks
==365==    indirectly lost: 14,458,393 bytes in 228,222 blocks
==365==      possibly lost: 1,792,163 bytes in 33,594 blocks
==365==    still reachable: 1,290,291 bytes in 19,520 blocks
==365==         suppressed: 0 bytes in 0 blocks
==365== Reachable blocks (those to which a pointer was found) are not shown.
==365== To see them, rerun with: --leak-check=full --show-leak-kinds=all

With this new information, we took a closer look at ModSecurity and the Nginx Connector and found what we were looking for. After the ModSecurity 3.0.8 release there was a commit fixing a memory leak, see https://github.com/SpiderLabs/ModSecurity/commit/e9a7ba4a60be48f761e0328c6dfcc668d70e35a0

The same applies to the Nginx Connector project, see https://github.com/SpiderLabs/ModSecurity-nginx/pull/277

So we successfully compiled a new version of the Nginx image. The memory leak is not fully fixed yet, but it has been reduced from several MB of RAM to a few KB. Our changes: https://github.com/procinger/ingress-nginx/commit/b51b754a9d6ce7217b9fabfba2445c26545c4267

From the Valgrind report we know that the memory leak is triggered every time the modsecurity_rules_file directive is processed from nginx.conf. So we changed how ModSecurity is activated: instead of using the ingress annotations, we changed the helm configuration to

releases:
  - name: nginx-ingress
    namespace: nginx-ingress
    chart: ingress-nginx/ingress-nginx
    version: 4.3.0
    waitForJobs: true
    values:
      - controller:
          image:
            registry: <custom-repository>
            image: <self-compiled-image>
            tag: "<some-tag>"
            digest: <sha256-of-self-compiled-image>
          ...snip...
          config:
            main-snippet: |
              load_module /etc/nginx/modules/ngx_http_modsecurity_module.so;
            http-snippet: |
              modsecurity on;
              modsecurity_rules '
              SecRuleEngine On
              SecAuditEngine On
              SecAuditLogRelevantStatus "^(?:5|4(?!04))"
              SecAuditLog /var/log/waf-logs/sec_audit.log
              SecAuditLogParts ABFHZ
              SecRule REMOTE_ADDR "@ipMatch 127.0.0.1" "id:87,phase:1,pass,nolog,ctl:ruleEngine=Off"
              ';
              modsecurity_rules_file /etc/nginx/owasp-modsecurity-crs/nginx-modsecurity.conf;

With all these changes, we have been running a test cluster with 100-120 ingresses and about 35 config reloads per day for about 3 days. RAM consumption remains constant, under 1 GB per ingress controller. With the original images and configurations we had several crashes per day.

I think a good start here would be to update the affected ModSecurity projects, as the currently compiled version is not usable:

- ModSecurity-nginx to 1.0.3
- ModSecurity to v3/master head

This will definitely not solve the problem, as a small leak is still present, but it will make it tolerable.

The ModSecurity developers also know about the problem, see https://github.com/SpiderLabs/ModSecurity/issues/2822 https://github.com/SpiderLabs/ModSecurity/issues/2817

We have created a small playground here to better understand the problem https://gist.github.com/procinger/6cae5cdd7de197009d8d9306cebbddfa

The self-compiled controller image with the latest libraries is available here https://hub.docker.com/r/procinger/ingress-nginx

If there is anything I should test, or if you have any questions, feel free to ping me.

Volatus commented 1 year ago

Thanks for the detailed leak report. Since it's a bug present in ModSecurity/NGINX, we'll have to wait for it to be patched upstream before Ingress NGINX is free of the issue.

odinsy commented 1 year ago

well, I'm not alone (: [image]

procinger commented 1 year ago

Good day @Volatus,

thank you for your reply. As you mentioned, the projects have to provide the patches for you to use.

Since there are already patches upstream (though in the case of ModSecurity not yet tagged), I thought the easiest thing to do would be to simply update the affected projects. We did not assume a tagged release is required, because of examples like https://github.com/kubernetes/ingress-nginx/blob/main/images/nginx/rootfs/build.sh#L36.

We have now been running self-compiled ingress controllers for a few days and are fully happy with the result. This is how our Grafana metric currently looks. [screenshot]

Without these patches, our metrics look something like @odinsy's.

I'm happy to create a PR here if desired.

longwuyuan commented 1 year ago

@procinger would your possible proposals for change (patches) be opt-in, or would the patches apply to all users?

procinger commented 1 year ago

@longwuyuan the changes/patches refer only to the ModSecurity dependencies we have changed in our example https://github.com/procinger/ingress-nginx/commit/b51b754a9d6ce7217b9fabfba2445c26545c4267

We would use the commit hash instead of v3/master. This would affect all users with ModSecurity enabled.
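
As a sketch (not a patch), the pinning could look like this in images/nginx/rootfs/build.sh; the variable names should be verified against the current script, and the exact tag/commit format may differ:

# illustrative only: pin the dependencies mentioned above instead of tracking v3/master
export MODSECURITY_LIB_VERSION=e9a7ba4a60be48f761e0328c6dfcc668d70e35a0  # libmodsecurity commit containing the leak fix referenced above
export MODSECURITY_VERSION=v1.0.3                                        # ModSecurity-nginx release containing PR #277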

longwuyuan commented 1 year ago

@procinger It's possible that using ModSecurity with near-future releases of the ingress-nginx controller will require more work, both in code and documentation. But until those changes are released, the details of your proposal will provide insight for making decisions. I think you can provide details of your proposal here. Ideally a PR would be appropriate, but I am suggesting that the basic tenets of your proposal be discussed here to make the process efficient.

Since multiple users reported this, it does seem like an improvement from the graphic that you posted. Just that the devil is in the details, so I hope my suggestion is reasonable.

strongjz commented 1 year ago

@procinger A PR would be a great place for more conversation. Thank you for all the investigation.

odinsy commented 1 year ago

@procinger in my case the memory leak happens when RPS is over 20k (which is not a heavy workload for a node with 16vCPU/32GB RAM). I've found two things:

I found a similar issue, but it was for an old version of the controller: https://github.com/kubernetes/ingress-nginx/issues/7438

I tried the last three versions of the controller (1.5.1, 1.4.0, 1.3.1) and had this problem on each. And I don't use modsecurity or other security features.

bo-tu commented 1 year ago

> The information posted in this issue is not extremely productive or helpful, in the context of the problem description. For example, one way to produce this problem is in a for loop. If you send a json payload for whatever objective in a for loop, without even a second of sleep, then the CPU/memory usage needed to handle that kind of train of events is expected.
>
> The developers are already aware of one scenario where the volume of change is unusually high, like thousands of ingress objects. This leads to performance issues during an event of reloading the nginx.conf.
>
> So in this issue, it helps to track that some users have a performance problem, but progress is going to be really slow because the information precision is lacking. A description of the problem to be solved and some sort of reasonable reproduction procedure is a much needed aspect here.

Hello @longwuyuan ,

We're experiencing the same memory issue. I saw the same oom-kill message when running dmesg in the nginx-ingress-controller pod.
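
In case it helps others, a quick way to confirm it is the kernel OOM killer rather than an eviction (pod name and namespace are placeholders):

# an OOM-killed container shows Reason: OOMKilled / exit code 137 in its last state
kubectl -n ingress-nginx describe pod <controller-pod> | grep -A 5 "Last State"

# recent OOM killer activity in the kernel log (run on the node, or from a privileged debug pod)
dmesg -T | grep -i "out of memory"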

There are 1453 ingress objects in our kubernetes clusters (v1.22), and we are using ingress-nginx (chart v4.1.0, app version v1.2.0).

You mentioned "The developers are already aware of one scenario where the volume of change is unusually high, like thousands of ingress objects. This leads to performance issues during an event of reloading the nginx.conf." Just wondering if there is an existing GitHub issue to track it? Is there any workaround for this problem before a fix is available? Thanks a lot.

Best regards, Bo

longwuyuan commented 1 year ago

There are issues. I have to search for the numbers and find out if they are open or closed.

On one side, there is hope that nginx upstream is working on the reload problem. In this project, there is work in progress to split the control-plane and data-plane. After that is done, there will be an opportunity to approach the reload problem differently.

bo-tu commented 1 year ago

> There are issues. I have to search for the numbers and find out if they are open or closed.

Maybe this one - https://github.com/kubernetes/ingress-nginx/issues/7234#issuecomment-922601188? Because the current strategy is to render the whole configuration file, so if there are many ingress resources, the configuration will be very large and frequent changes can cause excessive consumption.
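
One way to gauge how heavy each reload is in a cluster like that (pod/namespace are placeholders, and the log message text may vary between controller versions):

# size of the rendered nginx.conf inside the controller pod
kubectl -n ingress-nginx exec <controller-pod> -- wc -c /etc/nginx/nginx.conf

# rough count of reloads in the current controller log
kubectl -n ingress-nginx logs <controller-pod> | grep -c "Backend successfully reloaded"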

max-rocket-internet commented 1 year ago

We are also seeing some very high memory usage with large spikes when the controllers reload.

We have these Ingress resources in our cluster:

There is almost no traffic as this is not a live environment, less than 30 RPS.

The controller pods consume about 2-3GB of memory normally, which seems high as it is, but when an Ingress resource is updated and a controller reload is triggered, memory usage will go well over 10GB and then we have OOMKills.

We are using the bitnami Helm chart with docker image docker.io/bitnami/nginx-ingress-controller:1.6.0-debian-11-r11.

Are these memory usage figures normal? Or are we seeing the issue described here?

longwuyuan commented 1 year ago

/remove-kind bug /close

k8s-ci-robot commented 1 year ago

@longwuyuan: Those labels are not set on the issue: kind/bug

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/8166#issuecomment-1401754852):

> - This project does not test the bitnami image or the bitnami chart in the CI, so it's not practical to discuss that here. Please re-open the issue if you find a problem or a bug data instance in this project's manifests or images.
> - It would be interesting to know if you had multiple ingress objects with the same host field value but different path values. That would trigger a diff limited to the change you make.
> - Reloading large configs is an upstream nginx related known problem. There is work in progress to split control-plane and data-plane to see if such issues can be addressed.
>
> /remove-kind bug
> /close
k8s-ci-robot commented 1 year ago

@longwuyuan: Closing this issue.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/8166#issuecomment-1401754852).
max-rocket-internet commented 1 year ago

@longwuyuan

This project does not test the bitnami image or the bitnami chart in the CI, so it's not practical to discuss that here

100% understandable 💙 I will try to test it with the main chart and image.

Reloading large configs is an upstream nginx related known problem

Can you give me a link for this known issue?

It would be interesting to know if you had multiple ingress objects with the same host field value but different path values

There are a few like this, about 10.

longwuyuan commented 1 year ago

/re-open

I will have to search for the issues. But I want to avoid hard quoting, because the details differ and that makes it difficult to have practical conversations. The gist, though, is that if you had a vanilla nginx server, without kubernetes, and a config file that was relatively large like huge megs, then reloading that config file, with multiple factors involved (containerization, platforming of some sort, live traffic, established connections, cpu/memory availability for bursts, etc.), you would observe similar delays in reload. And then if you factor in the state maintenance of a platform like kubernetes, you add another oddity.

At least one issue we have dealt with in the past was related to a bug outside kubernetes. That was worked on by the developer of the related project. I can't recall if the buggy component was lua or nginx or something else. A search of the issues will take effort but yield that info.

Hence all such performance issues call for two aspects of diligence. One is to triage until the smallest possible detail of the root cause is clear, like a bug or a resource crunch or something else. On the other hand, some work is being done on splitting the control-plane and data-plane; that will bring design changes to the workflow and hence improve things for sure, at least from the instrumentation angle.

longwuyuan commented 1 year ago

/reopen

I was talking about 33 ingress resources versus 88 ingress resources where you have multiple paths in one ingress. That would make for a large data structure to be re-read even if only one path was changed. That event will be different from one small ingress with one small path being re-read.

k8s-ci-robot commented 1 year ago

@longwuyuan: Reopened this issue.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/8166#issuecomment-1401850953).
max-rocket-internet commented 1 year ago

@longwuyuan

Thank you very much for taking the time to reply, it's very much appreciated, I know supporting such a popular project must be a lot of work 🙂

But I want to avoid hard quoting

What is "hard quoting"?

and a config file that was relatively large like huge Megs

Ours is 64MB. Is that huge?

you would observe similar delays in reload

Delays are fine. There is free CPU for the controllers, many of them, but each controller pod is not consuming more than 1 CPU core during reload.

And then if you factor in the state maintenance of a platform like kubernetes, you add another oddity.

Sure. We would just like some steps for how we can debug this further ourselves.

On the other hand some work is being done on splitting control-plane and data-plane

Do you mean in https://github.com/kubernetes/ingress-nginx/issues/8034 ?

I was talking about 33 ingress resources versus 88 ingress resources where you have multiple paths in one ingress. That would make for a large data structure to be re-read even if only one path was changed

OK well perhaps some basic debugging we can do is just delete half of them and see if behaviour changes.

odinsy commented 1 year ago

I've disabled metrics, but still have memory leaks at RPS > 20k. We have only 4 Ingress resources. [image] Our config: [image]

Volatus commented 1 year ago

@odinsy @max-rocket-internet @procinger @bo-tu @moraesjeremias @ramanNarasimhan77 @rmathagiarun @bmv126 @venkatmarella

This issue should largely be resolved in the next release of Ingress NGINX, as #9330 made the bundled version of ModSecurity point to a fixed commit which contains the memory leak fix but has not been tagged in a release yet.

The release has been delayed due to a breaking change, but should be rolling out not too long from now. Hope that helps.

odinsy commented 1 year ago

Switched to v1.6.4, looks better: [image]

longwuyuan commented 1 year ago

@odinsy thanks for the update. Seems like an important update.

Volatus commented 1 year ago

v1.6.4 should largely fix this issue. Closing.

/close

k8s-ci-robot commented 1 year ago

@Volatus: Closing this issue.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/8166#issuecomment-1436485063).
mKeRix commented 1 year ago

Just to add some more details for those finding this thread in the future: in our case (see what @max-rocket-internet posted above), the memory issues weren't from modsecurity, as we don't have that enabled. In our metrics we observed correlations between the memory spikes and the number of worker processes. Since nginx reloads leave old processes hanging in graceful shutdown for a little while, this is to be expected. The usage spikes got especially bad when multiple reloads were triggered in short succession (e.g. through deploying multiple ingress changes in short intervals).
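
The lingering workers are easy to spot from inside a controller pod, which also shows how much memory each one still holds (pod/namespace are placeholders; assumes ps is available in the image):

# old workers keep running as "nginx: worker process is shutting down" until they drain
kubectl -n ingress-nginx exec <controller-pod> -- sh -c "ps aux | grep '[n]ginx: worker'"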

We ended up changing two settings that made memory use much less spiky, and they have been running smoothly for about a month now. They are:

  1. We explicitly configured config.worker-processes as 8, as some of our nodes would spawn a (for our purposes) overkill amount of processes due to their high CPU core counts.
  2. We tuned extraArgs.sync-rate-limit to leave a larger gap between syncs, to give the old processes some time to bleed out. We are currently running 0.008, which should allow 1 reload per 125 seconds. The meaning of this value is explained here: https://pkg.go.dev/golang.org/x/time/rate#Limit. For us, delaying the application of new paths/configs fired off in short succession was much preferable to crashing controller pods. (See the values sketch below.)
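
For reference, a minimal Helm values sketch for those two settings (the numbers are the ones mentioned above; tune them to your own nodes and reload patterns):

controller:
  config:
    worker-processes: "8"      # cap nginx workers instead of one per CPU core
  extraArgs:
    sync-rate-limit: "0.008"   # roughly one sync/reload per 125 seconds
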
JosefWN commented 1 year ago

We explicitly configured config.worker-processes as 8, as some of our nodes would spawn an (for our purposes) overkill amount of processes due to their high CPU core counts.

worker_processes 96;
kubectl top po -n ingress-nginx
NAME                             CPU(cores)   MEMORY(bytes)   
ingress-nginx-controller-k7nts   11m          1438Mi 

In the same boat, tried 8 workers just like you:

worker_processes 8;
kubectl top po -n ingress-nginx
NAME                             CPU(cores)   MEMORY(bytes)   
ingress-nginx-controller-k7nts   3m           167Mi 

Oh lala! 😏

fiskhest commented 1 year ago

We had changes to our environment where the physical nodes were being rolled with more CPU cores, and no explicit configuration of worker_processes was defined, defaulting us to the auto value, which assigns workers based on CPU cores. All of a sudden we saw ingress-nginx consuming more than double the memory for the same load. This explanation helped us understand and resolve the issue, thanks! :pray:

cep21 commented 9 months ago

For people who find this in the future: we tried updating sync-rate-limit to reduce syncs and memory usage, but ran into issues where frequently restarting pod deployments would result in 5xx errors during the pod restarts until the sync finishes. It's not just on ingress changes (which aren't frequent for us), but on pod restarts as well. We think we narrowed it down to these locations:

The dynamic Lua backend sync is maybe also not updating backends frequently enough with this setting?

maximumG commented 7 months ago

For people that find this in the future: We tried updating sync-rate-limit to reduce syncs and reduce memory usage, but ran into issues where frequently restarting pod deployments would result in 5xx errors during the pod restarts until the sync finishes.

Ingress-nginx watches K8S Endpoints to get the real Pod IPs instead of using the K8S ClusterIP as the upstream server, for several reasons.

If you are using only basic ingress-nginx features, I think you can avoid those 5XX errors by telling ingress-nginx to use the K8S ClusterIP as the upstream instead of watching every Endpoints object: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#service-upstream
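
A minimal sketch of the global ConfigMap option from that link (the ConfigMap name and namespace must match whatever your controller's --configmap flag points at; there is also a per-Ingress nginx.ingress.kubernetes.io/service-upstream annotation if you only want it for some apps):

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumed default name from the Helm chart
  namespace: ingress-nginx
data:
  # proxy to the Service ClusterIP instead of individual endpoint IPs
  service-upstream: "true"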