rancher / rke2

https://docs.rke2.io/
Apache License 2.0

rke2-ingress-nginx does not watch Ingress resources without IngressClassName set #6510

Closed hyeluoh closed 3 weeks ago

hyeluoh commented 1 month ago

Environmental Info:
RKE2 Version: v1.30.3

rke2 version v1.30.3+rke2r1
go version go1.22.5 X:boringcrypto

Node(s) CPU architecture, OS, and Version:

arm64 centos 7.9

Cluster Configuration:

1 server 3 agents

Describe the bug:

After upgrading from RKE2 v1.29.7 to v1.30.3, services within the Kubernetes cluster that are accessed through Ingress are returning 404 errors.

Steps To Reproduce:

Expected behavior:

Actual behavior:

Additional context / logs:

brandond commented 1 month ago

Can you provide an example showing how exactly you'd configured the ingress settings? What specifically was missing?

nugzarg commented 1 month ago

The reason for this issue is the missing annotation ingressclass.kubernetes.io/is-default-class: "true" on the nginx IngressClass. I'm not sure, but it seems that the nginx IngressClass was set as default automatically in the previous version, which is not the case now. A simple workaround is to set this annotation manually, or to change the nginx-ingress helm chart configuration. Example:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      ingressClassResource:
        name: nginx
        enabled: true
        default: true
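
The manual route mentioned above (setting the annotation directly instead of going through the chart values) could look like the following. This is a minimal sketch, assuming the IngressClass is named nginx as in the HelmChartConfig; the function name set_default_ingress_class is hypothetical, and a later chart upgrade may overwrite a manually set annotation.

```shell
# Sketch: mark an existing IngressClass as the cluster default.
# Assumption: the IngressClass is named "nginx", as in the HelmChartConfig above.
set_default_ingress_class() {
    # $1: IngressClass name (defaults to "nginx")
    kubectl annotate ingressclass "${1:-nginx}" \
        ingressclass.kubernetes.io/is-default-class=true --overwrite
}

# Usage (requires cluster access):
#   set_default_ingress_class nginx
#   kubectl get ingressclass nginx -o yaml   # inspect the annotations afterwards
```

The HelmChartConfig approach is likely the more durable of the two, since the value is then applied again on every chart upgrade.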

brandond commented 1 month ago

That should be handled when the chart is upgraded, via the .global.systemDefaultIngressClass chart value that is injected into the chart values. Have you customized the ingress chart deployment in any other way? Can you provide the output of kubectl get helmchart -n kube-system rke2-ingress-nginx -o yaml?

nugzarg commented 1 month ago

Yes, I have customized the helm chart of nginx ingress. Here is the customized helm chart config:

---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      allowSnippetAnnotations: true
      enableAnnotationValidations: true
      hostNetwork: false
      hostPort:
        enabled: false
      service:
        enabled: true
        type: NodePort
        nodePorts:
          http: 32080
          https: 32443
          tcp:
            8080: 32808
        externalTrafficPolicy: Local
      dnsPolicy: ClusterFirst
      ingressClassResource:
        name: nginx
        enabled: true
        default: true
      ingressClass: nginx
      metrics:
        enabled: true
        serviceMonitor:
          enabled: true
      config:
        use-forwarded-headers: true
        compute-full-forwarded-for: true
        proxy-body-size: "200m"
        fail_timeout: "5s"
        enable-modsecurity: true
        enable-owasp-modsecurity-crs: true
        modsecurity-snippet: |
          SecAuditLog /dev/stdout
          SecAuditLogFormat JSON
        log-format-escape-json: "true"
        log-format-upstream: '{ 
          "time_local": "$time_local",
          "time_iso8601": "$time_iso8601",
          "network": {
             "forwarded_ip": "$http_x_forwarded_for", 
             "forwarded_original_ip": "$http_x_original_forwarded_for", 
             "real_ip": "$http_x_real_ip"
           },
          "user":{"name":"$remote_user"},
          "user_agent":{"original":"$http_user_agent"},
          "http":{
            "version": "$server_protocol",
            "request":{
              "body":{"bytes":$body_bytes_sent},
              "bytes": $request_length,
              "method":"$request_method",
              "referrer":"$http_referer"
            },
            "response":{
              "body":{"bytes":$body_bytes_sent},
              "bytes": $bytes_sent,
              "status_code":$status,
              "time":$request_time
            },
            "upstream": {
              "bytes": $upstream_response_length,
              "status_code":"$upstream_status",
              "time":$upstream_response_time,
              "address": "$upstream_addr",
              "name": "$proxy_upstream_name"
            }
          },
          "url":{
            "domain":"$host",
            "path":"$uri",
            "query":"$args",
            "original":"$request_uri"
          }
        }'

The ingressClassResource: section was not set before the RKE2 upgrade.

brandond commented 1 month ago

Please show the helmchart, not the helmchartconfig

nugzarg commented 1 month ago

Here is the output of kubectl get helmchart -n kube-system rke2-ingress-nginx -o yaml:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  annotations:
    helm.cattle.io/chart-url: https://rke2-charts.rancher.io/assets/rke2-ingress-nginx/rke2-ingress-nginx-4.8.200.tgz
    helmcharts.cattle.io/managed-by: helm-controller
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4yS3W7TQBCFXwXNLY7/1nESS1wUR0AEolJaKvVyvJ7Ei9e70c7G/FR5d7RpqhYKoZfjOXPm8+y5A9ypG3KsrIEKOtJDLNF7TbGyyZhBBL0yLVTwgfRQd+g8RDCQxxY9QnUHaIz16JU1HMo/HGSYmOydDube77hKEtdTPjk2OHZoZEcuSJGZPN93ldk6Yp6YrTLfHz5BBLb5StIz+dgp+2SNCoRn+vabITfZjj1U0Av+7RejVx+Vad9ctK09t+LewuBAUMFzyBdN8g5lGO/3DU34B3sa4BCBxob08Xr/suiQO6hgJkWzmRWbRV7mJc5oKposQzGfLsoNzbNpVhZpIUQbTM+RnmHhHclA0ljr2TvcQbVBzRTB8cVqazwZH/JQ8Kp+O6xe51/qC7H2sp6mt+ZmuH53K9Z9u/5k1p3cLvkyb8X7n8PVPBsS25flIr8e5fRSQQRMPqzaatugjqXesydXr5ZrqCBL4yKP0zhNshKiv2jG4n+q5eerk0TEaZylzwV2QBVif6pjbSXqR1m43RI9LpWDCpIRXaJVk5wye4zlo5jJjUrSE3zxAHY4HH4FAAD//ySeAJRoAwAA
    objectset.rio.cattle.io/id: ""
    objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
    objectset.rio.cattle.io/owner-name: rke2-ingress-nginx
    objectset.rio.cattle.io/owner-namespace: kube-system
  creationTimestamp: "2021-12-22T07:44:12Z"
  finalizers:
  - wrangler.cattle.io/helm-controller
  - wrangler.cattle.io/on-helm-chart-remove
  generation: 64
  labels:
    objectset.rio.cattle.io/hash: 7c3bf74f92626a7e53b11a38596fe8151640433d
  name: rke2-ingress-nginx
  namespace: kube-system
  resourceVersion: "1224480281"
  uid: e854442e-1d62-439e-86fe-ba1ebbfd7711
spec:
  bootstrap: false
  chartContent: 
  set:
    global.clusterCIDR: 10.42.0.0/16
    global.clusterCIDRv4: 10.42.0.0/16
    global.clusterDNS: 10.43.0.10
    global.clusterDomain: cluster.local
    global.rke2DataDir: /var/lib/rancher/rke2
    global.serviceCIDR: 10.43.0.0/16
status:
  jobName: helm-install-rke2-ingress-nginx

brandond commented 1 month ago

Your chart's set values appear to be missing some things that should be injected for ALL system charts, ref: https://github.com/rancher/rke2/blob/e742dc53b463d205773cfb25633309671ce6777c/pkg/bootstrap/bootstrap.go#L312-L323

What is the output of grep -E 'chart-url|global' /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx.yaml? You should see something like this:

root@rke2-server-1:/# grep -E 'chart-url|global' /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx.yaml
    helm.cattle.io/chart-url: https://rke2-charts.rancher.io/assets/rke2-ingress-nginx/rke2-ingress-nginx-4.10.102.tgz
    global.clusterCIDR: 10.42.0.0/16
    global.clusterCIDRv4: 10.42.0.0/16
    global.clusterDNS: 10.43.0.10
    global.clusterDomain: cluster.local
    global.rke2DataDir: /var/lib/rancher/rke2
    global.serviceCIDR: 10.43.0.0/16
    global.systemDefaultIngressClass: ingress-nginx

If you see the global.systemDefaultIngressClass value in the chart on disk, but not in the resource deployed to the cluster, please check for apply errors in your rke2-server log.

If you don't see it there... then something else weird is going on, and we'll want to look at your server's config.yaml.
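
To make the on-disk versus in-cluster comparison described above repeatable, the check can be wrapped in a small helper that reads a manifest on stdin. This is only a sketch: the function name has_default_ingress_class is hypothetical, while the file path and resource name are the ones quoted in this thread.

```shell
# Sketch: report whether a HelmChart manifest (read from stdin) contains the
# injected global.systemDefaultIngressClass value.
has_default_ingress_class() {
    if grep -q 'global\.systemDefaultIngressClass:'; then
        echo "present"
    else
        echo "missing"
    fi
}

# Usage (path and resource name taken from this thread):
#   has_default_ingress_class < /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx.yaml
#   kubectl get helmchart -n kube-system rke2-ingress-nginx -o yaml | has_default_ingress_class
```

"present" on disk but "missing" in the cluster points at an apply error; "missing" in both places points at the server configuration.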

nugzarg commented 1 month ago

grep -E 'chart-url|global' /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx.yaml Output:

    global.clusterCIDR: 10.42.0.0/16
    global.clusterCIDRv4: 10.42.0.0/16
    global.clusterDNS: 10.43.0.10
    global.clusterDomain: cluster.local
    global.rke2DataDir: /var/lib/rancher/rke2
    global.serviceCIDR: 10.43.0.0/16

cat /etc/rancher/rke2/config.yaml

node-name: "master1.kube.example.com"
node-ip: "1.2.3.4"
node-taint:
- "CriticalAddonsOnly=true:NoExecute"
kubelet-arg:
- feature-gates=SizeMemoryBackedVolumes=true
- seccomp-default=true
- pod-max-pids=2048
cni:
  - cilium
disable:
  - rke2-kube-proxy

brandond commented 1 month ago

And you're sure you're on v1.30.3+rke2r1 on all your nodes? Can you provide rke2-server logs from journald?

nugzarg commented 1 month ago

No, my cluster is on v1.28.8+rke2r1. I just tried to upgrade the first master node to v1.30.3+rke2r1. The upgrade triggered an nginx-ingress upgrade to v1.10.1-hardened1. After the nginx-ingress upgrade, no ingress rule was working. I received 404 errors for all requests, because no rule had ingress class set and there was no default ingress class. After that I decided to downgrade the node to v1.28.8+rke2r1 (there was a second issue with ModSecurity not working, and it was too much for me). The downgrade triggered an nginx-ingress helm chart downgrade to nginx-1.9.6-hardened1, and everything is working now.

brandond commented 1 month ago

> After that I decided to downgrade the node to v1.28.8+rke2r1

This is the first time you've mentioned that you are no longer running the version you listed when creating the issue. It would have been good to mention that, as none of the information I asked for is going to be of any use if you're not running the new version any longer.

> because no rule had ingress class set and there was no default ingress class.

rke2-ingress-nginx should have been set as the default ingress class by the chart value I was having you check for.

hyeluoh commented 1 month ago

I have upgraded to version v1.30.3. I checked the NGINX configuration generated by the ingress controller: the domain that I cannot access is missing from it, even though the Ingress resource for that domain exists. My workaround is to recreate the Ingress resource.
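
A check like the one described above can be scripted against a saved copy of the controller's nginx.conf. This is a rough sketch that assumes the rendered file carries "## start server <host>" marker comments, as in the nginx.conf diff posted later in this thread; the function name has_server_block is hypothetical.

```shell
# Sketch: check whether a rendered nginx.conf (read from stdin) contains a
# server block for the given host.
# Assumption: each server block is preceded by a "## start server <host>" comment.
has_server_block() {
    # $1: host name, e.g. test.k8s.com
    if awk -v h="$1" \
        '$1 == "##" && $2 == "start" && $3 == "server" && $4 == h { found = 1 }
         END { exit !found }'; then
        echo "configured"
    else
        echo "not configured"
    fi
}

# Usage, with a saved copy of the controller's configuration:
#   has_server_block test.k8s.com < v1.30.3/nginx.conf
```

A host that is "not configured" while its Ingress resource still exists suggests that the controller is ignoring that Ingress rather than failing to render it.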

hyeluoh commented 1 month ago

> No, my cluster is on v1.28.8+rke2r1. I just tried to upgrade the first master node to v1.30.3+rke2r1. The upgrade triggered an nginx-ingress upgrade to v1.10.1-hardened1. After the nginx-ingress upgrade, no ingress rule was working. I received 404 errors for all requests, because no rule had ingress class set and there was no default ingress class. After that I decided to downgrade the node to v1.28.8+rke2r1 (there was a second issue with ModSecurity not working, and it was too much for me). The downgrade triggered an nginx-ingress helm chart downgrade to nginx-1.9.6-hardened1, and everything is working now.

I think I've encountered the same situation as you. I suspect it might be related to the upgrade of the nginx-ingress version. The RKE2 cluster was upgraded from v1.28.11 to v1.30.3.

hyeluoh commented 1 month ago

Here is a comparison between the configurations of the two versions.

[root@gt-test-10-117 ~]# diff v1.28.10/nginx.conf v1.30.3/nginx.conf 
2c2
< # Configuration checksum: 8565260235095128852
---
> # Configuration checksum: 6186567879202657564
82,88d81
<       ok, res = pcall(require, "monitor")
<       if not ok then
<       error("require failed: " .. tostring(res))
<       else
<       monitor = res
<       end
<       
111,112d103
<       monitor.init_worker(10000)
<       
144c135
<   server_names_hash_bucket_size   64;
---
>   server_names_hash_bucket_size   32;
237c228
<   # PEM sha: d590dd180ecf6844ce48d03a12a9a92119ff026f
---
>   # PEM sha: 3ee252d5fb4aa5dc6c2eac0505aee1592180d1c0
276a268,269
>       http2 on;
>       
279,280c272,273
<       listen 443 default_server reuseport backlog=511 ssl http2 ;
<       listen [::]:443 default_server reuseport backlog=511 ssl http2 ;
---
>       listen 443 default_server reuseport backlog=511 ssl;
>       listen [::]:443 default_server reuseport backlog=511 ssl;
330,331d322
<               monitor.call()
<               
403a395,396
>           # Custom Response Headers
>           
433,568d425
<   
<   ## start server test.k8s.com
<   server {
<       server_name test.k8s.com ;
<       
<       listen 80  ;
<       listen [::]:80  ;
<       listen 443  ssl http2 ;
<       listen [::]:443  ssl http2 ;
<       
<       set $proxy_upstream_name "-";
<       
<       ssl_certificate_by_lua_block {
<           certificate.call()
<       }
<       
<       location / {
<           
<           set $namespace      "nginx";
<           set $ingress_name   "nginx-web-ingress";
<           set $service_name   "";
<           set $service_port   "";
<           set $location_path  "/";
<           set $global_rate_limit_exceeding n;
<           
<           rewrite_by_lua_block {
<               lua_ingress.rewrite({
<                   force_ssl_redirect = false,
<                   ssl_redirect = true,
<                   force_no_ssl_redirect = false,
<                   preserve_trailing_slash = false,
<                   use_port_in_redirects = false,
<                   global_throttle = { namespace = "", limit = 0, window_size = 0, key = { }, ignored_cidrs = { } },
<               })
<               balancer.rewrite()
<               plugins.run()
<           }
<           
<           # be careful with `access_by_lua_block` and `satisfy any` directives as satisfy any
<           # will always succeed when there's `access_by_lua_block` that does not have any lua code doing `ngx.exit(ngx.DECLINED)`
<           # other authentication method such as basic auth or external auth useless - all requests will be allowed.
<           #access_by_lua_block {
<           #}
<           
<           header_filter_by_lua_block {
<               lua_ingress.header()
<               plugins.run()
<           }
<           
<           body_filter_by_lua_block {
<               plugins.run()
<           }
<           
<           log_by_lua_block {
<               balancer.log()
<               
<               monitor.call()
<               
<               plugins.run()
<           }
<           
<           port_in_redirect off;
<           
<           set $balancer_ewma_score -1;
<           set $proxy_upstream_name "upstream-default-backend";
<           set $proxy_host          $proxy_upstream_name;
<           set $pass_access_scheme  $scheme;
<           
<           set $pass_server_port    $server_port;
<           
<           set $best_http_host      $http_host;
<           set $pass_port           $pass_server_port;
<           
<           set $proxy_alternative_upstream_name "";
<           
<           client_max_body_size                    1m;
<           
<           proxy_set_header Host                   $best_http_host;
<           
<           # Pass the extracted client certificate to the backend
<           
<           # Allow websocket connections
<           proxy_set_header                        Upgrade           $http_upgrade;
<           
<           proxy_set_header                        Connection        $connection_upgrade;
<           
<           proxy_set_header X-Request-ID           $req_id;
<           proxy_set_header X-Real-IP              $remote_addr;
<           
<           proxy_set_header X-Forwarded-For        $remote_addr;
<           
<           proxy_set_header X-Forwarded-Host       $best_http_host;
<           proxy_set_header X-Forwarded-Port       $pass_port;
<           proxy_set_header X-Forwarded-Proto      $pass_access_scheme;
<           proxy_set_header X-Forwarded-Scheme     $pass_access_scheme;
<           
<           proxy_set_header X-Scheme               $pass_access_scheme;
<           
<           # Pass the original X-Forwarded-For
<           proxy_set_header X-Original-Forwarded-For $http_x_forwarded_for;
<           
<           # mitigate HTTPoxy Vulnerability
<           # https://www.nginx.com/blog/mitigating-the-httpoxy-vulnerability-with-nginx/
<           proxy_set_header Proxy                  "";
<           
<           # Custom headers to proxied server
<           
<           proxy_connect_timeout                   5s;
<           proxy_send_timeout                      60s;
<           proxy_read_timeout                      60s;
<           
<           proxy_buffering                         off;
<           proxy_buffer_size                       4k;
<           proxy_buffers                           4 4k;
<           
<           proxy_max_temp_file_size                1024m;
<           
<           proxy_request_buffering                 on;
<           proxy_http_version                      1.1;
<           
<           proxy_cookie_domain                     off;
<           proxy_cookie_path                       off;
<           
<           # In case of errors try the next upstream server before returning an error
<           proxy_next_upstream                     error timeout;
<           proxy_next_upstream_timeout             0;
<           proxy_next_upstream_tries               3;
<           
<           proxy_pass http://upstream_balancer;
<           
<           proxy_redirect                          off;
<           
<       }
<       
<   }
<   ## end server test.k8s.com

Here is the comparison of the Ingress configuration after the upgrade.

[root@gt-test-10-117 ~]# diff v1.28.10/nginx-web-ingress.yaml  v1.30.3/nginx-web-ingress.yaml 
8c8
<   resourceVersion: "20077"
---
>   resourceVersion: "22143"
14,16c14
<   loadBalancer:
<     ingress:
<     - ip: 192.168.10.117
---
>   loadBalancer: {}

@brandond

brandond commented 1 month ago

Are you upgrading directly from v1.28.10 to v1.30.3? That is not supported; you are expected to step through intermediate minors (v1.29) when upgrading.

I am not sure that's related though. Please see the information that was asked for above, regarding the HelmChart resource, both on disk and in the cluster.

serhiynovos commented 3 weeks ago

@brandond after upgrading to RKE2 v1.30.3+rke2r1 I'm also facing this issue. I checked the nginx IngressClass and it has the ingressclass.kubernetes.io/is-default-class: 'true' annotation:

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  annotations:
    ingressclass.kubernetes.io/is-default-class: 'true'
    meta.helm.sh/release-name: rke2-ingress-nginx
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: '2024-01-25T22:48:15Z'
  generation: 1
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: rke2-ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: rke2-ingress-nginx
    app.kubernetes.io/part-of: rke2-ingress-nginx
    app.kubernetes.io/version: 1.10.1
    helm.sh/chart: rke2-ingress-nginx-4.10.102
  managedFields:
    - apiVersion: networking.k8s.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:ingressclass.kubernetes.io/is-default-class: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app.kubernetes.io/component: {}
            f:app.kubernetes.io/instance: {}
            f:app.kubernetes.io/managed-by: {}
            f:app.kubernetes.io/name: {}
            f:app.kubernetes.io/part-of: {}
            f:app.kubernetes.io/version: {}
            f:helm.sh/chart: {}
        f:spec:
          f:controller: {}
      manager: helm
      operation: Update
      time: '2024-08-26T20:10:50Z'
  name: nginx
  resourceVersion: '196680132'
  uid: 93ef06ed-bf17-4c2f-aa1a-0a4619cf1f62
spec:
  controller: k8s.io/ingress-nginx

brandond commented 3 weeks ago

In newer releases of RKE2, the ingress-nginx IngressClass is set as default, and any new Ingress resources created on these versions will have the ingressClassName assigned during creation, if the attribute is not set.

If you're upgrading from earlier releases, and did not explicitly set the ingressClassName on your Ingress resources, the default ingress class WILL NOT be set on your existing resources, and on affected releases of RKE2, ingress-nginx will no longer handle these Ingresses.
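
To see which existing resources fall into this category, something along these lines may help. It is a sketch, not an official procedure: the function names are hypothetical, the class name nginx is assumed from the IngressClass shown earlier in the thread, and the listing relies on kubectl printing "<none>" for absent fields in custom-columns output.

```shell
# Sketch: list Ingress resources that have no spec.ingressClassName set.
list_ingresses_without_class() {
    kubectl get ingress --all-namespaces --no-headers \
        -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,CLASS:.spec.ingressClassName' |
        awk '$3 == "<none>" { print $1 "/" $2 }'
}

# Sketch: set the class explicitly on one existing Ingress.
# $1: namespace, $2: Ingress name, $3: IngressClass name (assumed "nginx")
set_ingress_class() {
    kubectl patch ingress "$2" -n "$1" --type=merge \
        -p "{\"spec\":{\"ingressClassName\":\"${3:-nginx}\"}}"
}

# Usage (requires cluster access):
#   list_ingresses_without_class
#   set_ingress_class nginx nginx-web-ingress nginx
```

Ingresses that rely only on the deprecated kubernetes.io/ingress.class annotation will also appear in this listing, so review the output before patching anything.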

The fix is to either:

mdrahman-suse commented 3 weeks ago

Validated the fixes on the latest releases, closing this issue.