rancher / rancher

Complete container management platform
http://rancher.com
Apache License 2.0
23.33k stars 2.96k forks source link

[BUG] TLS NGINX Load Balancer causes Rancher fail to launch RKE2 Downstream Cluster #46384

Open braveantony opened 2 months ago

braveantony commented 2 months ago

Rancher Server Setup

Information about the Cluster

Describe the bug I am using Cert-manager to issue certificates for Rancher and have installed Nginx Load Balancer with self-sign Certificates in front of HA Rancher. When I launch Custom type RKE2 via Rancher, I encounter the following error : rancher-system-agent. service sends the following error message:

"Rancher System Agent version v0.3.6 (41c07d0) is starting"
"Using directory /var/lib/rancher/agent/work for work"
"Starting remote watch of plans"
"Initial connection to Kubernetes cluster failed with error Get \"https://rancher.example.com/version\": tls: failed to verify certificate: x509: certificate signed by unknown authority, removing CA data and trying again"

And then it gets stuck here.

To Reproduce

# 1. Create CA Secrets
kubectl -n cattle-system create secret generic tls-ca-additional --from-file=ca-additional.pem=./ca-additional.pem

# 2. Install Rancher with Helm
helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher.example.com \
  --set global.cattle.psp.enabled=false \
  --set replicas=3 \
  --set additionalTrustedCAs=true \
  --version 2.8.5

# 3. Nginx Load Balancer Configuration
worker_processes 4;
worker_rlimit_nofile 40000;

events {
    worker_connections 8192;
}

stream {
    upstream rancher_servers_http {
        least_conn;
        server <IP_NODE_1>:80 max_fails=3 fail_timeout=5s;
        server <IP_NODE_2>:80 max_fails=3 fail_timeout=5s;
        server <IP_NODE_3>:80 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 80;
        proxy_pass rancher_servers_http;
    }

}

http {

    upstream rancher_servers_https {
        least_conn;
        server <IP_NODE_1>:443 max_fails=3 fail_timeout=5s;
        server <IP_NODE_2>:443 max_fails=3 fail_timeout=5s;
        server <IP_NODE_3>:443 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 443 ssl;
        ssl_certificate /path/to/tls.crt;
        ssl_certificate_key /path/to/key.key;
        location / {
            proxy_pass https://rancher_servers_https;
            proxy_set_header Host rancher.example.com;
            proxy_ssl_server_name on;
            proxy_ssl_name rancher.example.com;
        }
    }
}

Expected Result Can Launch RKE2 DownStream Cluster

bpedersen2 commented 2 months ago

You need to tick the insecure checkbox when creating the downstream cluster command (a self-signed cert is always considered insecure by curl)