rancher / webhook

Rancher webhook for Kubernetes
Apache License 2.0

violates PodSecurity on rancher-webhook #239

Closed WMP closed 1 year ago

WMP commented 1 year ago

Hi, I installed Kubernetes 1.25 using RKE2 without the profile: "cis-1.23" option enabled, then installed Rancher using Helm, which appears to have pulled in rancher-webhook as well:

helm3 list
NAME            NAMESPACE       REVISION        UPDATED                                         STATUS          CHART                           APP VERSION
rancher         cattle-system   1               2023-05-10 15:38:17.898990115 +0200 CEST        deployed        rancher-2.7.3                   v2.7.3     
rancher-webhook cattle-system   1               2023-05-10 13:46:08.906962964 +0000 UTC         deployed        rancher-webhook-2.0.3+up0.3.3   0.3.3 

helm3 get values rancher
USER-SUPPLIED VALUES:
global:
  cattle:
    psp:
      enabled: false
hostname: XXXXXX
ingress:
  ingressClassName: internal
  tls:
    source: secret
privateCA: true

helm3 get values rancher-webhook
USER-SUPPLIED VALUES:
capi:
  enabled: true
global:
  cattle:
    systemDefaultRegistry: ""
mcm:
  enabled: true
priorityClassName: rancher-critical

Then I enabled profile: "cis-1.23" in the RKE2 options, and rancher-webhook stopped working:

Warning  FailedCreate  58m    replicaset-controller  Error creating: pods "rancher-webhook-656cd8b9f-h25xh" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "rancher-webhook" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "rancher-webhook" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "rancher-webhook" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "rancher-webhook" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
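
For reference, the four violations listed in that event map onto four securityContext fields. The fragment below is only a sketch of a container spec that would satisfy the "restricted" profile. It is not taken from the rancher-webhook chart, and whether the chart exposes values to set these fields is not confirmed here.

```yaml
# Hypothetical pod spec fragment, for illustration only.
# Field names and values come directly from the PodSecurity error message.
spec:
  containers:
    - name: rancher-webhook
      securityContext:
        allowPrivilegeEscalation: false   # "allowPrivilegeEscalation != false"
        capabilities:
          drop: ["ALL"]                   # "unrestricted capabilities"
        runAsNonRoot: true                # "runAsNonRoot != true"
        seccompProfile:
          type: RuntimeDefault            # or "Localhost", per the message
```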

What am I doing wrong?

WMP commented 1 year ago

A workaround is to add the cattle-system namespace to the exemptions in the PodSecurityConfiguration. However, adding this namespace alone does not solve all the problems, because the log of pod/rancher-webhook-656cd8b9f-pfdhm shows:

time="2023-05-19T10:35:01Z" level=info msg="Waiting for server to become available: Get \"https://10.43.0.1:443/version\": dial tcp 10.43.0.1:443: i/o timeout"

The address 10.43.0.1:443 is service/kubernetes in the default namespace. When I created a pod with bash and curl in either the cattle-system or the default namespace, I was able to connect to this service without any problem:

utils@utils:~$ curl -kv https://10.43.0.1:443/version
*   Trying 10.43.0.1...
* TCP_NODELAY set
* Connected to 10.43.0.1 (10.43.0.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Unknown (8):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=kube-apiserver
*  start date: May 10 09:04:14 2023 GMT
*  expire date: May 17 12:54:34 2024 GMT
*  issuer: CN=rke2-server-ca@1683709454
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* Using Stream ID: 1 (easy handle 0x556ee75c6640)
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
> GET /version HTTP/2
> Host: 10.43.0.1
> User-Agent: curl/7.58.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
< HTTP/2 401 
< audit-id: 25344859-ee93-479c-9f48-93e6653784f4
< cache-control: no-cache, private
< content-type: application/json
< content-length: 157
< date: Fri, 19 May 2023 10:35:56 GMT
< 
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
* Connection #0 to host 10.43.0.1 left intact
utils@utils:~$

Only after I also added the default namespace to the exemptions in the PodSecurityConfiguration did the webhook start working:

time="2023-05-19T10:36:05Z" level=info msg="Waiting for server to become available: Get \"https://10.43.0.1:443/version\": dial tcp 10.43.0.1:443: i/o timeout"
time="2023-05-19T10:36:23Z" level=info msg="Active TLS secret cattle-system/cattle-webhook-tls (ver=77535) (count 1): map[listener.cattle.io/cn-rancher-webhook.cattle-system.svc:rancher-webhook.cattle-system.svc listener.cattle.io/fingerprint:SHA1=163301133E4AAF661C552D4B9040F66397CE3938]"
time="2023-05-19T10:36:23Z" level=info msg="Listening on :9443"
time="2023-05-19T10:36:23Z" level=info msg="Starting provisioning.cattle.io/v1, Kind=Cluster controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting management.cattle.io/v3, Kind=ProjectRoleTemplateBinding controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=RoleBinding controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting management.cattle.io/v3, Kind=Cluster controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting management.cattle.io/v3, Kind=GlobalRole controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting management.cattle.io/v3, Kind=ClusterRoleTemplateBinding controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=Role controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting management.cattle.io/v3, Kind=PodSecurityAdmissionConfigurationTemplate controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting management.cattle.io/v3, Kind=RoleTemplate controller"
time="2023-05-19T10:36:23Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller"
time="2023-05-19T10:36:23Z" level=info msg="Sleeping for 15 seconds then applying webhook config"
time="2023-05-19T10:36:23Z" level=info msg="Updating TLS secret for cattle-system/cattle-webhook-tls (count: 1): map[listener.cattle.io/cn-rancher-webhook.cattle-system.svc:rancher-webhook.cattle-system.svc listener.cattle.io/fingerprint:SHA1=163301133E4AAF661C552D4B9040F66397CE3938]"

KevinJoiner commented 1 year ago

@WMP You are correct: Rancher requires exemptions to run on a PSA-restricted cluster. See Rancher on PSA-restricted Clusters for more info. By default, RKE2 does not exempt all namespaces used by Rancher; see the RKE2 Hardening Guide.
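
As a rough illustration of what such an exemption looks like, here is a minimal sketch of a Pod Security admission configuration that enforces "restricted" by default and exempts cattle-system. The three RKE2 system namespaces are an assumption about what RKE2 exempts under the CIS profile, and cattle-system is the only Rancher namespace named in this thread. The Rancher documentation lists further namespaces that are not reproduced here.

```yaml
# Sketch only: verify the namespace list against the Rancher and RKE2 docs.
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1
      kind: PodSecurityConfiguration
      defaults:
        enforce: "restricted"
        enforce-version: "latest"
        audit: "restricted"
        audit-version: "latest"
        warn: "restricted"
        warn-version: "latest"
      exemptions:
        usernames: []
        runtimeClasses: []
        namespaces:
          - kube-system          # assumed RKE2 default exemption
          - cis-operator-system  # assumed RKE2 default exemption
          - tigera-operator      # assumed RKE2 default exemption
          - cattle-system        # added for Rancher, as discussed above
```

The API server reads this file at startup, so a change to the exemption list usually only takes effect after the server is restarted.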

WMP commented 1 year ago

But in the list at https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/authentication-permissions-and-global-configuration/psa-config-templates#exempting-required-rancher-namespaces I don't see the default namespace. What about it?

KevinJoiner commented 1 year ago

Although adding an exemption for the default namespace worked for you, I am unsure whether it is necessary. Could you try the sample configuration and see if that also works for you? https://ranchermanager.docs.rancher.com/reference-guides/rancher-security/psa-restricted-exemptions