Closed maon-fp closed 2 weeks ago
My cluster.yaml: 04-cluster-prod.txt
Operator shows no errors or warnings.
@subhamkrai What are the steps to manually disable the admission controller? I can't seem to find it from previous issues.
I don't exactly remember, but setting the value to true should work. If that does not work, try deleting the validating webhook rook-ceph-webhook.
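A minimal sketch of both suggestions, assuming the default rook-ceph namespace and operator deployment name (these commands need cluster access and are not confirmed by the thread):

```shell
# Sketch, for Rook <= v1.12 where the setting still exists:
# disable the admission controller via the operator's environment.
kubectl -n rook-ceph set env deployment/rook-ceph-operator \
  ROOK_DISABLE_ADMISSION_CONTROLLER="true"

# Fallback: delete the webhook configuration directly (a cluster-scoped resource).
kubectl delete validatingwebhookconfiguration rook-ceph-webhook
```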
@maon-fp
Thank you for your replies.
> not exactly remember but setting the value true should work but if that is not working try deleting validating webhook rook-ceph-webhook
Are those supposed to be pods? I don't have any of those. I'm currently at v1.13.8: there is no ROOK_DISABLE_ADMISSION_CONTROLLER anymore. How can I set it to true now?
https://github.com/rook/rook/blob/release-1.12/deploy/examples/operator.yaml#L509 it was there till 1.12, and in 1.13 we removed it.

> validating webhook rook-ceph-webhook

This is not a pod, it is a Kubernetes resource; try kubectl get validatingwebhook rook-ceph-webhook
@subhamkrai Thank you for pointing me in the right direction. I can see those resources:
```
$ kubectl api-resources --verbs=list -n rook-ceph | grep hook
mutatingwebhookconfigurations     admissionregistration.k8s.io/v1   false   MutatingWebhookConfiguration
validatingwebhookconfigurations   admissionregistration.k8s.io/v1   false   ValidatingWebhookConfiguration
$ kubectl api-resources --verbs=list -n rook-ceph | grep val
validatingwebhookconfigurations   admissionregistration.k8s.io/v1   false   ValidatingWebhookConfiguration
```
So none of the ones you mentioned, or?
> https://github.com/rook/rook/blob/release-1.12/deploy/examples/operator.yaml#L509 it was there till 1.12 and in 1.13 we removed it.

So no chance to set it to true now?
@maon-fp could you also share the svc list in the rook-ceph namespace?
Also, could you share the top 10 lines of the Rook operator pod's logs?
Yes, of course.
List of services:
```
$ kgs production:rook-ceph
NAME                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
csi-rbdplugin-metrics            ClusterIP   10.104.212.46    <none>        8080/TCP,8081/TCP   3y104d
rook-ceph-admission-controller   ClusterIP   10.99.221.127    <none>        443/TCP             2y2d
rook-ceph-mgr                    ClusterIP   10.109.30.124    <none>        9283/TCP            3y104d
rook-ceph-mgr-dashboard          ClusterIP   10.107.242.106   <none>        8443/TCP            3y104d
rook-ceph-mon-a                  ClusterIP   10.101.39.245    <none>        6789/TCP,3300/TCP   3y104d
rook-ceph-mon-c                  ClusterIP   10.110.130.143   <none>        6789/TCP,3300/TCP   3y104d
rook-ceph-mon-d                  ClusterIP   10.110.86.107    <none>        6789/TCP,3300/TCP   3y104d
```
First lines of operator log:
```
$ kl rook-ceph-operator-9f688fcc5-v2q6j | head -n 10 production:rook-ceph
2024/04/23 14:00:19 maxprocs: Leaving GOMAXPROCS=24: CPU quota undefined
2024-04-23 14:00:19.215493 I | rookcmd: starting Rook v1.13.8 with arguments '/usr/local/bin/rook ceph operator'
2024-04-23 14:00:19.215514 I | rookcmd: flag values: --enable-machine-disruption-budget=false, --help=false, --kubeconfig=, --log-level=INFO
2024-04-23 14:00:19.215519 I | cephcmd: starting Rook-Ceph operator
2024-04-23 14:00:19.322061 I | cephcmd: base ceph version inside the rook operator image is "ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)"
2024-04-23 14:00:19.332548 I | op-k8sutil: ROOK_CURRENT_NAMESPACE_ONLY="false" (env var)
2024-04-23 14:00:19.332558 I | operator: watching all namespaces for Ceph CRs
2024-04-23 14:00:19.332604 I | operator: setting up schemes
2024-04-23 14:00:19.335083 I | operator: setting up the controller-runtime manager
2024-04-23 14:00:19.335422 I | ceph-cluster-controller: successfully started
```
The logs didn't help much, but yeah, delete the following resources in the rook-ceph namespace (probably):
Certificate rook-admission-controller-cert
Issuer "selfsigned-issuer"
service "rook-ceph-admission-controller"
Also, could you share the -o yaml output of the certificate and issuer mentioned above, to make sure that you are deleting the right resources. But yes, we need to clean up the above three resources.
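A sketch of the inspect-then-delete sequence for the three resources named above (assumes the cert-manager CRDs are installed, since Certificate and Issuer are cert-manager kinds; requires cluster access):

```shell
# Inspect first, to confirm these are the right resources.
kubectl -n rook-ceph get certificate rook-admission-controller-cert -o yaml
kubectl -n rook-ceph get issuer selfsigned-issuer -o yaml

# Then delete the three leftovers.
kubectl -n rook-ceph delete certificate rook-admission-controller-cert
kubectl -n rook-ceph delete issuer selfsigned-issuer
kubectl -n rook-ceph delete service rook-ceph-admission-controller
```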
rook-admission-controller-cert:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  creationTimestamp: "2022-04-23T18:45:33Z"
  generation: 1
  name: rook-admission-controller-cert
  namespace: rook-ceph
  resourceVersion: "301286319"
  uid: 22aa348f-e223-4f98-870e-aab4ef1f71a9
spec:
  dnsNames:
  - rook-ceph-admission-controller
  - rook-ceph-admission-controller.rook-ceph.svc
  - rook-ceph-admission-controller.rook-ceph.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  secretName: rook-ceph-admission-controller
status:
  conditions:
  - lastTransitionTime: "2022-04-23T18:45:34Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2024-07-11T18:45:34Z"
  notBefore: "2024-04-12T18:45:34Z"
  renewalTime: "2024-06-11T18:45:34Z"
  revision: 13
```
selfsigned-issuer:

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  creationTimestamp: "2022-04-23T18:45:32Z"
  generation: 1
  name: selfsigned-issuer
  namespace: rook-ceph
  resourceVersion: "138597982"
  uid: 68162730-aade-4670-b830-1cf97005ef5c
spec:
  selfSigned: {}
status:
  conditions:
  - lastTransitionTime: "2022-04-23T18:45:32Z"
    observedGeneration: 1
    reason: IsReady
    status: "True"
    type: Ready
```
rook-ceph-admission-controller:

```yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2022-04-23T18:45:34Z"
  name: rook-ceph-admission-controller
  namespace: rook-ceph
  resourceVersion: "214711462"
  uid: b62cac4d-ce0c-4f3d-aa19-ff2f9d9d553c
spec:
  clusterIP: 10.99.221.127
  clusterIPs:
  - 10.99.221.127
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 443
    protocol: TCP
    targetPort: 9443
  selector:
    app: rook-ceph-operator
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```
I deleted those resources but still get (a slightly different) error:

```
Error from server (InternalError): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ceph.rook.io/v1\",\"kind\":\"CephCluster\",\"metadata\":{\"annotations\":{},\"name\":\"rook-ceph\",\"namespace\":\"rook-ceph\"},\"spec\":{\"annotations\":null,\"cephVersion\":{\"allowUnsupported\":false,\"image\":\"quay.io/ceph/ceph:v18.2.2\"},\"cleanupPolicy\":{\"allowUninstallWithVolumes\":false,\"confirmation\":\"\",\"sanitizeDisks\":{\"dataSource\":\"zero\",\"iteration\":1,\"method\":\"quick\"}},\"continueUpgradeAfterChecksEvenIfNotHealthy\":false,\"crashCollector\":{\"disable\":false},\"csi\":{\"cephfs\":null,\"readAffinity\":{\"enabled\":false}},\"dashboard\":{\"enabled\":true,\"ssl\":true},\"dataDirHostPath\":\"/var/lib/rook\",\"disruptionManagement\":{\"managePodBudgets\":true,\"osdMaintenanceTimeout\":30,\"pgHealthCheckTimeout\":0},\"healthCheck\":{\"daemonHealth\":{\"mon\":{\"disabled\":false,\"interval\":\"45s\"},\"osd\":{\"disabled\":false,\"interval\":\"60s\"},\"status\":{\"disabled\":false,\"interval\":\"60s\"}},\"livenessProbe\":{\"mgr\":{\"disabled\":false},\"mon\":{\"disabled\":false},\"osd\":{\"disabled\":false}},\"startupProbe\":{\"mgr\":{\"disabled\":false},\"mon\":{\"disabled\":false},\"osd\":{\"disabled\":false}}},\"labels\":null,\"logCollector\":{\"enabled\":true,\"maxLogSize\":\"500M\",\"periodicity\":\"daily\"},\"mgr\":{\"allowMultiplePerNode\":true,\"count\":2,\"modules\":null},\"mon\":{\"allowMultiplePerNode\":true,\"count\":3},\"monitoring\":{\"enabled\":false,\"metricsDisabled\":false},\"network\":{\"connections\":{\"compression\":{\"enabled\":false},\"encryption\":{\"enabled\":false},\"requireMsgr2\":false}},\"priorityClassNames\":{\"mgr\":\"system-cluster-critical\",\"mon\":\"system-node-critical\",\"osd\":\"system-node-critical\"},\"removeOSDsIfOutAndSafeToRemove\":false,\"resources\":null,\"skipUpgradeChecks\":false,\"storage\":{\"config\":null,\"nodes\":[{\"devices\":[{\"config\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme0n1\"},{\"config\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme1n1\"},{\"config\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme3n1\"}],\"name\":\"storage1.<redacted>\"},{\"devices\":[{\"config\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme0n1\"},{\"config\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme2n1\"},{\"config\":{\"osdsPerDevice\":\"4\"},\"name\":\"nvme3n1\"}],\"name\":\"storage2.<redacted>\"}],\"onlyApplyOSDPlacement\":false,\"useAllDevices\":false,\"useAllNodes\":false},\"waitTimeoutForHealthyOSDInMinutes\":10}}\n"}},"spec":{"cephVersion":{"image":"quay.io/ceph/ceph:v18.2.2"},"csi":{"cephfs":null,"readAffinity":{"enabled":false}},"mgr":{"modules":null}}}
to:
Resource: "ceph.rook.io/v1, Resource=cephclusters", GroupVersionKind: "ceph.rook.io/v1, Kind=CephCluster"
Name: "rook-ceph", Namespace: "rook-ceph"
for: "04-cluster-prod.yaml": error when patching "04-cluster-prod.yaml": Internal error occurred: failed calling webhook "cephcluster-wh-rook-ceph-admission-controller-rook-ceph.rook.io": failed to call webhook: Post "https://rook-ceph-admission-controller.rook-ceph.svc:443/validate-ceph-rook-io-v1-cephcluster?timeout=5s": service "rook-ceph-admission-controller" not found
```
I've also listed all resources in the namespace (list_rook_ceph.txt) and can find some admission controller resources:

```
$ grep admission list_rook_ceph.txt
secret/rook-ceph-admission-controller               kubernetes.io/tls                     3   2y3d
secret/rook-ceph-admission-controller-token-s47d8   kubernetes.io/service-account-token   3   3y105d
serviceaccount/rook-ceph-admission-controller       1                                         3y105d
```
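One way to trace which webhook configuration still references the deleted service is to filter a JSON dump with jq. This is a sketch, assuming jq is installed; the trimmed sample file below stands in for real `kubectl` output, and its contents are illustrative, not from the thread:

```shell
# On the cluster you would produce this file with:
#   kubectl get validatingwebhookconfigurations -o json > webhooks.json
# Here a trimmed sample dump stands in for the real output.
cat > webhooks.json <<'EOF'
{"items":[
  {"metadata":{"name":"cert-manager-webhook"},
   "webhooks":[{"clientConfig":{"service":{"name":"cert-manager"}}}]},
  {"metadata":{"name":"rook-ceph-webhook"},
   "webhooks":[{"clientConfig":{"service":{"name":"rook-ceph-admission-controller"}}}]}
]}
EOF

# Select the configuration whose webhooks still reference the deleted service.
jq -r '.items[]
       | select(any(.webhooks[]?; .clientConfig.service.name == "rook-ceph-admission-controller"))
       | .metadata.name' webhooks.json
```

With the sample dump, this prints rook-ceph-webhook, pointing directly at the leftover configuration.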
Try deleting the resources mentioned above.
As stated before: the resources are already deleted. But now it complains about service "rook-ceph-admission-controller" not found instead of a timeout.
kubectl get validatingwebhookconfigurations -A (search this in all namespaces once). Also, I'm on holiday today, so I will look on Monday.
Edit: I hope it's not something blocking you
Thank you. Take your free time! I'm not really blocked.
```
$ kubectl get validatingwebhookconfigurations -A
NAME                            WEBHOOKS   AGE
cert-manager-webhook            1          3y116d
ingress-nginx-admission         1          432d
metallb-webhook-configuration   7          432d
rook-ceph-webhook               5          2y3d
```
I see the issue: you need to delete the rook-ceph-webhook (I forgot that webhooks are cluster-scoped resources). Also, here is the code https://github.com/rook/rook/blob/b32948c314d64f6b48e40f32d5df656b33d921d1/pkg/operator/ceph/webhook-config.go#L258-L282 that deletes everything related to the webhook in Rook.
Alright. I'm not into Go but I'll figure it out. Thank you for your help!
Just to be 100% sure: are you asking me to run kubectl delete validatingwebhookconfigurations rook-ceph-webhook? I'm a bit worried, as I can see 5 webhooks there.
Yes, delete rook-ceph-webhook only.
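A cautious sketch of this step (requires cluster access): the five webhooks are entries inside the single cluster-scoped rook-ceph-webhook object, so listing them first confirms they all belong to Rook before the whole configuration is removed.

```shell
# List the individual webhook names inside rook-ceph-webhook; they should
# all be rook.io validators, so deleting the object removes only Rook's hooks.
kubectl get validatingwebhookconfiguration rook-ceph-webhook \
  -o jsonpath='{range .webhooks[*]}{.name}{"\n"}{end}'

# Then delete the whole configuration.
kubectl delete validatingwebhookconfiguration rook-ceph-webhook
```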
It worked. Thanks a lot for the quick and competent answers! :bow:
Good to know it is working now @maon-fp
I've upgraded Rook from v1.10.11 to v1.13.8 step by step (v1.10.11 -> v1.11.11 -> v1.12.11 -> v1.13.8). On https://rook.github.io/docs/rook/v1.13/Upgrade/rook-upgrade/ I've read that the admission controller is gone (it was enabled in my setup by ROOK_DISABLE_ADMISSION_CONTROLLER: "false"). So I changed this to ROOK_DISABLE_ADMISSION_CONTROLLER: "true" while still running v1.12.11. The upgrade to v1.13.8 went smoothly. Now I want to upgrade to Reef and try to apply the cluster.yaml, but this gives me the admission-webhook error shown earlier in this thread.

Environment:
- OS: Ubuntu 20.04.6 LTS (Focal Fossa)
- Kernel (uname -a): 5.15.0-105-generic
- Rook version (rook version inside of a Rook Pod): v1.13.8
- Storage backend version (ceph -v): ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
- Kubernetes version (kubectl version): v1.29.2
- Storage backend status (ceph health in the Rook Ceph toolbox): HEALTH_OK