Open david-martin opened 1 year ago
Looks like a cert issue alright
kubectl -n open-cluster-management describe po multicluster-observability-operator-77446bdd89-xp4fm| grep -A 6 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 26m default-scheduler Successfully assigned open-cluster-management/multicluster-observability-operator-77446bdd89-xp4fm to ocm-cluster-1-control-plane
Warning FailedMount 24m kubelet Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[kube-api-access-w8kh6 cert]: timed out waiting for the condition
Warning FailedMount 4m1s (x9 over 22m) kubelet Unable to attach or mount volumes: unmounted volumes=[cert], unattached volumes=[cert kube-api-access-w8kh6]: timed out waiting for the condition
Warning FailedMount 3m56s (x19 over 26m) kubelet MountVolume.SetUp failed for volume "cert" : secret "multicluster-observability-operator-webhook-server-cert" not found
And I can see the OpenShift serving-cert annotation on the service.
kubectl -n open-cluster-management get svc multicluster-observability-webhook-service -o yaml | grep "serving-cert-secret-name"
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"service.beta.openshift.io/serving-cert-secret-name":"multicluster-observability-operator-webhook-server-cert"},"labels":{"name":"multicluster-observability-operator"},"name":"multicluster-observability-webhook-service","namespace":"open-cluster-management"},"spec":{"ports":[{"port":443,"protocol":"TCP","targetPort":9443}],"selector":{"name":"multicluster-observability-operator"}}}
service.beta.openshift.io/serving-cert-secret-name: multicluster-observability-operator-webhook-server-cert
I wonder what steps are required to get cert-manager to inject the cert in the right place
I've gotten a little further with the help of cert-manager. Here are the commands I used:
Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.yaml
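Applying the Issuer right after the install can fail if cert-manager's own webhook is not up yet. A minimal sketch of a wait step, assuming the release manifest installs into the `cert-manager` namespace; the function name `wait_for_cert_manager` is just illustrative:

```shell
# Block until every Deployment in the cert-manager namespace reports Available.
# Assumption: cert-manager was installed into the "cert-manager" namespace.
wait_for_cert_manager() {
  timeout="${1:-180s}"
  kubectl -n cert-manager wait --for=condition=Available deployment --all --timeout="$timeout"
}

# Usage (against a live cluster):
#   wait_for_cert_manager 300s
```

The function returns non-zero if the deployments do not become Available within the timeout, so it can gate the following steps in a script.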
Create a self-signed Issuer and a Certificate for the webhook service
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: multicluster-observability-operator-issuer
  namespace: open-cluster-management
spec:
  selfSigned: {}
EOF
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: multicluster-observability-operator-webhook-server-cert
  namespace: open-cluster-management
spec:
  dnsNames:
    - multicluster-observability-webhook-service.open-cluster-management.svc
  secretName: multicluster-observability-operator-webhook-server-cert
  issuerRef:
    name: multicluster-observability-operator-issuer
EOF
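To confirm the Certificate was actually issued and the secret the operator pod is waiting on now exists, something like the following should work. This is a sketch; `webhook_cert_ready` is a made-up helper name, and the namespace and secret name are the ones used above:

```shell
NS=open-cluster-management
CERT=multicluster-observability-operator-webhook-server-cert

# Wait for the Certificate to report Ready, then print the type of the
# secret cert-manager wrote (expected to be kubernetes.io/tls).
webhook_cert_ready() {
  kubectl -n "$NS" wait --for=condition=Ready "certificate/$CERT" --timeout="${1:-120s}" &&
    kubectl -n "$NS" get secret "$CERT" -o jsonpath='{.type}'
}

# Usage (against a live cluster):
#   webhook_cert_ready 60s
```

Once the secret exists, the kubelet should retry the `cert` volume mount on its own; deleting the operator pod only speeds that up.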
Add the cert-manager CA injection annotation to the CRD and the ValidatingWebhookConfiguration.
kubectl annotate crd multiclusterobservabilities.observability.open-cluster-management.io cert-manager.io/inject-ca-from=open-cluster-management/multicluster-observability-operator-webhook-server-cert
kubectl annotate ValidatingWebhookConfiguration multicluster-observability-operator cert-manager.io/inject-ca-from=open-cluster-management/multicluster-observability-operator-webhook-server-cert
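A quick way to check that the cainjector picked up the annotations is to see whether `caBundle` is non-empty on both objects. A rough sketch; `ca_bundle_length` is a hypothetical helper, and the jsonpath expressions assume the first webhook entry and a conversion webhook on the CRD:

```shell
# Print the number of bytes in the caBundle field of a resource.
#   $1 = resource, e.g. validatingwebhookconfiguration/NAME
#   $2 = jsonpath expression pointing at the caBundle field
# 0 means nothing has been injected yet.
ca_bundle_length() {
  kubectl get "$1" -o jsonpath="$2" | wc -c | tr -d ' '
}

# Usage (against a live cluster):
#   ca_bundle_length validatingwebhookconfiguration/multicluster-observability-operator \
#     '{.webhooks[0].clientConfig.caBundle}'
#   ca_bundle_length crd/multiclusterobservabilities.observability.open-cluster-management.io \
#     '{.spec.conversion.webhook.clientConfig.caBundle}'
```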
Now I can list and create MultiClusterObservability CRs.
When I do that, creating the example from the repo at operators/multiclusterobservability/config/samples/observability_v1beta2_multiclusterobservability.yaml, I'm seeing a new set of problems.
kubectl -n open-cluster-management-observability get pod
NAME READY STATUS RESTARTS AGE
minio-59b76b4cd-4r6kc 0/1 Pending 0 4m22s
observability-alertmanager-0 0/3 ContainerCreating 0 2m38s
observability-grafana-855b85957d-7fglt 0/3 ContainerCreating 0 2m40s
observability-grafana-855b85957d-ssh4g 0/3 ContainerCreating 0 2m40s
observability-observatorium-operator-79f8fc5fc8-5xzzq 0/1 ImagePullBackOff 0 2m40s
The minio problem is:
Warning FailedScheduling 86s (x5 over 8m24s) default-scheduler 0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
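"Unbound immediate PersistentVolumeClaims" usually means the claim asks for a StorageClass the cluster doesn't have, or there is no default one. A small sketch to list the claims that are not Bound along with the class they request; `unbound_pvcs` is an illustrative name, not something from the repo:

```shell
# List PVCs in a namespace that are not Bound, with the StorageClass they request.
#   $1 = namespace
unbound_pvcs() {
  kubectl -n "$1" get pvc \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.spec.storageClassName}{"\n"}{end}' |
    awk -F'\t' '$2 != "Bound" { printf "%s (phase=%s, storageClass=%s)\n", $1, $2, ($3 == "" ? "<none>" : $3) }'
}

# Usage (against a live cluster):
#   unbound_pvcs open-cluster-management-observability
#   kubectl get storageclass   # compare with the classes the cluster actually offers
```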
Alertmanager:
Normal Scheduled 7m37s default-scheduler Successfully assigned open-cluster-management-observability/observability-alertmanager-0 to ocm-cluster-1-control-plane
Warning FailedMount 3m19s kubelet Unable to attach or mount volumes: unmounted volumes=[tls-secret], unattached volumes=[kube-api-access-9src9 tls-secret alertmanager-proxy config-volume alertmanager-db]: timed out waiting for the condition
Warning FailedMount 85s (x11 over 7m37s) kubelet MountVolume.SetUp failed for volume "tls-secret" : secret "alertmanager-tls" not found
Warning FailedMount 64s (x2 over 5m34s) kubelet Unable to attach or mount volumes: unmounted volumes=[tls-secret], unattached volumes=[config-volume alertmanager-db kube-api-access-9src9 tls-secret alertmanager-proxy]: timed out waiting for the condition
Grafana 1:
Normal Scheduled 8m6s default-scheduler Successfully assigned open-cluster-management-observability/observability-grafana-855b85957d-7fglt to ocm-cluster-1-control-plane
Warning FailedMount 7m2s (x8 over 8m5s) kubelet MountVolume.SetUp failed for volume "cookie-secret" : secret "rbac-proxy-cookie-secret" not found
Warning FailedMount 7m2s (x8 over 8m5s) kubelet MountVolume.SetUp failed for volume "tls-secret" : secret "grafana-tls" not found
Warning FailedMount 6m3s kubelet Unable to attach or mount volumes: unmounted volumes=[grafana-datasources tls-secret cookie-secret], unattached volumes=[grafana-datasources grafana-config kube-api-access-9m4l8 tls-secret cookie-secret grafana-storage]: timed out waiting for the condition
Warning FailedMount 114s (x11 over 8m5s) kubelet MountVolume.SetUp failed for volume "grafana-datasources" : secret "grafana-datasources" not found
Grafana 2:
Normal Scheduled 8m43s default-scheduler Successfully assigned open-cluster-management-observability/observability-grafana-855b85957d-ssh4g to ocm-cluster-1-control-plane
Warning FailedMount 7m39s (x8 over 8m42s) kubelet MountVolume.SetUp failed for volume "grafana-datasources" : secret "grafana-datasources" not found
Warning FailedMount 7m39s (x8 over 8m42s) kubelet MountVolume.SetUp failed for volume "cookie-secret" : secret "rbac-proxy-cookie-secret" not found
Warning FailedMount 6m40s kubelet Unable to attach or mount volumes: unmounted volumes=[grafana-datasources tls-secret cookie-secret], unattached volumes=[grafana-datasources grafana-config kube-api-access-2jv7b tls-secret cookie-secret grafana-storage]: timed out waiting for the condition
Warning FailedMount 2m31s (x11 over 8m42s) kubelet MountVolume.SetUp failed for volume "tls-secret" : secret "grafana-tls" not found
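All of these FailedMount events come down to secrets that don't exist yet. My guess (an assumption, not confirmed) is that the `*-tls` ones are normally generated by OpenShift, like the webhook cert. A small helper to see which of the secrets named in the events are still missing; `missing_secrets` is a made-up name:

```shell
# Print the name of every secret that does not exist in the namespace.
#   $1 = namespace, remaining arguments = secret names to check
missing_secrets() {
  ns="$1"
  shift
  for name in "$@"; do
    kubectl -n "$ns" get secret "$name" >/dev/null 2>&1 || echo "$name"
  done
}

# Usage (against a live cluster), with the secret names from the events above:
#   missing_secrets open-cluster-management-observability \
#     alertmanager-tls grafana-tls rbac-proxy-cookie-secret grafana-datasources
```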
The ImagePullBackOff is due to a missing image tag in quay:
Normal BackOff 68s (x20 over 6m20s) kubelet Back-off pulling image "quay.io/stolostron/observatorium-operator:2.4.0-SNAPSHOT-2021-09-23-07-02-14"
I see the README is geared at the hub cluster being OpenShift; however, I also see this wording that suggests maybe this can work on plain k8s as well?
I'm using local kind clusters with k8s v1.26.0. I've gotten as far as the command below but am hitting an error. I'm thinking it could be related to the webhook?
Is there a way I can get this add-on to work with plain k8s?