open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

failed calling webhook "mopentelemetrycollector.kb.io" #652

Closed: rbaumgar closed this issue 2 years ago

rbaumgar commented 2 years ago

Full error message:

Error from server (InternalError): error when creating "abc": Internal error occurred: failed calling webhook "mopentelemetrycollector.kb.io": failed to call webhook: Post "https://opentelemetry-operator-controller-manager-service.openshift-operators.svc:443/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector?timeout=10s": dial tcp 10.131.0.154:9443: connect: connection refused

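Reason: the webhook Service selects pods by control-plane=controller-manager, and in openshift-operators that label matches the pods of several operators: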

$ oc get pod -l control-plane=controller-manager
NAME                                                         READY   STATUS    RESTARTS      AGE
gitops-operator-controller-manager-54d4756897-7gczv          1/1     Running   0             31h
kogito-operator-controller-manager-7d5fc8f765-kntnr          2/2     Running   4 (28h ago)   28h
opentelemetry-operator-controller-manager-69f7f56598-z8dck   2/2     Running   0             56m
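A likely explanation for the "connection refused": a Service sends traffic to every pod its selector matches, and here the selector matches all three operator pods, so some webhook calls land on pods that do not serve this webhook on 9443. The sketch below models equality-based selection (a selector matches a pod when every key/value pair in the selector appears in the pod's labels). The `pod` type and the helpers are hypothetical illustrations, not the Kubernetes API, and only the `control-plane` label shown by the `-l` filter above is modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical, minimal stand-in for a Kubernetes Pod: a name plus its labels.
type pod struct {
	Name   string
	Labels map[string]string
}

// matches reports whether every key/value pair in selector is present in labels.
// This is equality-based label selection, the kind a Service's spec.selector uses.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectPods returns the sorted names of all pods that the selector matches.
func selectPods(selector map[string]string, pods []pod) []string {
	var names []string
	for _, p := range pods {
		if matches(selector, p.Labels) {
			names = append(names, p.Name)
		}
	}
	sort.Strings(names)
	return names
}

// operatorPods models the three pods from the `oc get pod` listing above. Only the
// control-plane label is confirmed by that output; other labels are omitted.
var operatorPods = []pod{
	{"gitops-operator-controller-manager-54d4756897-7gczv", map[string]string{"control-plane": "controller-manager"}},
	{"kogito-operator-controller-manager-7d5fc8f765-kntnr", map[string]string{"control-plane": "controller-manager"}},
	{"opentelemetry-operator-controller-manager-69f7f56598-z8dck", map[string]string{"control-plane": "controller-manager"}},
}

func main() {
	// The webhook Service's selector as shipped: one generic label.
	webhookSelector := map[string]string{"control-plane": "controller-manager"}
	for _, name := range selectPods(webhookSelector, operatorPods) {
		fmt.Println(name)
	}
}
```

All three pod names are printed, so the Service has three candidate endpoints, while only the OpenTelemetry operator pod serves the `mutate-opentelemetry-io-v1alpha1-opentelemetrycollector` path.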

jpkrohling commented 2 years ago

This might be related to #521.

cc @rkukura, @VineethReddy02, @pavolloffay

jpkrohling commented 2 years ago

Or not: perhaps we just need to better qualify the selector?

https://github.com/open-telemetry/opentelemetry-operator/blob/e7bb958d0f8a8956f1e32f84f13f376bfbda3afa/config/webhook/service.yaml#L12-L13

https://github.com/open-telemetry/opentelemetry-operator/blob/e7bb958d0f8a8956f1e32f84f13f376bfbda3afa/config/manager/manager.yaml#L10-L23

rbaumgar commented 2 years ago

Maybe it's a good idea to add something like "app.kubernetes.io/name=simplest-collector"... This should be a recommendation from the Operator SDK.
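Qualifying the selector amounts to adding a key/value pair that only the OpenTelemetry operator's pods carry. A minimal sketch, assuming a hypothetical label `app.kubernetes.io/name: opentelemetry-operator` were added both to the operator's pod template and to the Service selector (the pods in this thread do not carry it, and the value is illustrative only). The `matches` and `countMatches` helpers are hypothetical, not the Kubernetes API.

```go
package main

import "fmt"

// matches reports whether every key/value pair in selector is present in labels
// (equality-based selection, as in a Service's spec.selector).
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// countMatches returns how many of the given label sets the selector matches.
func countMatches(selector map[string]string, podLabels []map[string]string) int {
	n := 0
	for _, l := range podLabels {
		if matches(selector, l) {
			n++
		}
	}
	return n
}

// podLabels: the gitops and kogito pods carry only the shared label; the
// OpenTelemetry pod additionally carries the HYPOTHETICAL app.kubernetes.io/name label.
var podLabels = []map[string]string{
	{"control-plane": "controller-manager"}, // gitops-operator
	{"control-plane": "controller-manager"}, // kogito-operator
	{"control-plane": "controller-manager", "app.kubernetes.io/name": "opentelemetry-operator"},
}

var genericSelector = map[string]string{"control-plane": "controller-manager"}

var qualifiedSelector = map[string]string{
	"control-plane":          "controller-manager",
	"app.kubernetes.io/name": "opentelemetry-operator", // hypothetical label
}

func main() {
	fmt.Println("generic selector matches:", countMatches(genericSelector, podLabels))
	fmt.Println("qualified selector matches:", countMatches(qualifiedSelector, podLabels))
}
```

The other operators' pods lack the extra key, so the selector narrows from three pods to one. Because such a label is declared in the pod template, it stays the same across rollouts.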

jpkrohling commented 2 years ago

This sounds like a good first issue. Would you like to try it out, @rbaumgar?

rbaumgar commented 2 years ago

@jpkrohling Sorry, I looked at the wrong pod. Besides control-plane, the controller-manager pod has only the label "pod-template-hash: 69f7f56598".

So I added the last line below to the Service selector:

selector:
    control-plane: controller-manager
    pod-template-hash: 69f7f56598

Works perfectly!
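The `pod-template-hash` workaround works because the Deployment controller stamps each ReplicaSet and its pods with a hash of the pod template, and the other operators' pods carry different values. The same property ties the patched selector to one revision of one install: in the debug output from the separate install in this thread, the operator's ReplicaSet hash is `79b77945bf`, not `69f7f56598`, and a pod-template change such as an upgrade produces a new hash as well. A minimal sketch of the effect; the `matches` helper and variable names are hypothetical.

```go
package main

import "fmt"

// matches reports whether every key/value pair in selector is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// patchedSelector is the Service selector from the comment above, with the hash hard-coded.
var patchedSelector = map[string]string{
	"control-plane":     "controller-manager",
	"pod-template-hash": "69f7f56598",
}

// originalPod: the operator pod on the OpenShift cluster in this issue.
var originalPod = map[string]string{"control-plane": "controller-manager", "pod-template-hash": "69f7f56598"}

// otherInstallPod: the operator pod from the separate install (ReplicaSet ...-79b77945bf).
var otherInstallPod = map[string]string{"control-plane": "controller-manager", "pod-template-hash": "79b77945bf"}

func main() {
	fmt.Println("matches original pod:", matches(patchedSelector, originalPod))              // true
	fmt.Println("matches pod with another hash:", matches(patchedSelector, otherInstallPod)) // false
}
```

Once the hash no longer matches, the Service has no endpoints and the webhook call fails again, so a stable, operator-specific label in the pod template is the more durable selector.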

pavolloffay commented 2 years ago

Adding some debug info:

$ k get all -n opentelemetry-operator-system
NAME                                                             READY   STATUS    RESTARTS   AGE
pod/opentelemetry-operator-controller-manager-79b77945bf-bw5lq   2/2     Running   0          15m

NAME                                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/opentelemetry-operator-controller-manager-metrics-service   ClusterIP   10.111.133.184   <none>        8443/TCP   15m
service/opentelemetry-operator-webhook-service                      ClusterIP   10.97.165.245    <none>        443/TCP    15m

NAME                                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/opentelemetry-operator-controller-manager   1/1     1            1           15m

NAME                                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/opentelemetry-operator-controller-manager-79b77945bf   1         1         1       15m
$ k describe service/opentelemetry-operator-webhook-service -n opentelemetry-operator-system
Name:              opentelemetry-operator-webhook-service
Namespace:         opentelemetry-operator-system
Labels:            <none>
Annotations:       <none>
Selector:          control-plane=controller-manager
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.97.165.245
IPs:               10.97.165.245
Port:              <unset>  443/TCP
TargetPort:        9443/TCP
Endpoints:         172.17.0.6:9443
Session Affinity:  None
Events:            <none>
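In this namespace the Service uses the same generic selector, `control-plane=controller-manager`, but only one pod carries that label, so there is a single endpoint. The Service forwards its port 443 to `TargetPort` 9443 on each selected pod, which is why the endpoint reads `172.17.0.6:9443`, and why the OpenShift error dials a pod address on `:9443`. A minimal sketch of that mapping; the `servicePort` type and `endpoints` helper are hypothetical, not the Kubernetes API.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// servicePort mirrors the Port/TargetPort pair from `k describe service` (hypothetical type).
type servicePort struct {
	Port       int // port clients dial on the Service's ClusterIP
	TargetPort int // port the traffic is forwarded to on each selected pod
}

// endpoints builds the "ip:targetPort" addresses the Service forwards to,
// one per pod IP matched by the Service's selector.
func endpoints(podIPs []string, sp servicePort) []string {
	out := make([]string, 0, len(podIPs))
	for _, ip := range podIPs {
		out = append(out, net.JoinHostPort(ip, strconv.Itoa(sp.TargetPort)))
	}
	return out
}

func main() {
	webhook := servicePort{Port: 443, TargetPort: 9443}
	// Only the operator pod matches control-plane=controller-manager in this namespace.
	fmt.Printf("ClusterIP:%d -> %v\n", webhook.Port, endpoints([]string{"172.17.0.6"}, webhook))
}
```

With one matching pod the list has one entry; with three matching pods, as in the OpenShift namespace, it would have three, and only one of them would be the OpenTelemetry operator.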
$ k describe deployment.apps/opentelemetry-operator-controller-manager -n opentelemetry-operator-system
Name:                   opentelemetry-operator-controller-manager
Namespace:              opentelemetry-operator-system
CreationTimestamp:      Wed, 09 Feb 2022 10:00:26 +0100
Labels:                 control-plane=controller-manager
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               control-plane=controller-manager
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           control-plane=controller-manager
  Service Account:  opentelemetry-operator-controller-manager
  Containers:
   kube-rbac-proxy:
    Image:      gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
    Port:       8443/TCP
    Host Port:  0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:        5m
      memory:     64Mi
    Environment:  <none>
    Mounts:       <none>
   manager:
    Image:      docker.io/pavolloffay/opentelemetry-operator:810
    Port:       9443/TCP
    Host Port:  0/TCP
    Args:
      --metrics-addr=127.0.0.1:8080
      --enable-leader-election
    Limits:
      cpu:     200m
      memory:  256Mi
    Requests:
      cpu:        100m
      memory:     64Mi
    Liveness:     http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:    http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
  Volumes:
   cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  opentelemetry-operator-controller-manager-service-cert
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   opentelemetry-operator-controller-manager-79b77945bf (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  15m   deployment-controller  Scaled up replica set opentelemetry-operator-controller-manager-79b77945bf to 1