operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0

Package server APIServer unreachable after install on EKS with Istio #2514

Open amilanoski opened 2 years ago

amilanoski commented 2 years ago

Bug Report

What did you do?

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.19.1/crds.yaml
  - https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.19.1/olm.yaml

kubectl apply -k .

What did you expect to see?

The OLM services come up and I am able to install an Operator from OperatorHub.io.

What did you see instead? Under which circumstances?

The OLM services came up, according to this output:

kg pod,svc -n olm
NAME                                    READY   STATUS    RESTARTS   AGE
pod/catalog-operator-765c45774c-dsvm9   2/2     Running   1          166m
pod/olm-operator-9bb87877b-zmr7f        2/2     Running   1          166m
pod/operatorhubio-catalog-fq7qn         2/2     Running   0          61m
pod/packageserver-568d867fff-bzsjb      2/2     Running   0          13m
pod/packageserver-568d867fff-t7hs8      2/2     Running   0          62m

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service/operatorhubio-catalog   ClusterIP   10.100.177.142   <none>        50051/TCP   166m
service/packageserver-service   ClusterIP   10.100.127.51    <none>        5443/TCP    2m22s

The packageserver CSV is stuck in the Installing phase:

kg csv -n olm
NAME            DISPLAY          VERSION   REPLACES   PHASE
packageserver   Package Server   0.19.1               Installing
  Requirement Status:
    Group:    operators.coreos.com
    Kind:     ClusterServiceVersion
    Message:  CSV minKubeVersion (1.11.0) less than server version (v1.20.7-eks-d88609)
    Name:     packageserver
    Status:   Present
    Version:  v1alpha1
    Group:    apiregistration.k8s.io
    Kind:     APIService
    Message:
    Name:     v1.packages.operators.coreos.com
    Status:   DeploymentFound
    Version:  v1
    Dependents:
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["create","get"],"apiGroups":["authorization.k8s.io"],"resources":["subjectaccessreviews"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["get","list","watch"],"apiGroups":[""],"resources":["configmaps"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["get","list","watch"],"apiGroups":["operators.coreos.com"],"resources":["catalogsources"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["get","list"],"apiGroups":["packages.operators.coreos.com"],"resources":["packagemanifests"]}
      Status:   Satisfied
      Version:  v1
    Group:
    Kind:       ServiceAccount
    Message:
    Name:       olm-operator-serviceaccount
    Status:     Present
    Version:    v1
Events:
  Type     Reason               Age                   From                        Message
  ----     ------               ----                  ----                        -------
  Normal   RequirementsUnknown  52m                   operator-lifecycle-manager  requirements not yet checked
  Normal   AllRequirementsMet   42m (x6 over 52m)     operator-lifecycle-manager  all requirements found, attempting install
  Normal   NeedsReinstall       42m (x4 over 47m)     operator-lifecycle-manager  APIServices not installed
  Normal   InstallSucceeded     42m (x6 over 52m)     operator-lifecycle-manager  waiting for install components to report healthy
  Normal   InstallWaiting       42m (x5 over 52m)     operator-lifecycle-manager  APIServices not installed
  Warning  InstallCheckFailed   2m17s (x19 over 47m)  operator-lifecycle-manager  install timeout

kl -n olm olm-operator-9bb87877b-zmr7f --tail=50 -f

time="2021-12-08T20:17:00Z" level=info msg="install strategy successful" csv=packageserver id=bVi6s namespace=olm phase=Installing strategy=deployment
time="2021-12-08T20:17:00Z" level=info msg="install strategy successful" csv=packageserver id=Su0tW namespace=olm phase=Installing strategy=deployment
time="2021-12-08T20:17:01Z" level=warning msg="install timed out" csv=packageserver id=AzvRi namespace=olm phase=Installing
I1208 20:17:01.095196       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"olm", Name:"packageserver", UID:"36910f88-c948-4d33-91e9-940ba5361b1f", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"147367595", FieldPath:""}): type: 'Warning' reason: 'InstallCheckFailed' install timeout
time="2021-12-08T20:17:01Z" level=warning msg="install timed out" csv=packageserver id=eYZNN namespace=olm phase=Installing
I1208 20:17:01.250682       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"olm", Name:"packageserver", UID:"36910f88-c948-4d33-91e9-940ba5361b1f", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"147367595", FieldPath:""}): type: 'Warning' reason: 'InstallCheckFailed' install timeout
time="2021-12-08T20:17:01Z" level=info msg="error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" csv=packageserver id=SldDi namespace=olm phase=Installing
E1208 20:17:01.267190       1 queueinformer_operator.go:290] sync {"update" "olm/packageserver"} failed: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
time="2021-12-08T20:17:01Z" level=warning msg="needs reinstall: APIServices not installed" csv=packageserver id=kLQmP namespace=olm phase=Failed strategy=deployment
I1208 20:17:01.445074       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"olm", Name:"packageserver", UID:"36910f88-c948-4d33-91e9-940ba5361b1f", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"147369389", FieldPath:""}): type: 'Normal' reason: 'NeedsReinstall' APIServices not installed
time="2021-12-08T20:17:01Z" level=warning msg="needs reinstall: APIServices not installed" csv=packageserver id=owVzw namespace=olm phase=Failed strategy=deployment
I1208 20:17:01.646871       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"olm", Name:"packageserver", UID:"36910f88-c948-4d33-91e9-940ba5361b1f", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"147369389", FieldPath:""}): type: 'Normal' reason: 'NeedsReinstall' APIServices not installed
time="2021-12-08T20:17:01Z" level=info msg="error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" csv=packageserver id=ePxEX namespace=olm phase=Failed
E1208 20:17:01.655655       1 queueinformer_operator.go:290] sync {"update" "olm/packageserver"} failed: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
time="2021-12-08T20:17:01Z" level=info msg="scheduling ClusterServiceVersion for install" csv=packageserver id=4rhbL namespace=olm phase=Pending
I1208 20:17:01.748610       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"olm", Name:"packageserver", UID:"36910f88-c948-4d33-91e9-940ba5361b1f", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"147369396", FieldPath:""}): type: 'Normal' reason: 'AllRequirementsMet' all requirements found, attempting install
time="2021-12-08T20:17:01Z" level=info msg="scheduling ClusterServiceVersion for install" csv=packageserver id=h2fyb namespace=olm phase=Pending
I1208 20:17:01.844415       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"olm", Name:"packageserver", UID:"36910f88-c948-4d33-91e9-940ba5361b1f", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"147369396", FieldPath:""}): type: 'Normal' reason: 'AllRequirementsMet' all requirements found, attempting install
time="2021-12-08T20:17:01Z" level=info msg="error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" csv=packageserver id=q0q+v namespace=olm phase=Pending
E1208 20:17:01.852175       1 queueinformer_operator.go:290] sync {"update" "olm/packageserver"} failed: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
time="2021-12-08T20:17:01Z" level=warning msg="reusing existing cert packageserver-service-cert"
I1208 20:17:02.010812       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"olm", Name:"packageserver", UID:"36910f88-c948-4d33-91e9-940ba5361b1f", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"147369400", FieldPath:""}): type: 'Normal' reason: 'InstallSucceeded' waiting for install components to report healthy
time="2021-12-08T20:17:02Z" level=warning msg="reusing existing cert packageserver-service-cert"
I1208 20:17:02.191973       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"olm", Name:"packageserver", UID:"36910f88-c948-4d33-91e9-940ba5361b1f", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"147369400", FieldPath:""}): type: 'Normal' reason: 'InstallSucceeded' waiting for install components to report healthy
time="2021-12-08T20:17:02Z" level=info msg="error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" csv=packageserver id=FSdCV namespace=olm phase=InstallReady
E1208 20:17:02.202441       1 queueinformer_operator.go:290] sync {"update" "olm/packageserver"} failed: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
time="2021-12-08T20:17:02Z" level=info msg="install strategy successful" csv=packageserver id=UY7It namespace=olm phase=Installing strategy=deployment
I1208 20:17:02.221591       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"olm", Name:"packageserver", UID:"36910f88-c948-4d33-91e9-940ba5361b1f", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"147369412", FieldPath:""}): type: 'Normal' reason: 'InstallWaiting' APIServices not installed
time="2021-12-08T20:17:02Z" level=info msg="install strategy successful" csv=packageserver id=PqBQQ namespace=olm phase=Installing strategy=deployment
I1208 20:17:02.248508       1 event.go:282] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"olm", Name:"packageserver", UID:"36910f88-c948-4d33-91e9-940ba5361b1f", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"147369412", FieldPath:""}): type: 'Normal' reason: 'InstallWaiting' APIServices not installed
time="2021-12-08T20:17:02Z" level=info msg="error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver\": the object has been modified; please apply your changes to the latest version and try again" csv=packageserver id=kebtg namespace=olm phase=Installing
E1208 20:17:02.258585       1 queueinformer_operator.go:290] sync {"update" "olm/packageserver"} failed: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver": the object has been modified; please apply your changes to the latest version and try again
time="2021-12-08T20:17:02Z" level=info msg="install strategy successful" csv=packageserver id=x5s4G namespace=olm phase=Installing strategy=deployment

Environment

Istio 1.11.3

kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T21:16:14Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.7-eks-d88609", GitCommit:"d886092805d5cc3a47ed5cf0c43de38ce442dfcb", GitTreeState:"clean", BuildDate:"2021-07-31T00:29:12Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution

Additional context

Possibly related:
https://github.com/operator-framework/operator-lifecycle-manager/issues/1368
https://github.com/operator-framework/operator-lifecycle-manager/issues/2343
https://github.com/operator-framework/operator-lifecycle-manager/issues/2234

dinhxuanvu commented 2 years ago

Hi there. Unfortunately, there is not enough information in the description to identify the root cause. Could you share the pod/deployment YAML for the packageserver in the olm namespace?

martin31821 commented 1 year ago

Hey, I had a similar issue using Cilium and pushed a fix to my fork. The problem is essentially that the EKS control plane can't connect to the packageserver API, which I solved by using hostNetwork on these pods: https://github.com/operator-framework/operator-lifecycle-manager/compare/master...deinstapel:operator-lifecycle-manager:v0.23.1-fix
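
For anyone who wants to try the hostNetwork idea without building a fork, here is a rough sketch of how it might be expressed as a kustomize patch on top of the Kustomization from the original report. This is not taken from the fork linked above. It assumes the packageserver ClusterServiceVersion shipped in olm.yaml keeps its pod template under spec.install.spec.deployments[0], and that patching the CSV (rather than the Deployment, which OLM manages) is enough for OLM to roll the change out. Neither assumption has been verified against v0.19.1.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.19.1/crds.yaml
  - https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.19.1/olm.yaml

patches:
  # Assumption: the CSV is named "packageserver" in namespace "olm",
  # as shown in the `kg csv -n olm` output earlier in this issue.
  - target:
      group: operators.coreos.com
      version: v1alpha1
      kind: ClusterServiceVersion
      name: packageserver
      namespace: olm
    patch: |-
      # Assumption: the first (index 0) deployment in the install strategy
      # is the packageserver deployment.
      - op: add
        path: /spec/install/spec/deployments/0/spec/template/spec/hostNetwork
        value: true
```

With hostNetwork enabled, the packageserver binds its serving port (5443 per the service listing above) directly on the node, so two replicas scheduled onto the same node would likely conflict. Treat this as a workaround to experiment with, not a confirmed fix.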