operator-framework / operator-registry

Operator Registry runs in a Kubernetes or OpenShift cluster to provide operator catalog data to Operator Lifecycle Manager.
Apache License 2.0

Bundle Unpacker is not idempotent #926

Open cdjohnson opened 2 years ago

cdjohnson commented 2 years ago

Problem: On OpenShift 4.9.17, a customer enabled a ResourceQuota on the openshift-marketplace namespace, which is the main namespace for OLM.

kind: ResourceQuota
apiVersion: v1
metadata:
  name: temp-pod-quota
  namespace: openshift-marketplace
spec:
  hard:
    pods: '25'
status:
  hard:
    pods: '25'
  used:
    pods: '25'

When attempting to install any operator, the quota was quickly exceeded and the unpack jobs were sometimes unable to create their pods:

Error creating: pods "878ecfdea1565e741abd95478cf86a233a15cc3f7969992e14d6e8--1-b9nvz" is forbidden: exceeded quota: temp-pod-quota, requested: pods=1, used: pods=25, limited: pods=25
Error creating: pods "878ecfdea1565e741abd95478cf86a233a15cc3f7969992e14d6e8--1-bfql6" is forbidden: exceeded quota: mch-temp-pod-quota, requested: pods=1, used: pods=25, limited: pods=25
...

More of these events appeared every minute.

Some of the pods would get created and run successfully, some would fail to be created and some would be created and fail.

A few successful ones look like this: 878ecfdea1565e741abd95478cf86a233a15cc3f7969992e14d6e8--1-5dnj8

time="2022-03-02T13:35:45Z" level=info msg="Using in-cluster kube client config"
time="2022-03-02T13:35:45Z" level=info msg="Reading file" file=/bundle/manifests/ibm-bedrock-version.yaml
time="2022-03-02T13:35:45Z" level=info msg="Reading file" file=/bundle/manifests/ibm-common-service-operator.clusterserviceversion.yaml
time="2022-03-02T13:35:45Z" level=info msg="Reading file" file=/bundle/manifests/operator.ibm.com_commonservices.yaml
time="2022-03-02T13:35:45Z" level=info msg="Reading file" file=/bundle/metadata/annotations.yaml

The failures look like this: 878ecfdea1565e741abd95478cf86a233a15cc3f7969992e14d6e8--1-5bjdt (CrashLoopBackOff)

time="2022-03-02T19:47:05Z" level=info msg="Using in-cluster kube client config"
time="2022-03-02T19:47:05Z" level=info msg="Reading file" file=/bundle/manifests/ibm-bedrock-version.yaml
time="2022-03-02T19:47:05Z" level=info msg="Reading file" file=/bundle/manifests/ibm-common-service-operator.clusterserviceversion.yaml
time="2022-03-02T19:47:05Z" level=info msg="Reading file" file=/bundle/manifests/operator.ibm.com_commonservices.yaml
time="2022-03-02T19:47:05Z" level=info msg="Reading file" file=/bundle/metadata/annotations.yaml
Error: error loading manifests from directory: ConfigMap "878ecfdea1565e741abd95478cf86a233a15cc3f7969992e14d6e82603883eb" is invalid: [data[ibm-bedrock-version.yaml]: Invalid value: "ibm-bedrock-version.yaml": duplicate of key present in binaryData, data[ibm-common-service-operator.clusterserviceversion.yaml]: Invalid value: "ibm-common-service-operator.clusterserviceversion.yaml": duplicate of key present in binaryData, data[operator.ibm.com_commonservices.yaml]: Invalid value: "operator.ibm.com_commonservices.yaml": duplicate of key present in binaryData]
Usage:
  opm alpha bundle extract [flags]

Flags:
  -c, --configmapname string   name of configmap to write bundle data
  -l, --datalimit uint         maximum limit in bytes for total bundle data (default 1048576)
      --debug                  enable debug logging
  -h, --help                   help for extract
  -k, --kubeconfig string      absolute path to kubeconfig file
  -m, --manifestsdir string    path to directory containing manifests (default "/")
  -n, --namespace string       namespace to write configmap data (default "openshift-operator-lifecycle-manager")

Global Flags:
      --skip-tls   skip TLS certificate verification for container image registries while pulling bundles or index

Summary: It seems the bundle unpack job is not idempotent. If the ConfigMap already contains the keys, the job should either move on or replace the values rather than fail with a duplicate key error.
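To make the "replace the values" suggestion concrete, here is a minimal sketch of an idempotent write. It is not the opm implementation: ConfigMap below is a hypothetical stand-in for the real Kubernetes type, SetFile and Overlap are made-up helper names, and choosing between data and binaryData with a UTF-8 check is an assumption. The only part taken from the error above is that the API server rejects a ConfigMap whose key appears in both data and binaryData.

```go
package main

import (
	"fmt"
	"sort"
	"unicode/utf8"
)

// ConfigMap is a hypothetical stand-in for the two payload fields of a
// Kubernetes ConfigMap. It is not the real corev1.ConfigMap type.
type ConfigMap struct {
	Data       map[string]string
	BinaryData map[string][]byte
}

// SetFile stores content under key, replacing any earlier value.
// The key is deleted from both maps first, so a rerun can never leave
// the same key in Data and BinaryData at once.
// Assumption: valid UTF-8 goes to Data, everything else to BinaryData.
func SetFile(cm *ConfigMap, key string, content []byte) {
	if cm.Data == nil {
		cm.Data = map[string]string{}
	}
	if cm.BinaryData == nil {
		cm.BinaryData = map[string][]byte{}
	}
	delete(cm.Data, key)
	delete(cm.BinaryData, key)
	if utf8.Valid(content) {
		cm.Data[key] = string(content)
	} else {
		cm.BinaryData[key] = content
	}
}

// Overlap returns the sorted keys present in both maps, which is the
// condition the "duplicate of key present in binaryData" error reports.
func Overlap(cm *ConfigMap) []string {
	var dup []string
	for k := range cm.Data {
		if _, ok := cm.BinaryData[k]; ok {
			dup = append(dup, k)
		}
	}
	sort.Strings(dup)
	return dup
}

func main() {
	// Simulate a ConfigMap left behind by an earlier, partially completed run.
	cm := &ConfigMap{
		BinaryData: map[string][]byte{"annotations.yaml": []byte("old")},
	}
	// Writing the same file twice leaves the same state as writing it once.
	SetFile(cm, "annotations.yaml", []byte("annotations: {}\n"))
	SetFile(cm, "annotations.yaml", []byte("annotations: {}\n"))
	fmt.Println("overlapping keys:", len(Overlap(cm)))
}
```

With something along these lines, a retried unpack pod would overwrite whatever an earlier pod left in the ConfigMap instead of crashing on it. Whether overwriting or skipping is the better behavior probably depends on whether the earlier contents can be trusted.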

exdx commented 2 years ago

This looks to be an edge case in the unpacker job, which unpacks content from a catalog onto the cluster via a ConfigMap. Ideally the unpack jobs would be idempotent and not conflict with one another.