operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0

Bundle extract size limit exceeded guidance #1523

Open harveyelsom opened 4 years ago

harveyelsom commented 4 years ago

Type of question

Looking for guidance on an issue that is blocking the development of an operator. It doesn't seem like a bug in the product or a feature request, but it does feel like a serious limitation.

Question

When attempting to install an operator through OLM, I am faced with this error:

time="2020-05-14T13:14:45Z" level=error msg="File with size 62169 exceeded 1048576 limit, aboring" file=/bundle/manifests/crd.yaml
Error: error loading manifests from directory: file crd.yaml bigger than total allowed limit
Usage:
  opm alpha bundle extract [flags]

...

Aside from the error message being slightly confusing, it seems that the combined total of all the manifests is greater than the 1048576-byte limit, and the file mentioned is the one that pushes it over.

What is the purpose of this size limit?

Does it need to be enforced on the combined bundle?

Is there any way to get round this error?

Assuming we can cut the size down to below the limit: at the moment there is only one version's metadata within this bundle. When the time comes to add another version, will we hit this issue again?

Environment OCP 4.4
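
For readers of the log above: the reported file (62169 bytes) is far smaller than the limit, which is consistent with the limit applying to the running total of all manifests rather than to each file. A minimal sketch of that kind of cumulative check follows. It assumes files are read in name order and their sizes summed; the helper `first_file_over_limit` and the sample sizes are hypothetical, and this is not the actual opm code.

```python
from typing import Dict, Optional

DEFAULT_LIMIT = 1048576  # 1 MiB, the default quoted in the error message


def first_file_over_limit(sizes: Dict[str, int], limit: int = DEFAULT_LIMIT) -> Optional[str]:
    """Return the first file whose size pushes the running total over the limit."""
    total = 0
    for name in sorted(sizes):  # assumption: files are processed in name order
        total += sizes[name]
        if total > limit:
            return name
    return None  # the whole set fits within the limit


# Hypothetical sizes: no single file exceeds the limit, but the sum does.
sizes = {"a_crd.yaml": 600000, "b_crd.yaml": 400000, "crd.yaml": 62169}
print(first_file_over_limit(sizes))
```

Running a check like this against the byte sizes of a bundle's manifests directory shows which file tips the total over, which is the file the error message names.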

benluddy commented 4 years ago

~The extract subcommand provides a flag to override the default size limit:~ (Edit: Sorry, I misread this as an OPM question.)

  -l, --datalimit uint         maximum limit in bytes for total bundle data (default 1048576)

In the process of unpacking operator bundles on-cluster, OLM copies the content of bundle manifests into a ConfigMap -- more on that here: https://github.com/operator-framework/operator-lifecycle-manager/blob/2b93a4bc750a6ca77586f81914567a7582aa7341/doc/design/resolving-bundle-images.md#unpacking.

As I understand it, the default is what it is because individual values in etcd can't be larger than 1MiB (https://github.com/kubernetes/kubernetes/issues/19781#issuecomment-172553264).

ecordell commented 4 years ago

> given we can cut down the size to below the limit, at the moment there is only one version's metadata within this bundle, when the time comes to add another version will we hit this issue again?

No, this limit is currently per-bundle.

> is there any way to get round this error?

Not at the moment, but if we're hitting this limit we will need to support unpacking across multiple resources so that it can't be hit.

Can you share the manifests you're trying to use? We haven't seen anyone hit this limit yet, and I'm curious if there is some duplication in your bundle that is causing the problem.

dalelane commented 4 years ago

@ecordell We've not released our Operator yet, so we can't share them yet, sorry.

We have a large number of CRDs (over a dozen). Individually, none of them is over 1 MB (although we do have one unusually large one at 550 KB), but the combined size of all our CRDs and roles does bump us just over 1 MB.

I think there is definitely duplication within each CRD, but in the absence of $ref support (https://github.com/kubernetes/kubernetes/issues/62872), I don't know how we can easily avoid that.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tomncooper commented 3 years ago

+1 on this issue. In the upcoming release of Strimzi (0.22) we drop support for OpenShift 3.11 and so have now added schemas to our CRDs, doubling (or more) their length. As a result, we hit the same issue shown above, with the initContainer unable to apply the CRDs.

jarrpa commented 3 years ago

+1. We (OCS) just ran into this as a result of one of our components updating to kubebuilder v0.4.1 to generate our operator bundle. The offending PR is https://github.com/openshift/ocs-operator/pull/1144 and we are currently trying to figure out a workaround.

callmeadi commented 3 years ago

What could be the workaround for this? We don't really want to split the operator.

wallrj commented 3 years ago

I'm getting this error when attempting to bundle cert-manager for OperatorHub (https://github.com/operator-framework/community-operators/pull/4103):

kubectl -n olm  logs 69eb921e5868a606ccbf4c588b0af829f838c007446ae623e255748bdb77h5h
time="2021-06-17T10:19:02Z" level=info msg="Using in-cluster kube client config"
time="2021-06-17T10:19:02Z" level=info msg="Reading file" file=/bundle/manifests/acme.cert-manager.io_challenges.yaml
time="2021-06-17T10:19:02Z" level=info msg="Reading file" file=/bundle/manifests/acme.cert-manager.io_orders.yaml
time="2021-06-17T10:19:02Z" level=info msg="Reading file" file=/bundle/manifests/cert-manager-cainjector_v1_serviceaccount.yaml
time="2021-06-17T10:19:02Z" level=info msg="Reading file" file=/bundle/manifests/cert-manager-edit_rbac.authorization.k8s.io_v1_clusterrole.yaml
time="2021-06-17T10:19:02Z" level=info msg="Reading file" file=/bundle/manifests/cert-manager-view_rbac.authorization.k8s.io_v1_clusterrole.yaml
time="2021-06-17T10:19:02Z" level=info msg="Reading file" file=/bundle/manifests/cert-manager-webhook_v1_service.yaml
time="2021-06-17T10:19:02Z" level=info msg="Reading file" file=/bundle/manifests/cert-manager-webhook_v1_serviceaccount.yaml
time="2021-06-17T10:19:02Z" level=info msg="Reading file" file=/bundle/manifests/cert-manager.clusterserviceversion.yaml
time="2021-06-17T10:19:02Z" level=info msg="Reading file" file=/bundle/manifests/cert-manager.io_certificaterequests.yaml
time="2021-06-17T10:19:02Z" level=info msg="Reading file" file=/bundle/manifests/cert-manager.io_certificates.yaml
time="2021-06-17T10:19:02Z" level=info msg="Reading file" file=/bundle/manifests/cert-manager.io_clusterissuers.yaml
time="2021-06-17T10:19:02Z" level=error msg="File with size 692874 exceeded 1048576 limit, aboring" file=/bundle/manifests/cert-manager.io_clusterissuers.yaml
Error: error loading manifests from directory: file cert-manager.io_clusterissuers.yaml bigger than total allowed limit
Usage:
  opm alpha bundle extract [flags]

Flags:
  -c, --configmapname string   name of configmap to write bundle data
  -l, --datalimit uint         maximum limit in bytes for total bundle data (default 1048576)
      --debug                  enable debug logging
  -h, --help                   help for extract
  -k, --kubeconfig string      absolute path to kubeconfig file
  -m, --manifestsdir string    path to directory containing manifests (default "/")
  -n, --namespace string       namespace to write configmap data (default "openshift-operator-lifecycle-manager")

Global Flags:
      --skip-tls   skip TLS certificate verification for container image registries while pulling bundles or index

The bundle was created using operator-sdk as follows:

curl -sSL https://github.com/jetstack/cert-manager/releases/download/v1.4.0/cert-manager.yaml | operator-sdk generate bundle --package cert-manager --version 1.4.0 --output-dir /tmp/cert-manager-olm-1.4.0 && tree -h /tmp/cert-manager-olm-1.4.0 
Generating bundle version 1.4.0
Generating bundle manifests
Building a ClusterServiceVersion without an existing base
WARN[0000] ClusterServiceVersion validation: [OperationFailed] provided API should have an example annotation 
[previous warning repeated 23 more times]
Bundle manifests generated successfully in /tmp/cert-manager-olm-1.4.0
Generating bundle metadata
INFO[0001] Creating bundle.Dockerfile                   
INFO[0001] Creating /tmp/cert-manager-olm-1.4.0/metadata/annotations.yaml 
INFO[0001] Bundle metadata generated suceessfully       
/tmp/cert-manager-olm-1.4.0
├── [  320]  manifests
│   ├── [ 504K]  acme.cert-manager.io_challenges.yaml
│   ├── [  43K]  acme.cert-manager.io_orders.yaml
│   ├── [  375]  cert-manager-cainjector_v1_serviceaccount.yaml
│   ├── [  20K]  cert-manager.clusterserviceversion.yaml
│   ├── [  803]  cert-manager-edit_rbac.authorization.k8s.io_v1_clusterrole.yaml
│   ├── [  43K]  cert-manager.io_certificaterequests.yaml
│   ├── [  89K]  cert-manager.io_certificates.yaml
│   ├── [ 677K]  cert-manager.io_clusterissuers.yaml
│   ├── [ 676K]  cert-manager.io_issuers.yaml
│   ├── [  368]  cert-manager_v1_serviceaccount.yaml
│   ├── [  582]  cert-manager_v1_service.yaml
│   ├── [  785]  cert-manager-view_rbac.authorization.k8s.io_v1_clusterrole.yaml
│   ├── [  363]  cert-manager-webhook_v1_serviceaccount.yaml
│   └── [  567]  cert-manager-webhook_v1_service.yaml
└── [   60]  metadata
    └── [  726]  annotations.yaml

2 directories, 15 files

dmesser commented 3 years ago

@wallrj This size error occurs because OLM relies on etcd to store the bundle content, and etcd's 1 MB limit is what we are hitting here. We are looking at a potential workaround in parallel with building a new bundle management system that can work around the etcd limitation. Fundamentally this is a Kubernetes API problem, because some CRDs out there exceed the 1 MB limit on their own due to excessive inlining of all kinds of standard resource specs. What you could do is strip the descriptions in those inlined standard resource specs (e.g. pod specs or similar).

The second message (the repeated warning) is asking you to provide example JSON manifests for your CRDs in your CSV; see here: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/building-your-csv.md#crd-templates
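
The description-stripping suggestion above could be sketched roughly as follows. This is a minimal illustration that assumes the CRD has already been parsed into plain Python dicts and lists (for example with a YAML library); `strip_descriptions` and the sample schema are hypothetical and are not part of OLM or operator-sdk. One caveat of this sketch: it also drops string-valued `description` keys that appear inside `default` or `example` values.

```python
import json
from typing import Any


def strip_descriptions(schema: Any) -> Any:
    """Return a copy of an OpenAPI schema with 'description' strings removed."""
    if isinstance(schema, list):
        return [strip_descriptions(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    result = {}
    for key, value in schema.items():
        if key == "description" and isinstance(value, str):
            continue  # schema-level documentation text, dropped to save space
        if key == "properties" and isinstance(value, dict):
            # Keys here are field names (one may itself be called "description"),
            # so keep every key and only recurse into the sub-schemas.
            result[key] = {name: strip_descriptions(sub) for name, sub in value.items()}
        else:
            result[key] = strip_descriptions(value)
    return result


# Hypothetical fragment of an inlined resource spec.
schema = {
    "type": "object",
    "description": "A pod template.",
    "properties": {
        "description": {"type": "string", "description": "Free-form text."},
        "image": {"type": "string", "description": "Container image."},
    },
}
slim = strip_descriptions(schema)
print(len(json.dumps(schema)), "->", len(json.dumps(slim)))
```

The special case for `properties` matters because a CRD may legitimately define a field named `description`; only the documentation strings should be removed, not the field itself.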

iam-veeramalla commented 2 years ago

We are running into a similar issue, and the suggested workaround does not apply to us because our single CRD is close to 850 KB without any description fields in it: https://github.com/argoproj/applicationset/blob/master/manifests/crds/argoproj.io_applicationsets.yaml

The total bundle size comes to about 1.3 MB.

This is a blocker for us.

camilamacedo86 commented 2 years ago

Hi @iam-veeramalla,

Which version of OLM are you using? A feature to gzip the bundle was added in this PR, which ought to allow you to have ~4 MB.
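
To illustrate why compression helps here: generated CRD YAML tends to be highly repetitive, so it compresses well. The sketch below gzips some repetitive text and compares sizes. The sample data and the `compressed_size` helper are hypothetical, and this does not reproduce OLM's actual implementation or its exact limits.

```python
import gzip


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of data after gzip compression."""
    return len(gzip.compress(data))


# Hypothetical stand-in for a CRD with many repeated, inlined schema fragments.
fragment = b"description: Container image name.\ntype: string\n"
manifest = fragment * 20000
print(len(manifest), "->", compressed_size(manifest))
```

Real manifests are less uniform than this sample, so the ratio will be lower in practice; measuring your own bundle is the only reliable way to know whether it fits.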