Closed: DrummyFloyd closed this issue 2 months ago.
Thanks for submitting this issue!
There are two things that cause this bug:
We used tar + gz to create some of these resources, and that might be necessary here... unless even that is too big.
We have a few options to choose from (thanks @varshaprasad96 and @komish!):
1. Split the implementation into two charts, crds and main, with a dependency between them and owner refs determining the correct deletion sequence (until we implement a Finalizer that takes care of it). Pros: a best practice recommended by the Helm team. Cons: possibly more edge cases?
2. Compress manifests, as @tmshort suggested. Pros: should take care of this issue and similar ones. Cons: there could be cases where the data is too big even after compression, and it would require implementing a custom secret encode and decode mechanism instead of using Helm's default one.
3. Use a different storage backend, e.g. SQL (https://helm.sh/docs/topics/advanced/#storage-backends). Pros: size is not an issue. Cons: requires spinning up and maintaining a PostgreSQL instance.
For (2), we already do that for the test registry: https://github.com/operator-framework/operator-controller/blob/bfd41423e48c6fb9ca6567d81ab1904d1a1910c2/testdata/bundles/registry-v1/build-push-e2e-bundle.sh#L34-L37
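For illustration, a minimal Go sketch of that tar+gz idea, assuming we bundle the rendered manifests in memory before handing them off; the function name and file map are hypothetical, not what the script or operator-controller actually does:

```go
package sketch

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"sort"
)

// tarGzManifests bundles a set of rendered manifest files into a single
// tar.gz payload. This is only a sketch of the tar+gz idea; the real e2e
// script shells out to tar instead of doing this in Go.
func tarGzManifests(files map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	gzw := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gzw)

	// Sort file names so the archive is deterministic.
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		data := files[name]
		hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gzw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```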
This seems really strange to me, that we are offloading CRD creation/management to Helm when we know that Helm itself can't manage the install/update of CRDs. I was under the impression (though it appears wrong) that OLMv1 was going to manage the creation of CRDs outside the context (knowledge) of Helm. It seems that this was only in the context of namespace-scoped installs, but even still it seems illogical that we'd have two different control paths/flows within OLMv1.
Should we possibly look into a 4th option of managing the CRDs ourselves, or is that out of scope (effort/timeline)?
Adding some thoughts based on initial findings and discussion with @itroyano:
Option 3: The possible supported Helm backends available to us are Secrets, ConfigMaps, and SQL. The first two don't solve the problem, since they have similar size limits. The last one is in beta, and maintaining a separate component is unnecessary added pain.
Option 2: Compression is a good idea, but Helm already does an encoding by default, and it is hardcoded in its implementation. There is a small caveat here: I am not sure whether compressing the data a second time will still let Helm read the secret data. The implementation is here, and compressing it ourselves on top of that would cause issues when Helm tries to decode and read it (I haven't tried it, but it looks that way), e.g.: https://github.com/helm/helm/blob/1a500d5625419a524fdae4b33de351cc4f58ec35/pkg/storage/driver/secrets.go#L96.
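For reference, here is a rough approximation (my reading of the linked code, not a verbatim copy) of what Helm's secret driver does when encoding and decoding a release: JSON-marshal, gzip, base64. Any extra compression layer we add upstream would also need a matching change on the decode path:

```go
package sketch

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/json"
	"io"
)

// encodeReleaseApprox approximates Helm's hardcoded encoding:
// JSON-marshal the release, gzip it, then base64-encode the result.
func encodeReleaseApprox(rls any) (string, error) {
	b, err := json.Marshal(rls)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	gzw := gzip.NewWriter(&buf)
	if _, err := gzw.Write(b); err != nil {
		return "", err
	}
	if err := gzw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decodeReleaseApprox reverses the above. If we compressed the payload a
// second time ourselves, this is the layer that would no longer line up.
func decodeReleaseApprox(data string, out any) error {
	b, err := base64.StdEncoding.DecodeString(data)
	if err != nil {
		return err
	}
	gzr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return err
	}
	defer gzr.Close()
	raw, err := io.ReadAll(gzr)
	if err != nil {
		return err
	}
	return json.Unmarshal(raw, out)
}
```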
The reason it works in the e2e, I think, is that Kaniko uncompresses the tarball from its context. Based on a quick glance, when we provide a tarball as the context for Kaniko, it extracts the contents of the tarball and then builds the Docker image from the extracted files. This means the manifests that are passed for the chart's creation are still uncompressed.
The other option here, as alluded to by @joelanford, was to reimplement the whole secret driver; the code is available here: https://github.com/helm/helm/blob/1a500d5625419a524fdae4b33de351cc4f58ec35/pkg/storage/driver/secrets.go. Re-implementing it with additional compression, or even sharding the data, would probably take more effort, and maintaining it could be an additional problem.
Option 1:
Helm by default does not manage the lifecycle of CRDs. If the CRDs are stored in a separate crds/ folder, they are applied before the chart is installed and are hence not part of the release (ref: https://github.com/helm/helm/blob/1a500d5625419a524fdae4b33de351cc4f58ec35/pkg/action/install.go#L254-L263). A major benefit of OLM v1 is solving exactly this problem, and the intentional reason, afaik, to include CRDs as part of the release is so that we can implement our own set of CRDUpgradeSafety checks and let Helm handle CRDs as it would any other manifest. If we need to separate CRDs from the rest of the manifests, there are two things we could do:
a. Handle CRDs on our own, with the equivalent of a kubectl apply (a rough sketch of this appears after this comment).
b. Create a Helm chart that contains only the CRDs, and mark the main chart containing the manifests as a dependent of the CRD chart. (This is one of the popular solutions: https://github.com/cloudnative-pg/charts/issues/280#issuecomment-2135939905)
Implementing (a) and (b) is roughly equivalent imo. The only thing is, there shouldn't be an edge case where the CRDs themselves exceed Helm's allowable size. I'm not sure that would even be a best practice, given the other practical concerns such huge CRDs have on performance, caching, (probably) etcd limits, etc.
Both of these methods, which make us manage CRDs separately from the rest of the manifests, bring their own concerns:
Both options (2) and (3) (i.e. reimplementing the secret driver or separating out CRDs into another chart) come with their own maintenance challenges. The decision is to choose the one that is easier for us to implement and manage.
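To make (a) concrete, here is a hypothetical sketch of applying a CRD ourselves with the apiextensions client. The ensureCRD helper name is an assumption, and a real implementation would run the CRDUpgradeSafety checks before updating:

```go
package sketch

import (
	"context"

	apiextv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ensureCRD creates the CRD if it is missing, or updates it if it already
// exists. This only illustrates option (a); real code would run CRD upgrade
// safety checks before the update.
func ensureCRD(ctx context.Context, client apiextclient.Interface, crd *apiextv1.CustomResourceDefinition) error {
	crds := client.ApiextensionsV1().CustomResourceDefinitions()
	if _, err := crds.Create(ctx, crd, metav1.CreateOptions{}); err == nil || !apierrors.IsAlreadyExists(err) {
		return err
	}
	existing, err := crds.Get(ctx, crd.Name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	crd.ResourceVersion = existing.ResourceVersion
	_, err = crds.Update(ctx, crd, metav1.UpdateOptions{})
	return err
}
```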
I tinkered on this today and came up with a custom driver in helm-operator-plugins that splits the release data into chunks (based on a ChunkSize provided to the driver) and manages an ordered set of secrets for a particular key. In theory, it could handle up to MaxUint64 chunks, but we would probably want to cap it (maybe at 10?) to have some control over a supportable upper bound.
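The chunking idea itself is simple; below is a minimal, hypothetical sketch (not the actual helm-operator-plugins driver) of splitting an encoded release payload across an ordered set of Secrets, with a cap on the chunk count:

```go
package sketch

import (
	"fmt"
	"strconv"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// chunkRelease splits an encoded release payload into ordered Secrets of at
// most chunkSize bytes each. The naming scheme, labels, and maxChunks cap are
// assumptions for illustration only.
func chunkRelease(key, namespace string, payload []byte, chunkSize, maxChunks int) ([]corev1.Secret, error) {
	n := (len(payload) + chunkSize - 1) / chunkSize
	if n > maxChunks {
		return nil, fmt.Errorf("release %q needs %d chunks, cap is %d", key, n, maxChunks)
	}
	secrets := make([]corev1.Secret, 0, n)
	for i := 0; i < n; i++ {
		start := i * chunkSize
		end := start + chunkSize
		if end > len(payload) {
			end = len(payload)
		}
		secrets = append(secrets, corev1.Secret{
			ObjectMeta: metav1.ObjectMeta{
				Name:      fmt.Sprintf("%s-chunk-%d", key, i),
				Namespace: namespace,
				Labels: map[string]string{
					"key":   key,
					"chunk": strconv.Itoa(i),
				},
			},
			Data: map[string][]byte{"release": payload[start:end]},
		})
	}
	return secrets, nil
}
```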
@DrummyFloyd I have #1057 up as a possible solution to this issue if you are interested in checking it out!
Status:
  Conditions:
    Last Transition Time:  2024-07-17T21:28:50Z
    Message:               resolved to "quay.io/operatorhubio/mariadb-operator@sha256:a96b0c89a9cfd307aee2e56b32ce51c2428e19b21e1dbe8f38386956ee73a618"
    Observed Generation:   1
    Reason:                Success
    Status:                True
    Type:                  Resolved
    Last Transition Time:  2024-07-17T21:28:56Z
    Message:               Instantiated bundle op-mariadb successfully
    Observed Generation:   1
    Reason:                Success
    Status:                True
    Type:                  Installed
    Last Transition Time:  2024-07-17T21:28:39Z
    Message:
    Observed Generation:   1
    Reason:                Deprecated
    Status:                False
    Type:                  Deprecated
    Last Transition Time:  2024-07-17T21:28:39Z
    Message:
    Observed Generation:   1
    Reason:                Deprecated
    Status:                False
    Type:                  PackageDeprecated
    Last Transition Time:  2024-07-17T21:28:39Z
    Message:
    Observed Generation:   1
    Reason:                Deprecated
    Status:                False
    Type:                  ChannelDeprecated
    Last Transition Time:  2024-07-17T21:28:39Z
    Message:
    Observed Generation:   1
    Reason:                Deprecated
    Status:                False
    Type:                  BundleDeprecated
    Last Transition Time:  2024-07-17T21:28:54Z
    Message:               unpack successful:
    Observed Generation:   1
    Reason:                UnpackSuccess
    Status:                True
    Type:                  Unpacked
  Installed Bundle:
    Name:     mariadb-operator.v0.29.0
    Version:  0.29.0
  Resolved Bundle:
    Name:     mariadb-operator.v0.29.0
    Version:  0.29.0
Events: <none>
It works =D
EDIT:
I don't know if this solution can also address errors of this kind:
Message: InstallationFailed:template: object-779770e631338b63.yaml:124:63: executing "object-779770e631338b63.yaml" at <.data.user>: nil pointer evaluating interface {}.user
If yes, do not hesitate to ask for more context =)
@DrummyFloyd this issue should be fixed in main now.
Do you have a reproducer for the other issue you ran into? If you're still having a problem there, could you open a new issue for that so we don't lose track?
Thank you for the previous issue =) Yes, I have a reproducer case => will create a ticket tomorrow =)
apiVersion: olm.operatorframework.io/v1alpha1
kind: ClusterExtension
metadata:
  name: op-eso
spec:
  installNamespace: operators
  packageName: external-secrets-operator
  version: 0.9.20
  serviceAccount:
    name: default
---
apiVersion: catalogd.operatorframework.io/v1alpha1
kind: ClusterCatalog
metadata:
  name: operatorhubio
spec:
  source:
    type: image
    image:
      ref: quay.io/operatorhubio/catalog:latest
      pollInterval: 24h
User story
Issue created due to a mention in Slack.
As a recurrent tester on this project (because I like it), I test some wanted operators on my stack.
At the moment I have some issues with the following manifest on 0.10 OLM.
List of issues
Error message rendered.
New content or update?
Net-new content
Types of documentation
No response
References
No response