stackabletech / stackable-cockpit

Home of stackable-cockpit, stackablectl and stackable-cockpitd
https://docs.stackable.tech/management/stable/

stackablectl: Retry GVK resolution #293

Closed sbernauer closed 3 months ago

sbernauer commented 3 months ago

Affected version

main

Current and expected behavior

When installing a stack that installs a new CRD and afterwards tries to create an object of that new kind (such as https://github.com/stackabletech/demos/pull/35), we fail with the following error message:

  DEBUG  Installing YAML manifest from stacks/observability/opentelemetry-collector-sidecar.yaml
    at src/platform/manifests.rs:125
    in install_manifests
    in prepare_manifests
    in install with install_parameters: StackInstallParameters { demo_name: None, stack_name: "observability", operator_namespace: "stackable-operators", product_namespace: "default", parameters: [], skip_release: false, labels: Labels(KeyValuePairs({KeyValuePair { key: Key { prefix: Some(KeyPrefix("stackable.tech")), name: KeyName("managed-by") }, value: LabelValue("stackablectl") }, KeyValuePair { key: Key { prefix: Some(KeyPrefix("stackable.tech")), name: KeyName("stack") }, value: LabelValue("observability") }, KeyValuePair { key: Key { prefix: Some(KeyPrefix("stackable.tech")), name: KeyName("vendor") }, value: LabelValue("Stackable") }})) }
    in install_cmd with args: StackInstallArgs { stack_name: "observability", skip_release: false, stack_parameters: [], parameters: [], local_cluster: CommonClusterArgs { cluster_type: None, cluster_name: "stackable-data-platform", cluster_nodes: 2, cluster_cp_nodes: 1 }, namespaces: CommonNamespaceArgs { operator_namespace: "stackable-operators", product_namespace: "default" } }
    in run with self: Cli { log_level: Some(Level(Debug)), no_cache: false, offline: false, files: CommonFileArgs { demo_files: [], stack_files: ["stacks/stacks-v2.yaml"], release_files: [] }, repos: CommonRepoArgs { helm_repo_stable: "https://repo.stackable.tech/repository/helm-stable/", helm_repo_test: "https://repo.stackable.tech/repository/helm-test/", helm_repo_dev: "https://repo.stackable.tech/repository/helm-dev/" }, subcommand: Stack(StackArgs { subcommand: Install(StackInstallArgs { stack_name: "observability", skip_release: false, stack_parameters: [], parameters: [], local_cluster: CommonClusterArgs { cluster_type: None, cluster_name: "stackable-data-platform", cluster_nodes: 2, cluster_cp_nodes: 1 }, namespaces: CommonNamespaceArgs { operator_namespace: "stackable-operators", product_namespace: "default" } }) }) }

An unrecoverable error occured: failed to execute stack (sub)command

Caused by these errors (recent errors listed first):
 1: failed to install stack "observability"
 2: failed to install stack manifests
 3: failed to deploy manifests using the kube client
 4: failed to deploy manifest because GVK GroupVersionKind { group

This is probably because we cache the GVK discovery result here: https://github.com/stackabletech/stackable-cockpit/blob/3de66fc0bf1d2bf06e7744ba68070454e0b89693/rust/stackable-cockpit/src/utils/k8s/client.rs#L80-L83

Possible solution

Once we run into a GVK error, we should run Discovery::run again and give it another try.

However, I'm not 100% confident; this needs to be tested.
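A minimal sketch of the retry idea, not the actual stackable-cockpit code. It deliberately avoids the kube crate so it stays self-contained: `Gvk`, `DiscoverySource`, `CachingClient` and `FakeCluster` are all hypothetical names made up for illustration, and `discover` stands in for re-running discovery against the API server. The real client is async and would presumably need proper error types instead of `String`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a group/version/kind triple.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Gvk {
    group: String,
    version: String,
    kind: String,
}

/// Hypothetical abstraction over the cluster's discovery endpoint.
trait DiscoverySource {
    /// Returns every kind the cluster serves at the time of the call.
    fn discover(&mut self) -> HashSet<Gvk>;
}

/// Client that keeps a cached discovery snapshot, like the linked code does.
struct CachingClient<S: DiscoverySource> {
    source: S,
    cache: HashSet<Gvk>,
}

impl<S: DiscoverySource> CachingClient<S> {
    /// Runs discovery once up front and caches the result.
    fn new(mut source: S) -> Self {
        let cache = source.discover();
        Self { source, cache }
    }

    /// Resolves a GVK from the cache. On a miss, the cache is refreshed
    /// exactly once and the lookup is retried before giving up.
    fn resolve(&mut self, gvk: &Gvk) -> Result<(), String> {
        if self.cache.contains(gvk) {
            return Ok(());
        }
        // The cache may be stale (e.g. a CRD was installed after startup).
        self.cache = self.source.discover();
        if self.cache.contains(gvk) {
            Ok(())
        } else {
            Err(format!("GVK {gvk:?} not found even after re-running discovery"))
        }
    }
}

/// Fake cluster for demonstration: the CRD only shows up from the
/// second discovery run onwards, mimicking a CRD installed by the stack.
struct FakeCluster {
    calls: usize,
    crd: Gvk,
}

impl DiscoverySource for FakeCluster {
    fn discover(&mut self) -> HashSet<Gvk> {
        self.calls += 1;
        let mut kinds = HashSet::new();
        if self.calls > 1 {
            kinds.insert(self.crd.clone());
        }
        kinds
    }
}
```

With `FakeCluster`, the first `resolve` call for the new kind misses the cache, triggers one extra discovery run, and then succeeds. A kind that is never served still fails after the single retry, so we don't loop forever. If the CRD takes a moment to become established, a short delay or a bounded number of retries might be needed on top of this.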

NickLarsenNZ commented 3 months ago

Related: We need to find out why the whole error isn't being printed in the Snafu report.