sassoftware / viya4-monitoring-kubernetes

Provides simple scripts and customization options to deploy monitoring, alerts, and log aggregation for Viya 4 running on Kubernetes
Apache License 2.0
54 stars 32 forks source link

Error: release v4m-prometheus-operator failed #541

Closed staceychan closed 1 year ago

staceychan commented 1 year ago

Hi Experts,

I encountered an error while installing the monitoring with the commend monitoring/bin/deploy_monitoring_cluster.sh

Here is the logs

INFO User directory: /sas/viya_k8/viya4-monitoring-kubernetes
INFO Helm client version: 3.11.1
INFO Kubernetes client version: v1.23.1
INFO Kubernetes server version: v1.23.16
INFO Loading user environment file: /sas/viya_k8/viya4-monitoring-kubernetes/monitoring/user.env

namespace/monitoring created
serviceaccount/default patched
Deploying monitoring to the [monitoring] namespace...
"prometheus-community" already exists with the same configuration, skipping
INFO Updating Prometheus Operator custom resource definitions
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com replaced
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com replaced
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com replaced
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com replaced
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com replaced
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com replaced
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com replaced
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com replaced
No resources found
INFO User response file: [/sas/viya_k8/viya4-monitoring-kubernetes/monitoring/user-values-prom-operator.yaml]
INFO Deploying the kube-prometheus stack. This may take a few minutes ...
INFO Installing via Helm (Mon Aug 21 02:30:28 UTC 2023 - timeout 20m)
Release "v4m-prometheus-operator" does not exist. Installing it now.
Error: release v4m-prometheus-operator failed, and has been uninstalled due to atomic being set: timed out waiting for the condition
ERROR Exiting script [deploy_monitoring_cluster.sh] due to an error executing the command [helm $helmDebug upgrade --install $promRelease --namespace $MON_NS -f monitoring/values-prom-operator.yaml -f $airgapValuesFile -f $istioValuesFile -f $tlsValuesFile -f $airgapTLSValuesFile -f $tlsPromAlertingEndpointFile -f $nodePortValuesFile -f $wnpValuesFile -f $PROM_OPER_USER_YAML -f $tempoDSFile --atomic --timeout 20m --set nameOverride=$promName --set fullnameOverride=$promName --set prometheus-node-exporter.fullnameOverride=$promName-node-exporter --set kube-state-metrics.fullnameOverride=$promName-kube-state-metrics --set grafana.fullnameOverride=$promName-grafana --set grafana.adminPassword="$grafanaPwd" --set prometheus.prometheusSpec.alertingEndpoints[0].namespace="$MON_NS" --version $KUBE_PROM_STACK_CHART_VERSION ${AIRGAP_HELM_REPO}prometheus-community/kube-prometheus-stack].

I have tried many times by removing the whole namespace and re-install, but still got the same error. Please kindly help.

Many thanks