sassoftware / viya4-deployment

This project contains Ansible code that creates a baseline in an existing Kubernetes environment for use with the SAS Viya Platform, generates the manifest for an order, and then can also deploy that order into the Kubernetes environment specified.
Apache License 2.0

Error deploying the Cluster-Monitoring #94

Closed CHoldinghausen closed 3 years ago

CHoldinghausen commented 3 years ago

Hello Experts,

I used viya4-iac to deploy the infrastructure, and now I'm using viya4-deployment to deploy the new SAS version 2020.1.5. Unfortunately, I got an error during the deployment of the monitoring components.

I think this task ran into a timeout, but I'm not 100% sure. Can you please help me out here?

```
TASK [monitoring : cluster-monitoring - deploy] ****
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "/tmp/ansible._q1dqpdw/viya4-monitoring-kubernetes/monitoring/bin/deploy_monitoring_cluster.sh", "delta": "0:20:39.446237", "end": "2021-05-18 13:33:11.933409", "msg": "non-zero return code", "rc": 1, "start": "2021-05-18 13:12:32.487172", "stderr": "Error: release v4m-prometheus-operator failed, and has been uninstalled due to atomic being set: timed out waiting for the condition", "stderr_lines": ["Error: release v4m-prometheus-operator failed, and has been uninstalled due to atomic being set: timed out waiting for the condition"], "stdout": "Helm client version: 3.5.4\nKubernetes client version: v1.18.8\nKubernetes server version: v1.18.14\n\nDeploying monitoring to the [monitoring] namespace...\nAdding [stable] helm repository\n\"stable\" has been added to your repositories\nAdding [prometheus-community] helm repository\n\"prometheus-community\" has been added to your repositories\nUpdating helm repositories...\nHang tight while we grab the latest from your chart repositories...\n...Successfully got an update from the \"prometheus-community\" chart repository\n...Successfully got an update from the \"stable\" chart repository\nUpdate Complete. ⎈Happy Helming!⎈\nUpdating Prometheus Operator custom resource definitions\ncustomresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com configured\ncustomresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com configured\ncustomresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com configured\ncustomresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com configured\ncustomresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com configured\ncustomresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com configured\ncustomresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com configured\ncustomresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com configured\nProvisioning TLS-enabled Prometheus datasource for Grafana...\nconfigmap \"grafana-datasource-prom-https\" deleted\nconfigmap/grafana-datasource-prom-https created\nconfigmap/grafana-datasource-prom-https labeled\nEnabling Prometheus node-exporter for TLS...\nconfigmap \"node-exporter-tls-web-config\" deleted\nconfigmap/node-exporter-tls-web-config created\nconfigmap/node-exporter-tls-web-config labeled\nUser response file: [/tmp/ansible._q1dqpdw/monitoring/user-values-prom-operator.yaml]\nDeploying the Kube Prometheus Stack. This may take a few minutes...\nInstalling via Helm...(Tue May 18 13:12:48 UTC 2021 - timeout 20m)\nRelease \"v4m-prometheus-operator\" does not exist. Installing it now.", "stdout_lines": ["Helm client version: 3.5.4", "Kubernetes client version: v1.18.8", "Kubernetes server version: v1.18.14", "", "Deploying monitoring to the [monitoring] namespace...", "Adding [stable] helm repository", "\"stable\" has been added to your repositories", "Adding [prometheus-community] helm repository", "\"prometheus-community\" has been added to your repositories", "Updating helm repositories...", "Hang tight while we grab the latest from your chart repositories...", "...Successfully got an update from the \"prometheus-community\" chart repository", "...Successfully got an update from the \"stable\" chart repository", "Update Complete. ⎈Happy Helming!⎈", "Updating Prometheus Operator custom resource definitions", "customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com configured", "customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com configured", "customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com configured", "customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com configured", "customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com configured", "customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com configured", "customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com configured", "customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com configured", "Provisioning TLS-enabled Prometheus datasource for Grafana...", "configmap \"grafana-datasource-prom-https\" deleted", "configmap/grafana-datasource-prom-https created", "configmap/grafana-datasource-prom-https labeled", "Enabling Prometheus node-exporter for TLS...", "configmap \"node-exporter-tls-web-config\" deleted", "configmap/node-exporter-tls-web-config created", "configmap/node-exporter-tls-web-config labeled", "User response file: [/tmp/ansible._q1dqpdw/monitoring/user-values-prom-operator.yaml]", "Deploying the Kube Prometheus Stack. This may take a few minutes...", "Installing via Helm...(Tue May 18 13:12:48 UTC 2021 - timeout 20m)", "Release \"v4m-prometheus-operator\" does not exist. Installing it now."]}

PLAY RECAP *****
localhost : ok=89 changed=24 unreachable=0 failed=1 skipped=45 rescued=0 ignored=0

Tuesday 18 May 2021 13:33:11 +0000 (0:20:39.599) 0:23:05.642 ***

monitoring : cluster-monitoring - deploy ----------------------------- 1239.60s
vdm : manifest - deploy ------------------------------------------------ 75.55s
vdm : kustomize - Generate deployment manifest ------------------------- 28.93s
vdm : prereqs - cluster-local deploy ------------------------------------ 4.82s
vdm : prereqs - cluster-wide -------------------------------------------- 4.21s
vdm : copy - VDM generators --------------------------------------------- 3.09s
vdm : assets - Download ------------------------------------------------- 2.06s
vdm : assets - Get License ---------------------------------------------- 1.97s
vdm : copy - VDM transformers ------------------------------------------- 1.91s
monitoring : v4m - download --------------------------------------------- 1.80s
vdm : Download viya4-orders-cli ----------------------------------------- 1.25s
nfs-subdir-external-provisioner : Deploy nfs-subdir-external-provisioner --- 1.18s
cert-manager : Deploy cert-manager -------------------------------------- 1.00s
vdm : assets - Extract downloaded assets -------------------------------- 0.86s
nfs-subdir-external-provisioner : Remove deprecated efs-provisioner namespace --- 0.82s
metrics-server : Check for metrics service ------------------------------ 0.80s
jump-server : jump-server - lookup groups ------------------------------- 0.76s
vdm : Create namespace -------------------------------------------------- 0.73s
monitoring : cluster-monitoring - lookup creds -------------------------- 0.71s
Gathering Facts --------------------------------------------------------- 0.71s
```

Best regards,
Carsten

CHoldinghausen commented 3 years ago

After running it a few more times, it finished.
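The timeout suspicion in the original post can be checked against the failure record itself. The sketch below is only an illustration: it uses a trimmed copy of the task result from the log above, and the helper names (`parse_delta`, `helm_timeout`, `looks_like_timeout`) are hypothetical, not part of viya4-deployment or Ansible. It compares the task's `delta` with the `timeout 20m` announced by the deploy script and looks for Helm's "timed out waiting for the condition" message.

```python
import json
import re
from datetime import timedelta

# Trimmed copy of the failed task result from the log (only the fields used here).
result_json = r'''
{
  "delta": "0:20:39.446237",
  "rc": 1,
  "stderr": "Error: release v4m-prometheus-operator failed, and has been uninstalled due to atomic being set: timed out waiting for the condition",
  "stdout_lines": ["Installing via Helm...(Tue May 18 13:12:48 UTC 2021 - timeout 20m)"]
}
'''


def parse_delta(text):
    """Convert an 'H:MM:SS.ffffff' delta string into a timedelta.

    Assumption: the run took less than a day, so there is no 'N day(s),' prefix.
    """
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def helm_timeout(stdout_lines):
    """Return the timeout announced in the script output, or None if it is absent."""
    for line in stdout_lines:
        match = re.search(r"timeout (\d+)m\)", line)
        if match:
            return timedelta(minutes=int(match.group(1)))
    return None


def looks_like_timeout(result):
    """True if the task failed, ran at least as long as the timeout, and Helm reported a timeout."""
    timeout = helm_timeout(result["stdout_lines"])
    return (
        result["rc"] != 0
        and timeout is not None
        and parse_delta(result["delta"]) >= timeout
        and "timed out waiting for the condition" in result["stderr"]
    )


result = json.loads(result_json)
print(looks_like_timeout(result))  # prints True for the record above
```

For this record the check passes: the task ran for 20 minutes 39 seconds against a 20-minute limit, which is consistent with the Helm release being rolled back on timeout rather than failing on a configuration error. It does not say why the pods were slow to become ready; inspecting the pods and events in the `monitoring` namespace during a run would be the next place to look.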