sassoftware / viya4-deployment

This project contains Ansible code that creates a baseline in an existing Kubernetes environment for use with the SAS Viya Platform, generates the deployment manifest for an order, and can also deploy that order into the specified Kubernetes environment.
Apache License 2.0

fix: (IAC-1287) viya4-deployment appears to deploy SAS Viya Monitoring for Kubernetes twice when the tags include "cluster-logging,cluster-monitoring,viya-monitoring" and "install" #166

Closed - dwstern closed this issue 7 months ago

dwstern commented 2 years ago

I just deployed Viya 4 using this Docker command (after first deploying just the baseline components):

# Deploy Monitoring and Logging tools
docker container run -it \
    --group-add root \
    --user $(id -u):$(id -g) \
    -v $HOME/jumphost/workspace/deploy:/data \
    -v $HOME/jumphost/workspace/.kubeconfig_aks:/config/kubeconfig \
    -v $HOME/jumphost/workspace/gel02-vars.yaml:/config/config \
    -v $HOME/jumphost/workspace/gel02-vars.tfstate:/config/tfstate \
    viya4-deployment \
    --tags "baseline,viya,cluster-logging,cluster-monitoring,viya-monitoring,install"

Going by the output to stdout, attached here: viya4-deployment.txt, this deployed Viya successfully, but it appears to have deployed the SAS Viya Monitoring for Kubernetes components twice. I got two sets of passwords in the output (they were not adjacent in the actual output; they are grouped here for brevity):

        "Grafana - username: admin, password: p1WXEKig3GGLMzJcvHkb"
        "Kibana admin  - username: admin,        password: 59LQvYC4VjD80BtsKboH",
        "Kibana Server - username: kibanaserver, password: C06Enzx9tpjywkm4PApz",
        "Log Collector - username: logcollector, password: qGVACyECJ2Lai3KPa7Ph",
        "Metric Getter - username: metricgetter, password: yqh6aol94HcmLS6c2o61"

        "Grafana - username: admin, password: z8EraQDnZpusAlC7nO46"
        "Kibana admin  - username: admin,        password: eIr4LItiwkMTafStv9Px",
        "Kibana Server - username: kibanaserver, password: NWY9vt7f0aWd70NEAsHk",
        "Log Collector - username: logcollector, password: iBJRRQ4ScdebfJGdhznq",
        "Metric Getter - username: metricgetter, password: pD53BB7zv0Xdf429S4yn"

Since these components take quite a long time to deploy (the scripts that deploy them include some multi-minute waits while things start up), deploying them twice still produces a working instance, but wastes time and resources. Also, only one set of those passwords - presumably the second set - will work, and someone seeing the first set might think they had a problem or that the passwords were wrong!

If this is confirmed as an issue, can we please deploy those components just once when the tags include "cluster-logging,cluster-monitoring,viya-monitoring"?
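For reference, Ansible deduplicates tags within a single task (a task tagged with several matching tags still runs once), but a role included at two different points in a play runs once per inclusion. A hypothetical sketch of how that could produce the duplication described above - the play and role names are illustrative, not taken from viya4-deployment's actual playbooks:

```yaml
# Hypothetical playbook sketch - NOT the real viya4-deployment playbook.
- hosts: localhost
  tasks:
    # A single task with several tags runs once, however many tags match...
    - name: Deploy monitoring stack
      ansible.builtin.include_role:
        name: monitoring
      tags: [cluster-logging, cluster-monitoring, viya-monitoring]

    # ...but if the same role is included again elsewhere in the play
    # (e.g. under an "install" tag), selecting both tags runs it twice.
    - name: Deploy monitoring stack again (second inclusion, second run)
      ansible.builtin.include_role:
        name: monitoring
      tags: [install]
```

If the real playbooks include the monitoring role more than once under overlapping tags, that would explain two full runs from a single command.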

andybouts commented 2 years ago

Adding my comments ... I have experienced rather long deployment times, and it's almost always the monitoring deployment that takes what seems like 5-6x longer than any other step in the deployment.

thpang commented 2 years ago

@andybouts, if there are issues with the monitoring and logging, please create a GitHub issue here: https://github.com/sassoftware/viya4-monitoring-kubernetes

andybouts commented 2 years ago

Thanks @thpang Submitted, https://github.com/sassoftware/viya4-monitoring-kubernetes/issues/230

dwstern commented 2 years ago

Just to be clear, I'm not complaining about the fact that deploying the logging and monitoring project (viya4-monitoring-kubernetes) takes a few minutes. That's by design, and @thpang is correct that if anyone wanted to discuss that it would be a conversation to have with the developers of that project. My issue is that viya4-deployment appears to be doing it twice. Hopefully we can change something and just do it once?

thpang commented 2 years ago

If you can provide the task names and log info for the above examples, that would help. Without context, I have no idea whether it's the same task or a different task. Thx.

dwstern commented 2 years ago

Hi @thpang , sure thing - these are from the viya4-deployment.txt attached to the initial comment:

TASK [monitoring : cluster-monitoring - output credentials] **********************************************************************************************************************
ok: [localhost] => {
    "msg": [
        "Grafana - username: admin, password: p1WXEKig3GGLMzJcvHkb"
    ]
}
Thursday 11 November 2021  15:28:58 +0000 (0:00:00.075)       0:00:29.159 *****
TASK [monitoring : cluster-logging - output credentials] *************************************************************************************************************************
ok: [localhost] => {
    "msg": [
        "Kibana admin  - username: admin,        password: 59LQvYC4VjD80BtsKboH",
        "Kibana Server - username: kibanaserver, password: C06Enzx9tpjywkm4PApz",
        "Log Collector - username: logcollector, password: qGVACyECJ2Lai3KPa7Ph",
        "Metric Getter - username: metricgetter, password: yqh6aol94HcmLS6c2o61"
    ]
}
Thursday 11 November 2021  15:31:58 +0000 (0:00:00.098)       0:03:28.935 *****
TASK [monitoring : cluster-monitoring - output credentials] **********************************************************************************************************************
ok: [localhost] => {
    "msg": [
        "Grafana - username: admin, password: z8EraQDnZpusAlC7nO46"
    ]
}
Thursday 11 November 2021  15:44:27 +0000 (0:00:00.045)       0:15:58.067 *****
TASK [monitoring : cluster-logging - output credentials] *************************************************************************************************************************
ok: [localhost] => {
    "msg": [
        "Kibana admin  - username: admin,        password: eIr4LItiwkMTafStv9Px",
        "Kibana Server - username: kibanaserver, password: NWY9vt7f0aWd70NEAsHk",
        "Log Collector - username: logcollector, password: iBJRRQ4ScdebfJGdhznq",
        "Metric Getter - username: metricgetter, password: pD53BB7zv0Xdf429S4yn"
    ]
}
Thursday 11 November 2021  15:47:18 +0000 (0:00:00.048)       0:18:48.920 *****

The two Ansible tasks look duplicated to me - as if they're being run twice. Thanks!
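A quick way to confirm this kind of duplication from a saved run log is to count how many times each credential task header appears. A minimal sketch, using a trimmed stand-in for the attached viya4-deployment.txt (each TASK header line appears once per execution of that task):

```shell
# Stand-in excerpt for the attached viya4-deployment.txt log.
cat > run.log <<'EOF'
TASK [monitoring : cluster-monitoring - output credentials] *****
TASK [monitoring : cluster-logging - output credentials] *****
TASK [monitoring : cluster-monitoring - output credentials] *****
TASK [monitoring : cluster-logging - output credentials] *****
EOF

# Count executions per task: any count above 1 means the task ran
# more than once within a single playbook run.
grep -o 'TASK \[monitoring : cluster-[a-z]* - output credentials\]' run.log \
  | sort | uniq -c
# Prints a count of 2 for each of the two tasks.
```

Run against the real log, the same pipeline would show whether each credential task executed once or twice.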

andybouts commented 2 years ago

And if I am reading this correctly, the task TASK [monitoring : cluster-monitoring seems to take +/- 3 mins each time it runs.

That correlates with the log I provided, where that task took +/- 450 to 500 secs in total.

Does this need to be reported in the monitoring project as an issue?


Thursday 18 November 2021 17:56:34 +0000 (0:00:00.281) 0:12:41.187 *****
===============================================================================
monitoring : cluster-logging - deploy --------------------------------- 467.78s
vdm : manifest - deploy ------------------------------------------------ 90.88s
vdm : kustomize - Generate deployment manifest ------------------------- 47.23s
baseline : Deploy ingress-nginx ---------------------------------------- 29.11s
baseline : Deploy cert-manager ----------------------------------------- 22.58s
baseline : Deploy nfs-subdir-external-provisioner ---------------------- 17.38s
vdm : manifest - deploy update ----------------------------------------- 10.95s
vdm : prereqs - cluster-local deploy ----------------------------------- 10.93s
vdm : assets - Download ------------------------------------------------ 10.75s
vdm : prereqs - cluster-wide -------------------------------------------- 6.45s
vdm : prereqs - cluster-api --------------------------------------------- 4.36s
vdm : copy - VDM generators --------------------------------------------- 2.89s
vdm : copy - VDM transformers ------------------------------------------- 2.89s
vdm : assets - Get License ---------------------------------------------- 1.84s
monitoring : v4m - download --------------------------------------------- 1.40s
vdm : assets - Extract downloaded assets -------------------------------- 1.27s
jump-server : jump-server - create folders ------------------------------ 1.21s
jump-server : jump-server - lookup groups ------------------------------- 1.09s
baseline : Remove deprecated efs-provisioner namespace ------------------ 1.09s
Gathering Facts --------------------------------------------------------- 1.08s
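For anyone reproducing these numbers: the per-task timing summary above is produced by Ansible's profile_tasks callback plugin. If you run the playbooks outside the project's container, the same summary can typically be enabled in ansible.cfg - a sketch only, since the exact plugin path and setting name vary by Ansible version and installed collections:

```ini
# ansible.cfg - enable the per-task timing summary shown above
# (assumed setup; viya4-deployment's container may already configure this)
[defaults]
callbacks_enabled = profile_tasks
```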
thpang commented 2 years ago

Sure, this is not the place for that information.

dwstern commented 2 years ago

Yes, @andybouts, the viya4-deployment project is simply calling the viya4-monitoring-kubernetes project at that point, and has no way to influence how long that project's install step takes. Let's not confuse this issue, if you don't mind. This one is about unnecessarily duplicating the monitoring+logging install.

sayeun commented 9 months ago

An internal ticket was opened to investigate this.

jarpat commented 7 months ago

I believe this is no longer an issue. After running viya4-deployment with the "baseline,viya,cluster-logging,cluster-monitoring,viya-monitoring,install" tags, I did not see Ansible tasks repeating. I saw two separate tasks run, "monitoring : cluster-monitoring - output credentials" and "monitoring : cluster-logging - output credentials", which display the credentials for monitoring and logging respectively.

...truncated
2024-02-07 18:10:49,524 p=1 u=viya4-deployment n=ansible | TASK [monitoring : cluster-monitoring - output credentials] ********************
2024-02-07 18:10:49,524 p=1 u=viya4-deployment n=ansible | ok: [localhost] => 
  msg:
  - 'Grafana - username: admin, password: *******'
2024-02-07 18:10:49,536 p=1 u=viya4-deployment n=ansible | Wednesday 07 February 2024  18:10:49 +0000 (0:00:00.024)       0:00:30.803 **** 
...truncated
2024-02-07 18:13:49,518 p=1 u=viya4-deployment n=ansible | TASK [monitoring : cluster-logging - output credentials] ***********************
2024-02-07 18:13:49,518 p=1 u=viya4-deployment n=ansible | ok: [localhost] => 
  msg:
  - 'OpenSearch admin  - username: admin,                   password: *******'
  - 'OpenSearch admin  - username: logadm,                  password: *******'
2024-02-07 18:13:49,525 p=1 u=viya4-deployment n=ansible | Wednesday 07 February 2024  18:13:49 +0000 (0:00:00.018)       0:03:30.792 **** 
...truncated

If you run into this issue again please open another GitHub issue.