kxr / ocpcr

Ansible playbook to generate an HTML cluster overview report for OpenShift 3
GNU General Public License v3.0

Templating issue with report.j2 on Ansible server #1

Closed telnemri closed 4 years ago

telnemri commented 4 years ago

Hi, I work for an enterprise vendor at VFUK and am trying to use this tool to generate health monitoring outputs. I've configured the tool and run it successfully, with email set up, but the final report is not generated because of a templating error in generate_report.yaml. It complains that a variable in report.j2 is invalid, as shown below:

{ "hosts": { "localhost": { "_ansible_no_log": false, "action": "template", "changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: Unable to look up a name or access an attribute in template string (<!doctype html>\n<html lang=\"en\">\n \n <meta charset=\"utf-8\">\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1, shrink-to-fit=no\">\n \n OpenShift Cluster Overview Report {{report_date}} \n {% include 'includes/styles.j2' %}\n \n \n {% include 'includes/header.j2' %}\n
\n {% include 'includes/system_resources.j2' %}\n {% include 'includes/system_services.j2' %}\n {% include 'includes/node_resources.j2' %}\n {% include 'includes/node_info.j2' %}\n {% include 'includes/node_status.j2' %}\n {% include 'includes/pod_stats.j2' %}\n {% include 'includes/core_pods.j2' %}\n {% include 'includes/etcd_health.j2' %}\n
\n {% include 'includes/footer.j2' %}\n\n\n \n\n \n\n \n\n \n \n\n).\nMake sure your variable name does not contain invalid characters like '-': argument of type 'StrictUndefined' is not iterable" }

kxr commented 4 years ago

Hi, thank you for reporting this issue. With Jinja templates in Ansible, it's almost impossible to know where (line number or file) the error is occurring.

In the report/report.j2 template, you will see multiple includes:

        {% include 'includes/system_resources.j2' %}
        {% include 'includes/system_services.j2' %}
        {% include 'includes/node_resources.j2' %}
        {% include 'includes/node_info.j2' %}
        {% include 'includes/node_status.j2' %}
        {% include 'includes/pod_stats.j2' %}
        {% include 'includes/core_pods.j2' %}
        {% include 'includes/etcd_health.j2' %}

Can you try deleting all these lines and then adding them back one by one, to see which one causes Jinja to fail?
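The one-by-one check described above can be scripted. The following is a minimal sketch, not part of the playbook: the function name single_include_variants and the KEEP_ALWAYS set are hypothetical, and it assumes each include sits on its own line, as in the report.j2 excerpt above.

```python
import re

# Matches a whole line that holds a single Jinja include,
# e.g. {% include 'includes/node_info.j2' %}
INCLUDE_RE = re.compile(
    r"^[ \t]*\{%\s*include\s+'(includes/[^']+)'\s*%\}[ \t]*$", re.M
)

# Assumption: these includes are page chrome, not report sections,
# so every variant keeps them.
KEEP_ALWAYS = {"includes/styles.j2", "includes/header.j2", "includes/footer.j2"}


def single_include_variants(template_text):
    """Yield (include_path, variant_text) pairs.

    Each variant is the original template with every section include
    removed except the one named by include_path.
    """
    sections = [
        m.group(1)
        for m in INCLUDE_RE.finditer(template_text)
        if m.group(1) not in KEEP_ALWAYS
    ]
    for keep in sections:
        def drop_others(match, keep=keep):
            path = match.group(1)
            if path in KEEP_ALWAYS or path == keep:
                return match.group(0)  # leave this include line untouched
            return ""                  # blank out every other section include
        yield keep, INCLUDE_RE.sub(drop_others, template_text)
```

Each variant can be written over a copy of report/report.j2 and the playbook rerun; the variants that fail point to the includes that reference an undefined variable.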

telnemri commented 4 years ago

I've narrowed down the issue to the 3 reports below:

        {% include 'includes/system_resources.j2' %}    NOK
        {% include 'includes/system_services.j2' %}     NOK
        {% include 'includes/node_resources.j2' %}      OK
        {% include 'includes/node_info.j2' %}           NOK
        {% include 'includes/node_status.j2' %}         OK
        {% include 'includes/pod_stats.j2' %}           OK
        {% include 'includes/core_pods.j2' %}           OK
        {% include 'includes/etcd_health.j2' %}         OK

Error messages from the errored reports:

{% include 'includes/system_resources.j2' %}
"hosts": { "localhost": { "_ansible_no_log": false, "action": "template", "changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: Unable to look up a name or access an attribute in template string (<!doctype html>\n<html lang=\"en\">\n \n <meta charset=\"utf-8\">\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1, shrink-to-fit=no\">\n \n OpenShift Cluster Overview Report {{ '%Y-%m-%d' | strftime }} \n {% include 'includes/styles.j2' %}\n \n \n {% include 'includes/header.j2' %}\n
\n {% include 'includes/system_resources.j2' %}\n
\n {% include 'includes/footer.j2' %}\n\n\n \n\n \n\n \n\n \n \n\n).\nMake sure your variable name does not contain invalid characters like '-': argument of type 'StrictUndefined' is not iterable"

{% include 'includes/system_services.j2' %}
{ "hosts": { "localhost": { "_ansible_no_log": false, "action": "template", "changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: Unable to look up a name or access an attribute in template string (<!doctype html>\n<html lang=\"en\">\n \n <meta charset=\"utf-8\">\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1, shrink-to-fit=no\">\n \n OpenShift Cluster Overview Report {{ '%Y-%m-%d' | strftime }} \n {% include 'includes/styles.j2' %}\n \n \n {% include 'includes/header.j2' %}\n
\n\t{% include 'includes/system_services.j2' %}\n
\n {% include 'includes/footer.j2' %}\n\n\n \n\n \n\n \n\n \n \n\n).\nMake sure your variable name does not contain invalid characters like '-': argument of type 'StrictUndefined' is not iterable"

{% include 'includes/node_info.j2' %}
{ "hosts": { "localhost": { "_ansible_no_log": false, "action": "template", "changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: No first item, sequence was empty." } },

kxr commented 4 years ago

The pattern suggests that there is a discrepancy in hostnames. Can you identify which of the following values match or do not match each other:

Also, can you tell me whether you are running it on-premises or in the cloud (AWS, Azure, GCE, etc.)?

kxr commented 4 years ago

@telnemri I have added logic to handle hostname/inventory_name discrepancies. Can you try again with the latest version?

telnemri commented 4 years ago

@kxr thank you, the updated code fixes the issue and I'm able to generate the report now.

I do have a concern about the memory utilization metric, though. This is a topic we've been investigating for some time, and it seems that when the calculation counts cache and buffers as free memory, the reported utilization is much lower than what we actually experience.

In practice, the correct metric to use is MemAvailable, which accounts for the fact that containers use cgroups and that shared/cached memory cannot always be freed the way it can be for a regular process.

Would you be able to add another column beside utilization to show both calculations (one counting cache and buffers as used memory, and the default one counting them as free memory)?
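To make the difference between these calculations concrete, here is a minimal sketch that derives three utilization figures from /proc/meminfo-style text. The function names and the sample numbers are illustrative assumptions, not output from a real node, and this is not how the report itself computes its metric.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: kB} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def utilization_views(info):
    """Return memory utilization (percent) under three definitions of 'used'."""
    total = info["MemTotal"]

    def pct(used):
        return round(100.0 * used / total, 1)

    return {
        # Buffers and page cache count as used memory.
        "cache_as_used": pct(total - info["MemFree"]),
        # Buffers and page cache count as free memory.
        "cache_as_free": pct(
            total - info["MemFree"] - info["Buffers"] - info["Cached"]
        ),
        # The kernel's own estimate of memory available without swapping.
        "by_mem_available": pct(total - info["MemAvailable"]),
    }


# Illustrative sample values only (kB).
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    6000000 kB
Buffers:         1000000 kB
Cached:          7000000 kB
"""

if __name__ == "__main__":
    print(utilization_views(parse_meminfo(SAMPLE)))
```

With the sample values, counting cache as free gives 37.5%, counting it as used gives 87.5%, and the MemAvailable-based figure falls in between at 62.5%.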

kxr commented 4 years ago

@telnemri Thank you for confirming the fix and for the memory calculation suggestion. I will investigate it in more detail. Currently I am under the impression that the cached memory on the OpenShift nodes is freeable, but I could be wrong.

BTW, for the time being, you can easily switch to a used-memory calculation that counts the cache as used by replacing ansible_memory_mb.nocache.free with ansible_memory_mb.real.free in report/includes/system_resources.j2.
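As a rough illustration of what that swap changes, the sketch below computes both figures from a dict shaped like the ansible_memory_mb fact. The function name and the sample values are hypothetical, and the exact formula used in system_resources.j2 is an assumption here.

```python
def memory_utilization(memory_mb):
    """Return utilization (percent) from an ansible_memory_mb-style dict."""
    total = memory_mb["real"]["total"]

    def pct(free):
        return round(100.0 * (total - free) / total, 1)

    return {
        # nocache.free treats buffers/cache as free memory.
        "cache_as_free": pct(memory_mb["nocache"]["free"]),
        # real.free treats buffers/cache as used memory.
        "cache_as_used": pct(memory_mb["real"]["free"]),
    }


# Illustrative values only (MB), in the shape of the ansible_memory_mb fact.
SAMPLE_FACT = {
    "real": {"total": 16000, "used": 14000, "free": 2000},
    "nocache": {"used": 6000, "free": 10000},
}

if __name__ == "__main__":
    print(memory_utilization(SAMPLE_FACT))
```

For these sample values the node shows 37.5% utilization with nocache.free and 87.5% with real.free, which is the kind of gap described earlier in the thread.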