prometheus-community / ansible

Ansible Collection for Prometheus
https://prometheus-community.github.io/ansible/
Apache License 2.0

node_exporter: Systemd's ProtectHome causes error if filesystem is mounted under '/home' #13

Closed by cudevmaxwell 1 year ago

cudevmaxwell commented 1 year ago

In the node_exporter.service.j2 template, ProtectHome is set to read-only rather than yes only when /home itself is a separate partition. However, a filesystem may be mounted under /home (at a subdirectory) rather than at /home. In that case ProtectHome stays yes, so node_exporter can't run statfs() on that filesystem.
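
The intended behaviour can be sketched as a Jinja2 fragment for an Ansible template. This is only an illustration, not the role's actual code: the variable name home_mounts is hypothetical, and it assumes the ansible_mounts fact and Ansible's match test are available when the template is rendered.

```jinja2
{# Sketch: collect /home itself plus anything mounted beneath it. #}
{% set home_mounts = ansible_mounts | selectattr('mount', 'match', '^/home(/|$)') | list %}
{# Relax ProtectHome only when such a mount exists. #}
ProtectHome={{ 'read-only' if home_mounts | length > 0 else 'yes' }}
```

The pattern matches /home and paths such as /home/data, but not an unrelated path like /homework.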

(Copy of https://github.com/cloudalchemy/ansible-node-exporter/issues/271)

tjdavis3 commented 1 year ago

I'm still seeing this issue on CentOS 7.9.2009 systems. ansible_facts shows a mountpoint at /home, but the node_exporter.service file still gets created with ProtectHome=yes:

-sh-4.2$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
-sh-4.2$ df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                  32G     0   32G   0% /dev
tmpfs                     32G     0   32G   0% /dev/shm
tmpfs                     32G  3.1G   29G  10% /run
tmpfs                     32G     0   32G   0% /sys/fs/cgroup
/dev/mapper/centos-root   50G   11G   40G  21% /
/dev/sda2               1014M  188M  827M  19% /boot
/dev/sda1                200M   12M  189M   6% /boot/efi
/dev/mapper/centos-home  365G   80G  285G  22% /home
tmpfs                    6.3G     0  6.3G   0% /run/user/1001
tmpfs                    6.3G     0  6.3G   0% /run/user/0
tmpfs                    6.3G     0  6.3G   0% /run/user/1774400003
-sh-4.2$ cat /etc/systemd/system/node_exporter.service
#
# Ansible managed
#

[Unit]
Description=Prometheus Node Exporter
After=network-online.target

[Service]
Type=simple
User=node-exp
Group=node-exp
ExecStart=/usr/local/bin/node_exporter \
    '--collector.systemd' \
'--collector.textfile' \
    '--collector.textfile.directory=/var/lib/node_exporter' \
    '--web.listen-address=0.0.0.0:9100' \
    '--web.telemetry-path=/metrics'

SyslogIdentifier=node_exporter
Restart=always
RestartSec=1
StartLimitInterval=0

ProtectHome=yes
NoNewPrivileges=yes

ProtectSystem=full

[Install]
WantedBy=multi-user.target

cudevmaxwell commented 1 year ago

@tjdavis3 Could you provide the output of ansible-galaxy collection list, run from the playbook directory on the machine you run your Ansible deployment from? FYI: that command prints paths that might contain usernames (e.g. /home/my_user_here/.ansible/collections/ansible_collection); feel free to redact them if you want.

cudevmaxwell commented 1 year ago

@tjdavis3 Another thing to check: could you run ansible <HOSTNAME_HERE> -m ansible.builtin.setup and pull out the ansible_mounts variable? Feel free to redact from that as well; I'm only interested in the mount key of each object in that list.

tjdavis3 commented 1 year ago

This is from my local machine. We also ran it from AWX which would have installed the latest version on run.

# /usr/local/Cellar/ansible/7.2.0/libexec/lib/python3.11/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    5.2.0
ansible.netcommon             4.1.0
ansible.posix                 1.5.1
ansible.utils                 2.9.0
ansible.windows               1.13.0
arista.eos                    6.0.0
awx.awx                       21.11.0
azure.azcollection            1.14.0
check_point.mgmt              4.0.0
chocolatey.chocolatey         1.4.0
cisco.aci                     2.3.0
cisco.asa                     4.0.0
cisco.dnac                    6.6.3
cisco.intersight              1.0.23
cisco.ios                     4.3.1
cisco.iosxr                   4.1.0
cisco.ise                     2.5.12
cisco.meraki                  2.15.0
cisco.mso                     2.2.1
cisco.nso                     1.0.3
cisco.nxos                    4.0.1
cisco.ucs                     1.8.0
cloud.common                  2.1.2
cloudscale_ch.cloud           2.2.4
community.aws                 5.2.0
community.azure               2.0.0
community.ciscosmb            1.0.5
community.crypto              2.10.0
community.digitalocean        1.23.0
community.dns                 2.5.0
community.docker              3.4.0
community.fortios             1.0.0
community.general             6.3.0
community.google              1.0.0
community.grafana             1.5.3
community.hashi_vault         4.1.0
community.hrobot              1.7.0
community.libvirt             1.2.0
community.mongodb             1.4.2
community.mysql               3.5.1
community.network             5.0.0
community.okd                 2.2.0
community.postgresql          2.3.2
community.proxysql            1.5.1
community.rabbitmq            1.2.3
community.routeros            2.7.0
community.sap                 1.0.0
community.sap_libs            1.4.0
community.skydive             1.0.0
community.sops                1.6.0
community.vmware              3.3.0
community.windows             1.12.0
community.zabbix              1.9.1
containers.podman             1.10.1
cyberark.conjur               1.2.0
cyberark.pas                  1.0.17
dellemc.enterprise_sonic      2.0.0
dellemc.openmanage            6.3.0
dellemc.os10                  1.1.1
dellemc.os6                   1.0.7
dellemc.os9                   1.0.4
dellemc.powerflex             1.5.0
dellemc.unity                 1.5.0
f5networks.f5_modules         1.22.0
fortinet.fortimanager         2.1.7
fortinet.fortios              2.2.2
frr.frr                       2.0.0
gluster.gluster               1.0.2
google.cloud                  1.1.2
grafana.grafana               1.1.0
hetzner.hcloud                1.9.1
hpe.nimble                    1.1.4
ibm.qradar                    2.1.0
ibm.spectrum_virtualize       1.11.0
infinidat.infinibox           1.3.12
infoblox.nios_modules         1.4.1
inspur.ispim                  1.2.0
inspur.sm                     2.3.0
junipernetworks.junos         4.1.0
kubernetes.core               2.3.2
lowlydba.sqlserver            1.3.1
mellanox.onyx                 1.0.0
netapp.aws                    21.7.0
netapp.azure                  21.10.0
netapp.cloudmanager           21.22.0
netapp.elementsw              21.7.0
netapp.ontap                  22.2.0
netapp.storagegrid            21.11.1
netapp.um_info                21.8.0
netapp_eseries.santricity     1.4.0
netbox.netbox                 3.10.0
ngine_io.cloudstack           2.3.0
ngine_io.exoscale             1.0.0
ngine_io.vultr                1.1.3
openstack.cloud               1.10.0
openvswitch.openvswitch       2.1.0
ovirt.ovirt                   2.4.1
purestorage.flasharray        1.16.2
purestorage.flashblade        1.10.0
purestorage.fusion            1.3.0
sensu.sensu_go                1.13.2
splunk.es                     2.1.0
t_systems_mms.icinga_director 1.32.0
theforeman.foreman            3.8.0
vmware.vmware_rest            2.2.0
vultr.cloud                   1.7.0
vyos.vyos                     4.0.0
wti.remote                    1.0.4

# .../.ansible/collections/ansible_collections
Collection             Version
---------------------- -------
ansible.netcommon      5.0.0
ansible.posix          1.1.1
ansible.utils          2.9.0
ansible.windows        1.13.0
azure.azcollection     1.15.0
check_point.mgmt       2.0.0
cloud.common           2.1.3
community.crypto       2.11.1
community.digitalocean 1.0.0
community.general      6.4.0
community.grafana      1.5.4
community.kubernetes   1.1.1
community.mysql        3.6.0
community.network      1.2.0
community.postgresql   2.3.2
community.vmware       3.5.0
community.zabbix       1.9.2
f5networks.f5_modules  1.23.0
fortinet.fortios       1.0.15
geerlingguy.mac        2.1.1
gluster.gluster        1.0.1
google.cloud           1.0.1
netbox.netbox          3.10.0
prometheus.prometheus  0.3.1
robertdebock.roles     1.10.6
vmware.vmware_rest     2.3.1
vyos.vyos              1.0.5

Here are the facts, filtered to "mount" (ansible <HOSTNAME> -m ansible.builtin.setup | grep mount):

        "ansible_form_factor": "Rack Mount Chassis",
        "ansible_mounts": [
                "mount": "/boot",
                "mount": "/boot/efi",
                "options": "rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro",
                "mount": "/",
                "mount": "/home",

cudevmaxwell commented 1 year ago

@tjdavis3 Looks good so far. 0.3.1 should have included this fix, and the mounts look good. Could you provide the output of cat .../.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/templates/node_exporter.service.j2? (Not literally ..., obviously; use whatever the real path is.) I just want to double-check the Jinja2 template used to build the service unit.

tjdavis3 commented 1 year ago

$ cat ~/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/templates/node_exporter.service.j2
{{ ansible_managed | comment }}

[Unit]
Description=Prometheus Node Exporter
After=network-online.target

[Service]
Type=simple
User={{ node_exporter_system_user }}
Group={{ node_exporter_system_group }}
ExecStart={{ node_exporter_binary_install_dir }}/node_exporter \
{% for collector in node_exporter_enabled_collectors -%}
{%   if not collector is mapping %}
    '--collector.{{ collector }}' \
{%   else -%}
{%     set name, options = (collector.items()|list)[0] -%}
    '--collector.{{ name }}' \
{%     for k,v in options|dictsort %}
    '--collector.{{ name }}.{{ k }}={{ v }}' \
{%     endfor -%}
{%   endif -%}
{% endfor -%}
{% for collector in node_exporter_disabled_collectors %}
    '--no-collector.{{ collector }}' \
{% endfor %}
{% if node_exporter_tls_server_config | length > 0 or node_exporter_http_server_config | length > 0 or node_exporter_basic_auth_users | length > 0 %}
    {% if node_exporter_version is version('1.5.0', '>=') %}
    '--web.config.file=/etc/node_exporter/config.yaml' \
    {% else %}
    '--web.config=/etc/node_exporter/config.yaml' \
    {% endif %}
{% endif %}
    '--web.listen-address={{ node_exporter_web_listen_address }}' \
    '--web.telemetry-path={{ node_exporter_web_telemetry_path }}'

SyslogIdentifier=node_exporter
Restart=always
RestartSec=1
StartLimitInterval=0

{% set protect_home = 'yes' %}
{% for m in ansible_mounts if m.mount.startswith('/home') %}
{%   set protect_home = 'read-only' %}
{% endfor %}
ProtectHome={{ protect_home }}
NoNewPrivileges=yes

{% if (ansible_facts.packages.systemd | first).version is version('232', '>=') %}
ProtectSystem=strict
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=yes
{% else %}
ProtectSystem=full
{% endif %}

[Install]
WantedBy=multi-user.target

cudevmaxwell commented 1 year ago

@tjdavis3 Found the bug, my fault.

The template uses this Jinja2 snippet:

{% set protect_home = 'yes' %}
{% for m in ansible_mounts if m.mount.startswith('/home') %}
    {%   set protect_home = 'read-only' %}
{% endfor %}
ProtectHome={{ protect_home }}

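This is wrong. In Jinja2, a set inside a for loop creates a variable scoped to the loop body, so the inner set never changes the protect_home variable in the outer scope, and ProtectHome is always rendered as yes.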

Working on a fix now.
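
For reference, one common way to carry a value out of a Jinja2 loop is a namespace object (available since Jinja2 2.10). The fragment below is a minimal sketch of that technique applied to the snippet above, not necessarily the fix that landed in the collection; the name ns is hypothetical.

```jinja2
{# Attributes set on a namespace object survive the loop's scope. #}
{% set ns = namespace(protect_home='yes') %}
{% for m in ansible_mounts if m.mount.startswith('/home') %}
{#   Assigning to an attribute updates the shared object, unlike a plain set. #}
{%   set ns.protect_home = 'read-only' %}
{% endfor %}
ProtectHome={{ ns.protect_home }}
```

With a mount at /home, or at a path beginning with /home, this renders ProtectHome=read-only; otherwise it renders ProtectHome=yes.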

tjdavis3 commented 1 year ago

Here's what I did as a workaround for now:

  post_tasks:
  - name: "Make sure ProtectHome is set to read-only"
    become: yes
    ansible.builtin.lineinfile:
      search_string: 'ProtectHome=yes'
      line: 'ProtectHome=read-only'
      path: /etc/systemd/system/node_exporter.service
      state: present
    notify:
      - restart node_exporter

cudevmaxwell commented 1 year ago

Tracking in #95. Sorry @tjdavis3!

cudevmaxwell commented 1 year ago

@tjdavis3 This was fixed in #94, and the fix will hopefully be included in the next release. Thanks again for reporting!