prometheus-community / ansible

Ansible Collection for Prometheus
https://prometheus-community.github.io/ansible/
Apache License 2.0

enhancement(node_exporter): add vrf var to node_exporter #339

Closed kuhball closed 7 months ago

kuhball commented 7 months ago

We're using node_exporter on NVIDIA switches running Cumulus Linux. On these switches we have to run node_exporter inside a VRF provided via ifupdown2. This is not limited to our use case; it can also be used on a Linux VM.

To run under a VRF, the systemd option ProtectControlGroups has to be set to false, because `ip vrf exec` needs to create a cgroup directory. With the protection left enabled, this error is reported:

Apr 23 14:57:16 test-01 node_exporter[1817356]: mkdir failed for /sys/fs/cgroup/unified/system.slice/node_exporter.service/vrf: Read-only file system
Apr 23 14:57:16 test-01 node_exporter[1817356]: Failed to setup vrf cgroup2 directory

I'm not sure whether this is something you want to support. If not, please feel free to close this PR. If something is missing, I'm happy to amend the changes. I wasn't sure about your test requirements; if required, I'm happy to extend the molecule tests.

SuperQ commented 7 months ago

This might be a bit too system specific for this role.

Maybe another way to handle this would be to add a systemd overlay like this:

In /etc/systemd/system/node_exporter.service.d/vrf.conf add this:

[Service]
ExecStart=...
ProtectControlGroups=false

See the "drop-in" pattern in the systemd docs.

kuhball commented 7 months ago

Thanks for the push in the direction of drop-ins, I had totally forgotten about them! If anybody needs this in the future, here is a working example from a Cumulus device:

[Service]
ExecStart=
ExecStart=ip vrf exec mgmt runuser -u node-exp -g node-exp -- /usr/local/bin/node_exporter \
    '--web.listen-address=127.0.0.1:9101' \
    '--web.telemetry-path=/metrics' \
    '--collector.ethtool' \
    '--collector.netdev.netlink' \
    '--collector.systemd' \
    '--collector.textfile' \
    '--collector.textfile.directory=/var/lib/node_exporter'
ProtectControlGroups=false

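The first, empty ExecStart= is needed to clear the ExecStart inherited from the original unit. Without it, systemd sees two ExecStart entries, which is only allowed for Type=oneshot services.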
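
For anyone who wants to script this instead of editing the file by hand, the drop-in could be written by a small shell helper. This is only a minimal sketch: `write_vrf_dropin` and its two arguments are hypothetical names, not part of the role or of systemd, and the command line passed in is whatever ExecStart your host needs.

```shell
#!/bin/sh
# Sketch only: write_vrf_dropin is a hypothetical helper, not part of the role.
#   $1: drop-in directory, e.g. /etc/systemd/system/node_exporter.service.d
#   $2: full command line that should replace the unit's ExecStart
write_vrf_dropin() {
    dropin_dir="$1"
    exec_start="$2"
    mkdir -p "$dropin_dir" || return 1
    # The empty ExecStart= clears the inherited command before the new one is set.
    printf '%s\n' \
        '[Service]' \
        'ExecStart=' \
        "ExecStart=$exec_start" \
        'ProtectControlGroups=false' \
        > "$dropin_dir/vrf.conf"
}
```

On a real host this would be called as root with the directory from the comment above, followed by `systemctl daemon-reload` and `systemctl restart node_exporter`. Afterwards, `systemctl cat node_exporter` prints the unit file together with its drop-ins, which is a quick way to check that vrf.conf was picked up.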