influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

rhel8-rpm: Broken permission #14019

Closed genofire closed 1 year ago

genofire commented 1 year ago

Relevant telegraf.conf

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = ""
  logfile = "/var/log/telegraf/telegraf.log"
  logfile_rotation_interval = "24h"
  logfile_rotation_max_size = "100MB"
  logfile_rotation_max_archives = 2
  quiet = true
  omit_hostname = false

[[outputs.prometheus_client]]
  listen = "0.0.0.0:9270"
  path = "/metrics/"
  metric_version = 2           # otherwise a warning appears in the log file

# Read metrics about cpu usage
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = false
  ## If true, compute and report the sum of all non-idle CPU states.
  report_active = false

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default stats will be gathered for all mount points.
  ## Setting mount_points will restrict the stats to only the specified mount points.
  # mount_points = ["/"]
  ## Ignore mount points by filesystem type.
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

Logs from Telegraf

does not matter

System info

rhel8

Docker

No response

Steps to reproduce

  1. Install
  2. Start

Expected behavior

  1. Install
    • /var/log/telegraf is created with owner:group telegraf:telegraf
    • telegraf.service runs as the telegraf user
  2. Start
  3. Telegraf is running

Actual behavior

  1. Install
    • /var/log/telegraf is created with owner:group root:root
    • telegraf.service runs as the telegraf user
  2. Start
  3. failed, because it has no permission to write
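To confirm the state described in the list above on a given host, the owner and group of the log directory can be read directly. The following is a minimal sketch, not part of Telegraf or its packaging; the helper names `owner_and_group` and `is_owned_by` are hypothetical, and the path in the usage note is the one from this report.

```python
import grp
import os
import pwd


def owner_and_group(path):
    """Return (owner, group) names for path, falling back to numeric ids."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # No passwd entry for this uid (common in minimal containers).
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return owner, group


def is_owned_by(path, owner, group):
    """True if path has exactly the given owner and group names."""
    return owner_and_group(path) == (owner, group)
```

For the situation reported here, `is_owned_by("/var/log/telegraf", "telegraf", "telegraf")` would be expected to return False on an affected system.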

Additional info

# rpm -qp --dump telegraf-1.28.1-1.x86_64.rpm 
/etc/logrotate.d/telegraf 131 1694532127 9ab14f18106cb6f1bd55733eea61d7ab5242691defa17aec275bb2a26fe26955 0100644 root root 1 0 0 X
/etc/telegraf/telegraf.conf 497518 1694532127 c10f04d852e349a679290f4987cad0a1ad8e764f1f3b26adb65af23506f10c3a 0100644 root root 1 0 0 X
/etc/telegraf/telegraf.d/.ignore 97 1694532127 1c72cadef1352129f026d6bdb749c9b91e1dec97056339a1d553214aacc0c70a 0100644 root root 1 0 0 X
/usr/bin/telegraf 197279744 1694532127 ac9e1ce28391e2369b767d414a96f201d8407b6b38b40f66c9e0b03d058be318 0100755 root root 0 0 0 X
/usr/lib/telegraf/scripts/init.sh 5803 1694532127 46098761230fb7850ae97b4d1134a782d80633bd14eaf94582539338738a7345 0100755 root root 0 0 0 X
/usr/lib/telegraf/scripts/telegraf.service 509 1694532127 a941ff3e6d63bcf72340d85ea7ac104dd41f3d4a100eb0e0ed0739add660775d 0100644 root root 0 0 0 X
/var/log/telegraf 0 1694532127 0000000000000000000000000000000000000000000000000000000000000000 040755 root root 0 0 0 X
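The ownership problem can be read off the dump above: each line lists path, size, mtime, digest, mode, owner, group, and four further fields. As a rough sketch (the function names are hypothetical, and paths containing spaces are assumed not to occur), the lines can be parsed and checked like this:

```python
from typing import List, NamedTuple


class DumpEntry(NamedTuple):
    path: str
    mode: str
    owner: str
    group: str


def parse_dump(text: str) -> List[DumpEntry]:
    """Parse `rpm -qp --dump` output; lines without 11 fields are skipped."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue
        path, _size, _mtime, _digest, mode, owner, group = fields[:7]
        entries.append(DumpEntry(path, mode, owner, group))
    return entries


def wrong_owner(entries, prefix, owner, group):
    """Entries under prefix whose owner:group differs from the expected pair."""
    return [
        e for e in entries
        if e.path.startswith(prefix) and (e.owner, e.group) != (owner, group)
    ]


# Two lines copied from the dump in this report.
SAMPLE_DUMP = (
    "/usr/bin/telegraf 197279744 1694532127 "
    "ac9e1ce28391e2369b767d414a96f201d8407b6b38b40f66c9e0b03d058be318 "
    "0100755 root root 0 0 0 X\n"
    "/var/log/telegraf 0 1694532127 "
    "0000000000000000000000000000000000000000000000000000000000000000 "
    "040755 root root 0 0 0 X\n"
)

if __name__ == "__main__":
    entries = parse_dump(SAMPLE_DUMP)
    for e in wrong_owner(entries, "/var/log/telegraf", "telegraf", "telegraf"):
        print(e.path, e.owner, e.group)
```

The all-zero digest on the last line only reflects that the entry is a directory (mode 040755), so it has no file content to hash.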

powersj commented 1 year ago

failed, because it has no permission to write

Can you provide an example config demonstrating that the service fails to start because it does not have access to the log directory?

I don't disagree that this is probably at least the wrong group, but before we go changing things here I want to understand the specific scenario.

genofire commented 1 year ago

Sadly, I already had to clean up the logs, because the crash loop filled the volume.

Here is the systemd unit (its SHA-256 hash matches the one in the package, a941ff3e6d63bcf72340d85ea7ac104dd41f3d4a100eb0e0ed0739add660775d):

[Unit]
Description=Telegraf
Documentation=https://github.com/influxdata/telegraf
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/telegraf
User=telegraf
ExecStart=/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartForceExitStatus=SIGPIPE
KillMode=mixed
TimeoutStopSec=5
LimitMEMLOCK=8M:8M

[Install]
WantedBy=multi-user.target

During setup we fix the permissions manually, but they changed back to root:root during one of the recent updates. We believe this happens with the new, longer hash values (0000000000000000000000000000000000000000000000000000000000000000).

powersj commented 1 year ago

You provided the systemd service file; do you have a Telegraf config itself that fails that you could share? My concern is that you said the service "failed, because it has no permission to write". That is not what I see, as we normally fall back to stderr if we can't write to the file, see here.

I did notice that a recent change altered this: we used to always chown the log directory to telegraf:telegraf, but now we only chown it if it does not exist, and the directory will always exist. That change was wrong, so I've put up #14019 to revert that.
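The difference between the two behaviors described above can be illustrated with a small sketch. This is not the package's actual install script; the function names are hypothetical, and the `chown` argument is passed in as a callable so the logic can be tried without root.

```python
import os
from typing import Callable


def prepare_only_if_missing(path: str, chown: Callable[[str], None]) -> None:
    """Create and chown the directory only when it does not exist yet.

    If the directory already exists (for example because the package
    ships it), the ownership is never corrected.
    """
    if not os.path.isdir(path):
        os.makedirs(path)
        chown(path)


def prepare_always(path: str, chown: Callable[[str], None]) -> None:
    """Create the directory if needed and chown it on every run."""
    os.makedirs(path, exist_ok=True)
    chown(path)
```

On a real system the callable could be something like `lambda p: shutil.chown(p, user="telegraf", group="telegraf")`, which needs sufficient privileges. With a directory that already exists, only `prepare_always` invokes it.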

genofire commented 1 year ago

I believe this has something to do with the rotation: it works on startup, then the directory is chowned to root, and then the rotation happens.

I will provide logs in the next few days when it happens on other systems.

PS: I already put the config in the description.

genofire commented 1 year ago

One question about your #14019: does it also affect RHEL, not only CentOS?

On the releases page, https://github.com/influxdata/telegraf/releases, there are only CentOS RPMs,

but the RHEL 8 repository has different checksums for its packages: https://repos.influxdata.com/rhel/8/x86_64/stable/

powersj commented 1 year ago

There is only a single RPM generated and used for all releases. Telegraf is a static Go binary, so there are no OS-specific dependencies. The difference in sums is due to the package in the repo getting a signature, which you can see with rpm -qpi <package.rpm>:

Signature   : RSA/SHA512, Tue 12 Sep 2023 09:49:23 AM MDT, Key ID d8ff8e1f7df8b07e

If you extract both RPMs, you should find that the sha256sum of the binary itself is the same in both packages.

I would also suggest that you start using the stable repo instead of one for a specific release of RHEL:

https://repos.influxdata.com/stable/

Instructions are at the bottom of the page for using that.
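The binary comparison suggested above can be sketched as follows. It assumes both RPMs have already been extracted into separate directories (for instance with rpm2cpio and cpio); the directory names in the usage note are placeholders, not paths from this thread.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True if both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_content("repo/usr/bin/telegraf", "github/usr/bin/telegraf")` should return True if only the package signature differs.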

genofire commented 1 year ago

That is strange; the file from https://repos.influxdata.com/rhel/8/x86_64/stable/ (and https://repos.influxdata.com/stable/) has a different hash:

ce4283ba936006fcc2c0f029489251075581c56d2bb255282b3af196208649a0  telegraf-1.28.1-1.x86_64.rpm.repo

than the GitHub release:

a47f0ea231db800da5a076f9e95bd35d50838a5ace8192a5209fa15e51c3aec3  telegraf-1.28.1-1.x86_64.rpm.github

powersj commented 1 year ago

I explained that in my previous message ;)

The difference in sums is due to the package in the repo getting a signature, which you can see with rpm -qpi

genofire commented 1 year ago

Just for your information:

Here, on another old machine (with Telegraf 1.28.1 (git: HEAD@3ea9ffbe)),

the logs:

raf.2023-10-11-1696975508.log: permission deniedunable to rotate the file "/var/log/telegraf/telegraf.log", rename /var/log/telegraf/telegraf.log /var/log/telegraf/telegraf.2023-10-11-1696975508.log: permission deniedunable to rotate the file "/var/log/telegraf/telegraf.log", rename /var/log/telegraf/telegraf.log /var/log/telegraf/telegraf.2023-10-11-1696975508.log: permission denied [...] unable to rotate the file "/var/log/telegraf/telegraf.log", rename /var/log/telegraf/telegraf.logOct 11 00:05:08 800-lphoedk8slbi001 telegraf[11530]: -10-11-1696975508.log: permission deniedunable to rotate the file "/var/log/telegraf/telegraf.log", rename /var/log/telegraf/telegraf.log /var/log/telegraf/telegraf.2023-10-11-1696975508.log: permission denied [...] unable to rotate the file "/var/log/telegraf/telegraf.log", rename /var/log/telegraf/tele

Yes, there is no line break in the log ;(
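Because the messages run together without line breaks, a log like the one above is easier to read once it is split at the start of each message. A minimal sketch, assuming every message begins with the text "unable to rotate the file" as in this excerpt (the function names are hypothetical):

```python
import re
from collections import Counter

# Zero-width lookahead: split in front of each message without consuming it.
MESSAGE_START = re.compile(r"(?=unable to rotate the file )")


def split_messages(blob: str):
    """Split a run-together log blob into individual messages."""
    return [part for part in MESSAGE_START.split(blob) if part]


def count_messages(blob: str) -> Counter:
    """Count how often each distinct message (or fragment) occurs."""
    return Counter(split_messages(blob))
```

A leading or trailing fragment, such as the cut-off text at the start of the excerpt, is kept as its own entry rather than dropped.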