Closed badnetmask closed 1 year ago
Hi,
Going to need some additional information.
Can you jump into the container, call telegraf directly please, and restart docker? I would like for you to collect the full stack trace if possible. After hitting the panic you should see a long list of additional logs that point to a line of code that is causing the nil pointer deference.
Restart docker
When you say restart, what are you doing? sudo systemctl restart docker
?
restart: unless-stopped
Does telegraf start successfully afterwards?
Thanks
When I say "restart" I mean "docker-compose restart telegraf" to pickup the new config. After that, docker restarts the container because of the "unless-stopped", but it keeps looping until I disable the plugin and restart the container again.
Manually running in debug mode from inside the container doesn't really give me anything that wasn't already in the logs (note: I chmod 000 the logs on purpose so it won't write there, and the influxdb creation forbidden is not an issue because the db already exists).
$ docker exec -u telegraf -it telegraf /bin/bash
telegraf@f808db8d4a81:/$ telegraf --debug
2023-09-25T13:51:27Z I! Loading config: /etc/telegraf/telegraf.conf
2023-09-25T13:51:27Z E! Unable to open /var/log/telegraf/telegraf.log (open /var/log/telegraf/telegraf.log: permission denied), using stderr
2023-09-25T13:51:27Z I! Starting Telegraf 1.27.4
2023-09-25T13:51:27Z I! Available plugins: 237 inputs, 9 aggregators, 28 processors, 23 parsers, 59 outputs, 4 secret-stores
2023-09-25T13:51:27Z I! Loaded inputs: cpu disk diskio exec (6x) file kernel mem net netstat nfsclient processes swap system temp
2023-09-25T13:51:27Z I! Loaded aggregators:
2023-09-25T13:51:27Z I! Loaded processors:
2023-09-25T13:51:27Z I! Loaded secretstores:
2023-09-25T13:51:27Z I! Loaded outputs: influxdb
2023-09-25T13:51:27Z I! Tags enabled: host=localhost
2023-09-25T13:51:27Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"localhost", Flush Interval:10s
2023-09-25T13:51:27Z D! [agent] Initializing plugins
2023-09-25T13:51:27Z D! [inputs.nfsclient] using [/proc/self/mountstats] for mountstats
2023-09-25T13:51:27Z D! [inputs.nfsclient] Including all mounts.
2023-09-25T13:51:27Z D! [inputs.nfsclient] Not excluding any mounts.
2023-09-25T13:51:27Z D! [inputs.nfsclient] Including all operations.
2023-09-25T13:51:27Z D! [inputs.nfsclient] Not excluding any operations.
2023-09-25T13:51:27Z W! DeprecationWarning: Value "false" for option "ignore_protocol_stats" of plugin "inputs.net" deprecated since version 1.27.3 and will be removed in 1.36.0: use the 'inputs.nstat' plugin instead
2023-09-25T13:51:27Z D! [agent] Connecting outputs
2023-09-25T13:51:27Z D! [agent] Attempting connection to [outputs.influxdb]
2023-09-25T13:51:27Z W! [outputs.influxdb] When writing to [https://<ip>:8086]: database "telegraf" creation failed: 403 Forbidden
2023-09-25T13:51:27Z D! [agent] Successfully connected to outputs.influxdb
2023-09-25T13:51:27Z D! [agent] Starting service inputs
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x26e65a0]
Two issues:
2023-09-25T13:51:27Z D! [inputs.nfsclient] using [/proc/self/mountstats] for mountstats
The first issue is that file most likely does not exist. As you mounted proc to /host/proc
, the file you are trying to monitor will be: /host/proc/self/mountstats
. That can be set by adding another environment variable:
MOUNT_PROC: /host/proc/self/mountstats
The next issue is that will probably get mounted with only read permissions for root:
❯ docker exec telegraf ls -la /host/proc/self/mountstats
-r-------- 1 root root 0 Sep 25 14:47 /host/proc/self/mountstats
The default user for our docker images is telegraf, not root. I would have expected an error not a panic as well.
My new PR will show the following for the wrong permissions:
telegraf | 2023-09-25T15:08:00Z E! [inputs.nfsclient] Error in plugin: open /host/proc/self/mountstats: permission denied
and the following for a file that does not exist:
telegraf | 2023-09-25T15:08:50Z E! [inputs.nfsclient] Error in plugin: stat /host/proc/self/mountstat: no such file or directory
I have also added info about MOUNT_PROC to the readme.
Okay, adding MOUNT_PROC makes it look at the correct file, but how do you suggest I fix the permissions problem then, since it's set by the OS automatically?
how do you suggest I fix the permissions problem then, since it's set by the OS automatically?
Since you are mounting the files read-only you cannot change the permissions on them, so you are left with running telegraf as root.
edit: I know that is far from idea and should be discouraged wherever possible. This really is a question about unix permissions at this point involving how docker mounts things into the container and the user the telegraf service runs as.
I understand what you're saying about the permissions, and it makes sense, but there is something else here, since I can cat the file as the telegraf user, and it can see the NFS mount.
$ docker exec -u telegraf telegraf env | grep MOUNT
MOUNT_PROC=/host/proc/self/mountstats
$ docker exec -u telegraf telegraf cat /host/proc/self/mountstats
(...)
device /dev/root mounted on /etc/group with fstype ext4
device 192.168.x.x:/volume/drive/path/telegraf mounted on /etc/telegraf with fstype nfs4 statvers=1.1
opts: rw,sync,vers=4.0,rsize=131072,wsize=131072,namlen=143,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.x.x,local_lock=none
(...)
So, what could be missing here?
I think some docker foo is tripping you up now:
❯ docker exec telegraf ls -la /host/proc/self/mountstats
-r-------- 1 root root 0 Sep 25 17:16 /host/proc/self/mountstats
❯ docker exec -u telegraf telegraf ls -la /host/proc/self/mountstats
-r-------- 1 telegraf telegraf 0 Sep 25 17:16 /host/proc/self/mountstats
As far as the telegraf process is concerned that file is owned by root:root.
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.27.4, Raspberry Pi OS 64-bit (Debian 11.7), docker-1.5-2, docker-compose-1.25.0-1
Docker
Steps to reproduce
Expected behavior
Data would be collected.
Actual behavior
Container crashes and restarts.
Additional info