Closed no-clu closed 2 years ago
I'm guessing, but this looks like the docker engine host has a newer OS than the container, and smartctl was compiled against a later libc version. You could try using smartctl from the container OS; for testing you can start the container with:
docker run telegraf /bin/sh -c 'apt-get update && apt-get install -y smartmontools && telegraf'
If that works, a more permanent solution is shown on this page under Install Additional Packages.
Thanks for the reply. I have tried installing smartmontools in the running container using EXEC, but that just led to other issues: the disks on the host are then not found when scanning.
To replicate
Started Docker container with Environmental Variables and mounts as per FAQ.
In Portainer, use the Execute utility and run the following.
apt update
apt install smartmontools
This all goes fine. But then if I run smartctl it cannot find the disks.
smartctl --scan
scan_smart_devices: glob(3) aborted matching pattern /dev/discs/disc*
Having looked up the error, it seems that smartctl cannot find any /dev/sd*[a-z]. If I try specifying the disk where it is mounted, still no joy:
smartctl /hostfs/dev/sda
smartctl 6.6 2016-05-31 r4324 [aarch64-linux-5.4.45-rockchip64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
/hostfs/dev/sda: Unable to detect device type
Please specify device type with the -d option.
Specifying the disk type gives yet another error
smartctl -d ata /hostfs/dev/sda
smartctl 6.6 2016-05-31 r4324 [aarch64-linux-5.4.45-rockchip64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Smartctl open device: /hostfs/dev/sda failed: Operation not permitted
Here is df from within the container
df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay         293G  3.1G  275G   2% /
tmpfs            64M     0   64M   0% /dev
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
shm              64M     0   64M   0% /dev/shm
/dev/sdb1       293G  3.1G  275G   2% /hostfs
udev            1.9G     0  1.9G   0% /hostfs/dev
tmpfs           1.9G     0  1.9G   0% /hostfs/dev/shm
tmpfs           387M   21M  367M   6% /hostfs/run
tmpfs           5.0M  4.0K  5.0M   1% /hostfs/run/lock
tmpfs           1.9G     0  1.9G   0% /hostfs/sys/fs/cgroup
tmpfs           1.9G  4.0K  1.9G   1% /hostfs/tmp
/dev/sda1       3.6T   89M  3.6T   1% /hostfs/srv/dev-disk-by-label-storage1
/dev/mmcblk1p1  3.4G   60M  3.3G   2% /hostfs/media/mmcboot
overlay         293G  3.1G  275G   2% /hostfs/var/lib/docker/overlay2/1476fdd0a24ce239a79aeeef55a0ca724262c5eec442fe7ef922e091e3a96c39/merged
overlay         293G  3.1G  275G   2% /hostfs/var/lib/docker/overlay2/d2720f226447273a9d519c1067265add25f0b7b7a2d6665a4fbba448f657e64c/merged
overlay         293G  3.1G  275G   2% /hostfs/var/lib/docker/overlay2/a4e1d0ef6f236150786ab92bd7c1b694fef3f098065855dfb5088c219f3435b5/merged
tmpfs           1.9G     0  1.9G   0% /proc/asound
tmpfs           1.9G     0  1.9G   0% /sys/firmware
tmpfs           387M     0  387M   0% /hostfs/var/lib/docker/overlay2/fa49a208d66a0cd0e63cfb9a6baaa39a481fe6b9a659742e60e34b310664c3e0/merged/hostfs/run/user/1000
Although /dev/sda1 is mounted, ls on /dev in the container does not list the disks:
ls -l
total 0
lrwxrwxrwx 1 root root   13 Jun 26 10:16 fd -> /proc/self/fd
crw-rw-rw- 1 root root 1, 7 Jun 26 10:16 full
drwxrwxrwt 2 root root   40 Jun 26 10:16 mqueue
crw-rw-rw- 1 root root 1, 3 Jun 26 10:16 null
lrwxrwxrwx 1 root root    8 Jun 26 10:16 ptmx -> pts/ptmx
drwxr-xr-x 2 root root    0 Jun 26 10:16 pts
crw-rw-rw- 1 root root 1, 8 Jun 26 10:16 random
drwxrwxrwt 2 root root   40 Jun 26 10:16 shm
lrwxrwxrwx 1 root root   15 Jun 26 10:16 stderr -> /proc/self/fd/2
lrwxrwxrwx 1 root root   15 Jun 26 10:16 stdin -> /proc/self/fd/0
lrwxrwxrwx 1 root root   15 Jun 26 10:16 stdout -> /proc/self/fd/1
crw-rw-rw- 1 root root 5, 0 Jun 26 10:16 tty
crw-rw-rw- 1 root root 1, 9 Jun 26 10:16 urandom
crw-rw-rw- 1 root root 1, 5 Jun 26 10:16 zero
Thanks
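For anyone repeating this check, here is a minimal sketch that tests whether any sdX block device nodes are visible under a /dev-style directory. The function name list_disk_nodes is made up for illustration, and the /hostfs/dev path assumes the host root is mounted at /hostfs as in the FAQ setup. Seeing a node does not guarantee the container is allowed to open it, as the "Operation not permitted" error above shows.

```shell
#!/bin/sh
# list_disk_nodes is a hypothetical helper, not part of Telegraf or smartmontools.
# It prints every sdX block device node under the given directory and
# returns non-zero when none are found.
list_disk_nodes() {
    dev_dir=${1:-/dev}
    found=1
    for node in "$dev_dir"/sd[a-z]; do
        # -b is true only for block special files; an unmatched glob
        # stays literal and simply fails this test.
        if [ -b "$node" ]; then
            echo "$node"
            found=0
        fi
    done
    return "$found"
}

# Compare the container's own /dev with the host's /dev seen through the bind mount.
list_disk_nodes /dev || echo "no sdX block devices visible under /dev"
list_disk_nodes /hostfs/dev || echo "no sdX block devices visible under /hostfs/dev"
```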
Maybe it will help if you start the container with --privileged?
Thanks @danielnelson
Okay, progress. Running with --privileged now allows me to run smartctl -a /dev/sd[a-z]. I had tried this before, but I've tried so many options that it must not have been with the right configuration, as it hadn't worked.
Anyway, after starting the container with --privileged I did apt update and apt install smartmontools from EXEC in the container. I can then run smartctl -a /dev/sd[a-z], which shows me the information expected, good news. Stopping and then starting the container (or restarting) sometimes retains smartctl (smartmontools) and sometimes it doesn't, which confuses me somewhat.
However I still have problems. If I check the telegraf database with USE telegraf and then SELECT * FROM smart_device limit 25, the entries have only timestamp, device, exit status and host. Nothing else at all.
All entries in the database have exit status = 2, for which smartctl manual says this:
Bit 2: Some SMART or other ATA command to the disk failed, or there was a checksum error in a SMART data structure (see '-b' option above).
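A note for anyone decoding these values: the smartctl exit status is a bitmask, so a value of 2 means bit 1 is set (2 = 1 << 1), not bit 2. In the man page, bit 1 covers a failed device open, a missing IDENTIFY DEVICE structure, or a device in low-power mode. Below is a minimal sketch of a decoder. The function name decode_smartctl_status is made up for illustration and the messages are paraphrased from smartctl(8), so check them against the man page for your version.

```shell
#!/bin/sh
# decode_smartctl_status is a hypothetical helper; the bit descriptions are
# paraphrased from the EXIT STATUS section of the smartctl(8) man page.
decode_smartctl_status() {
    status=$1
    bit=0
    while [ "$bit" -le 7 ]; do
        # Test whether this bit is set in the exit status.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            case $bit in
                0) msg="command line did not parse" ;;
                1) msg="device open failed, no IDENTIFY DEVICE data returned, or device in low-power mode" ;;
                2) msg="a SMART or other ATA command failed, or checksum error in a SMART data structure" ;;
                3) msg="SMART status check returned DISK FAILING" ;;
                4) msg="prefail attributes at or below threshold" ;;
                5) msg="SMART status OK, but some attributes were at or below threshold in the past" ;;
                6) msg="device error log contains records of errors" ;;
                7) msg="device self-test log contains records of errors" ;;
            esac
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}

# The value stored in the database in this thread:
decode_smartctl_status 2
```

Read this way, an exit status of 2 points at Telegraf's smartctl call not being able to open the device, which would fit a permissions or device path problem rather than a failing disk.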
The log currently gives me no errors. Does anyone know how to show more info/debug output?
I am looking into this further but I'm not sure I'll get this working by myself so any pointers in the meantime would be great. Either way I will post back, success or not.
edit: removed comment that a restart still allowed smartctl to run. It doesn't. I don't know what happened but I cannot replicate this behaviour.
edit 2: added behaviour back in, a stop and start or a restart sometimes seems to retain smartctl in the container. I cannot explain this!
A little trial and error. If I stop and restart the container, the install of smartmontools persists. If I re-deploy (using Portainer), this deletes the container and creates a new one, and as such I have to reinstall smartmontools.
Now I've managed to get the smart plugin working, but I had to go through a number of steps to get there, which seems like a lot of configuring when I have the capability on the host to simply run smartctl. Anyway, these were the steps I needed to take:
1. Start the container with --privileged
2. Install smartmontools on the running container
3. Install sudo on the running container
I'm not happy with the solution overall and will keep looking for an alternative. Moreover, this telegraf container seems to crash after an hour or so. Happy to post the log, but I'm not sure how best to do that.
Here is the start of the log. As you can see, I cannot tell when it crashed, but I can see from the status in Portainer that it had been stopped for 9 hours when I checked, so it was working for only a few hours. After the SIGILL: illegal instruction there are lots of messages re: github, Go sources and plugins, followed by the r codes? I've no idea about this. Any help appreciated. I'm going to disable the smart plugin for now to see how the container behaves and whether the crash is linked to the smart plugin.
2020-06-29T06:57:16Z I! Starting Telegraf 1.14.4
2020-06-29T06:57:16Z I! Using config file: /etc/telegraf/telegraf.conf
2020-06-29T06:57:16Z I! Loaded inputs: disk mem processes swap system smart temp cpu diskio kernel
2020-06-29T06:57:16Z I! Loaded aggregators:
2020-06-29T06:57:16Z I! Loaded processors:
2020-06-29T06:57:16Z I! Loaded outputs: influxdb
2020-06-29T06:57:16Z I! Tags enabled: host=566e379017cb
2020-06-29T06:57:16Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"566e379017cb", Flush Interval:10s
SIGILL: illegal instruction
PC=0x6db34 m=7 sigcode=1
r0 0x4000a0c0a0 0xffffffffffffffe0 r2 0x40003fdc00 r3 0x0 r4 0x0 [etc...]
Thanks for documenting your findings. Is this with the linux/arm64/v8 container? To me, the SIGILL: illegal instruction error indicates one of the following: the telegraf binary you have is built for a different architecture (some other ARM flavor), we have done something unsafe, there is a hardware issue, or there is even a bug in the Go compiler. If you attach the full stack trace as a file it might have some clues, and it might be worth trying the other ARM containers or even the Telegraf 1.13.4 container (which was built with Go 1.13).
Can you also attach the output of cat /proc/cpuinfo?
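As a quick way to answer the architecture question, here is a rough sketch that maps the machine name reported by uname -m to the platform string Docker uses for images. The helper name arch_to_docker_platform and the mapping table are assumptions for illustration and cover only the common cases. A match here does not rule out the other causes listed above.

```shell
#!/bin/sh
# arch_to_docker_platform is a hypothetical helper; the table covers only
# common uname -m values and is not exhaustive.
arch_to_docker_platform() {
    case $1 in
        x86_64|amd64)  echo "linux/amd64" ;;
        aarch64|arm64) echo "linux/arm64" ;;
        armv7l)        echo "linux/arm/v7" ;;
        armv6l)        echo "linux/arm/v6" ;;
        *)             echo "unknown" ;;
    esac
}

machine=$(uname -m)
echo "kernel reports $machine, expected image platform: $(arch_to_docker_platform "$machine")"

# To see what the pulled image was actually built for (requires Docker,
# so it is left commented out here):
# docker image inspect --format '{{.Os}}/{{.Architecture}}' telegraf:latest
```

Note that a 64-bit kernel can run a 32-bit userland, so the kernel's answer and the container's architecture are worth comparing rather than assuming.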
@danielnelson thanks for your reply. I've attached the output of cat /proc/cpuinfo as a .txt file.
cpuinfo.txt
With regards to the container being linux/arm64/v8, I wish I could respond more confidently. All I can say is that I'm using OpenMediaVault with PluginExtras, which enables a simple "click a button" install of Docker and Portainer. I then used the CLI to get my containers up and running, and I use Portainer to manage them and check logs. Docker and containers were totally new to me 2 weeks ago, and I have only very minor Linux experience to date, so please forgive me. From Portainer I can see that the image for telegraf is as follows.
telegraf:latest@sha256:e0add6e572b009eb3fa8cd9947ebdf62ab3fed81f306113704bc9b9a0cec89df telegraf version 1.14.4
docker --version
Docker version 19.03.12, build 48a6621
Output of inspecting the container from Portainer: container inspect.txt
Hope that helps. I'll have a look at other arm containers (time to hit the search engines).
For what it's worth, I was able to run this without errors today. All I had to do was override the command on launch:
telegraf:
  image: telegraf
  privileged: true
  command:
    - /bin/bash
    - -c
    - |
      apt update
      apt install -y smartmontools
      telegraf
Started successfully:
2020-10-31T20:22:05Z I! Starting Telegraf 1.16.1
2020-10-31T20:22:05Z I! Using config file: /etc/telegraf/telegraf.conf
2020-10-31T20:22:05Z I! Loaded inputs: cpu disk diskio docker filecount httpjson influxdb kernel mem net netstat processes smart system
2020-10-31T20:22:05Z I! Loaded aggregators:
2020-10-31T20:22:05Z I! Loaded processors:
2020-10-31T20:22:05Z I! Loaded outputs: influxdb
This is running on a Synology DS218+ NAS.
I tried that solution and it did not work. Please let me know if there is some other (and easier) way to use inputs.smart in Docker; I would be very interested. Thanks
Hello
May I come back to this thread. It took me quite a while to get smartctl running inside docker-telegraf. In a nutshell: adding privileged: true to the telegraf service in my docker-compose file helped.
telegraf:
  image: telegraf
  container_name: telegraf
  hostname: telegraf
  networks:
    - default
  privileged: true # for smartctl
  environment:
    - TZ=${TZ}
    - HOST_VAR=/hostfs/var
    - HOST_PROC=/hostfs/proc
    - HOST_SYS=/hostfs/sys
    - HOST_MOUNT_PREFIX=/hostfs
    - HOST_ETC=/hostfs/etc
    - HOST_RUN=/hostfs/run
  links:
    - influxdb
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - $APPDATADIR/telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
    - /:/hostfs:ro
  depends_on:
    - influxdb
  restart: always
In the container CLI I can query the drives with /hostfs/usr/sbin/smartctl -a /dev/sdX. My problem is that influxdb only shows "exit_status" with the value "2". No other data is fed into influxdb.
My telegraf.conf file looks like:
[[inputs.smart]]
path_smartctl = "/hostfs/usr/sbin/smartctl"
read_method = "sequential"
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!
I have been facing the same problem today and got the same errors as in some of the comments but it's now working.
The steps are:
Running the container
docker run -d \
--name=telegraf_exporter \
-p 8086:8086 \
-p 9273:9273 \
-v $root_dir/docker/telegraf-config/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /:/hostfs:ro \
-v /etc:/hostfs/etc:ro \
-v /proc:/hostfs/proc:ro \
-v /sys:/hostfs/sys:ro \
-v /var:/hostfs/var:ro \
-v /run:/hostfs/run:ro \
-e HOST_ETC=/hostfs/etc \
-e HOST_PROC=/hostfs/proc \
-e HOST_SYS=/hostfs/sys \
-e HOST_VAR=/hostfs/var \
-e HOST_RUN=/hostfs/run \
-e HOST_MOUNT_PREFIX=/hostfs \
--user telegraf:$(stat -c '%g' /var/run/docker.sock) \
--network=nas \
--privileged \
--restart unless-stopped \
telegraf:latest
Log in as root in the container console and install smartmontools and sudo
apt update
apt install sudo smartmontools
Log in as root in the container console and edit "/etc/sudoers" to allow the user telegraf to use sudo without a password.
Add this line to the file: telegraf ALL=NOPASSWD:/usr/sbin/smartctl
NB: I installed and used nano to edit "/etc/sudoers"
Stop the container and change the config in "telegraf.conf"
[[inputs.smart]]
use_sudo = true
nocheck = "standby"
devices = [ "hostfs/dev/sda -d ata", "hostfs/dev/sdb -d ata", "hostfs/dev/sdc -d ata", "hostfs/dev/sdd -d ata", "hostfs/dev/sde -d ata", "hostfs/dev/sdf -d ata"]
I kept "--privileged" from a previous test; I'm not sure it's necessary, but it doesn't hurt. Docker is running on Lubuntu 22.04 LTS. Data is scraped by Prometheus.
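If --privileged feels too broad, a narrower setup that is often suggested for smartctl in containers is to pass the individual disks with --device and add the SYS_RAWIO capability. Nobody in this thread verified that, so treat the sketch below as an assumption to test. NVMe drives in particular may need more than this. The helper name device_flags is made up for illustration; --device and --cap-add are standard docker run flags.

```shell
#!/bin/sh
# device_flags is a hypothetical helper that prints docker run flags for
# per-device access. Whether SYS_RAWIO alone is enough for your drives is
# an assumption to verify, not a confirmed result from this thread.
device_flags() {
    flags="--cap-add SYS_RAWIO"
    for dev in "$@"; do
        flags="$flags --device $dev"
    done
    echo "$flags"
}

device_flags /dev/sda /dev/sdb

# Possible use in place of --privileged (requires Docker, left commented out):
# docker run -d $(device_flags /dev/sda /dev/sdb) telegraf:latest
```

With devices passed this way they appear at /dev/sdX inside the container, so the devices list in telegraf.conf would point there rather than under the hostfs mount.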
Here is how I got this to work:
1. `Dockerfile`, we will be building a custom image:
FROM telegraf
RUN apt-get update && apt-get install -y sudo smartmontools nvme-cli
RUN echo 'Cmnd_Alias SMARTCTL = /usr/sbin/smartctl' >> /etc/sudoers && \
    echo 'Cmnd_Alias NVME = /usr/sbin/nvme' >> /etc/sudoers && \
    echo 'telegraf ALL=(ALL) NOPASSWD: SMARTCTL, NVME' >> /etc/sudoers && \
    echo 'Defaults!SMARTCTL !logfile, !syslog, !pam_session' >> /etc/sudoers && \
    echo 'Defaults!NVME !logfile, !syslog, !pam_session' >> /etc/sudoers
2. In `telegraf.conf`:
[[inputs.smart]]
use_sudo = true
attributes = true
`attributes = true` will enable Telegraf to read and store SMART attributes, which I think is useful, but not required for the setup to work as a whole.
3. Build and run, I prefer docker-compose:
version: '3.7'
services:
  telegraf:
    container_name: telegraf
    build:
      context: /path/to/your/Dockerfile-dir
    restart: unless-stopped
    privileged: true
    volumes:
There's a container built for you already. https://github.com/golift/telegraf-docker Because of this: https://github.com/influxdata/influxdata-docker/issues/563
There seems to be an issue with using smart plugin when used inside a container. I've set the environmental parameters and mounts as per the FAQ - docs/FAQ.md
Running with just the plugin enabled in the config file, I get a smartctl not found error. If I point to the hostfs volume/bind/mount, i.e. "/hostfs/usr/sbin" where smartctl is located, then I get an error regarding the GLIBC version.
config:
The version on the host is 2.28, and the version in the container is 2.24. The minimum version required by smartctl is 2.27 according to the error output.
I've tried numerous mappings of volumes/binds/mounts with no luck.
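To make the version mismatch concrete, here is a small sketch that compares glibc versions the way the error message implies: a binary that requires 2.27 cannot run against 2.24. The helper name version_ge is made up for illustration, and the sketch assumes a sort that supports -V (GNU coreutils does). The version numbers are the ones reported above.

```shell
#!/bin/sh
# version_ge is a hypothetical helper: it succeeds when $1 >= $2, using
# version sort. Assumes a sort implementation that supports -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=2.27   # minimum glibc reported in the smartctl error output
for have in 2.24 2.28; do   # container and host versions from this report
    if version_ge "$have" "$required"; then
        echo "glibc $have: new enough for a binary that needs $required"
    else
        echo "glibc $have: too old for a binary that needs $required"
    fi
done
```

This is why the host's smartctl under /hostfs/usr/sbin fails inside the container, while a smartctl installed from the container's own package repository links against the container's glibc and avoids the problem.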