Closed danepowell closed 10 months ago
Hi,
Can you provide the output of lsblk
and df -h
to show us what that drive really should show?
Thanks
Sure, thanks. I'll remove irrelevant volumes (loop / snap, etc)
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 3.7T 0 disk
└─sda1 8:1 0 3.7T 0 part /home/dane/media
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 512M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1G 0 part /boot
└─nvme0n1p3 259:3 0 475.4G 0 part
└─ubuntu--vg-ubuntu--lv 253:0 0 475.4G 0 lvm /
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv 467G 54G 390G 13% /
/dev/nvme0n1p2 976M 208M 702M 23% /boot
/dev/nvme0n1p1 511M 5.3M 506M 2% /boot/efi
/dev/sda1 3.6T 2.1T 1.4T 61% /home/dane/media
Great thanks!
If you run Telegraf directly on your host, do you get the same result showing a full /
, /boot
, and /boot/efi
or is it only full in the container, or both?
What command are you using to launch the container? When you run the container, are you mounting any of the host filesystems into the container beyond what is there by default?
Thanks again
Running from the host, usage is calculated correctly. It looks like it also parses the root device name slightly differently (dm-0 instead of mapper/ubuntu--vg-ubuntu--lv.
$ TELEGRAF_CONFIG_PATH=./telegraf.conf ./usr/bin/telegraf --input-filter=disk --test --debug
2022-03-30T14:49:45Z I! Using config file: ./telegraf.conf
2022-03-30T14:49:45Z I! Starting Telegraf 1.22.0
2022-03-30T14:49:45Z I! Loaded inputs: disk
2022-03-30T14:49:45Z I! Loaded aggregators:
2022-03-30T14:49:45Z I! Loaded processors:
2022-03-30T14:49:45Z W! Outputs are not used in testing mode!
2022-03-30T14:49:45Z I! Tags enabled: host=daenerys
2022-03-30T14:49:45Z D! [agent] Initializing plugins
2022-03-30T14:49:45Z D! [agent] Starting service inputs
2022-03-30T14:49:45Z D! [agent] Stopping service inputs
2022-03-30T14:49:45Z D! [agent] Input channel closed
2022-03-30T14:49:45Z D! [agent] Stopped Successfully
> disk,device=dm-0,fstype=ext4,host=daenerys,mode=rw,path=/ free=419502268416i,inodes_free=30477627i,inodes_total=31162368i,inodes_used=684741i,total=501405966336i,used=56362274816i,used_percent=11.844184572608826 1648651785000000000
> disk,device=nvme0n1p2,fstype=ext4,host=daenerys,mode=rw,path=/boot free=735690752i,inodes_free=65228i,inodes_total=65536i,inodes_used=308i,total=1023303680i,used=217149440i,used_percent=22.789701969246906 1648651785000000000
> disk,device=nvme0n1p1,fstype=vfat,host=daenerys,mode=rw,path=/boot/efi free=530321408i,inodes_free=0i,inodes_total=0i,inodes_used=0i,total=535805952i,used=5484544i,used_percent=1.0236063969666391 1648651785000000000
> disk,device=sda1,fstype=ext4,host=daenerys,mode=rw,path=/home/dane/media free=1477654212608i,inodes_free=244183638i,inodes_total=244187136i,inodes_used=3498i,total=3936862621696i,used=2259154907136i,used_percent=60.45679173708422 1648651785000000000
I'm running everything via docker-compose up -d
, so what you see in the Docker compose file in the OP is exactly what's being mounted (just the root FS, /
).
Inspecting /proc/mounts on the container vs host, it's way more verbose inside the container. There's all sorts of extra mounts I wouldn't expect to see if Docker was just dumbly mounting into /hostfs
:
$ docker exec -it telegraf cat /hostfs/proc/mounts
<snip, what is this junk?>
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/lib/modules ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/lib/firmware ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/usr/src ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/var/log ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/media ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/mnt ext4 rw,relatime 0 0
Running from the host, usage is calculated correctly.
Great, that is good to know that it works outside the container.
It looks like it also parses the root device name slightly differently (dm-0 instead of mapper/ubuntu--vg-ubuntu--lv.
This is odd. It looks like gopsutil reads this from /sys
:
$ cat /sys/block/dm-0/dm/name
ubuntu--vg-ubuntu--lv
I'm running everything via docker-compose up -d, so something is going on when reviewing the disks in the container.
Thanks for confirming that. I modified it slightly (i.e. networks, restart, and env_file) and ran it with my Ubuntu VM that is also using LVM and got the results that I think I expected:
telegraf | 2022-03-30T16:37:46Z I! Starting Telegraf 1.22.0
<snip>
telegraf | disk,device=mapper/ubuntu--vg-ubuntu--lv,fstype=ext4,host=52fdbb17d4cf,mode=rw,path=/ used_percent=51.94518664157201,inodes_total=770048i,inodes_free=676203i,inodes_used=93845i,total=12347138048i,free=5622292480i,used=6077456384i 1648658270000000000
telegraf | disk,device=vda2,fstype=ext4,host=52fdbb17d4cf,mode=rw,path=/boot total=1551745024i,free=1339252736i,used=115187712i,used_percent=7.91972694092732,inodes_total=98304i,inodes_free=97992i,inodes_used=312i 1648658270000000000
Which does appear to align with the output from df -h
:
/dev/mapper/ubuntu--vg-ubuntu--lv 12G 5.9G 5.1G 54% /
/dev/vda2 1.5G 110M 1.3G 8% /boot
Inspecting /proc/mounts on the container vs host, it's way more verbose inside the container. There's all sorts of extra mounts I wouldn't expect to see if Docker was just dumbly mounting into /hostfs:
gopsutil will attempt to read from /proc/1/mountinfo
and fallback to /proc/self/mountinfo
. Here is mine for example:
# cat /proc/1/mounts | grep mapper
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs ext4 ro,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /etc/resolv.conf ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /etc/hostname ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /etc/hosts ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /etc/telegraf/telegraf.conf ext4 rw,relatime 0 0
This makes a bit more sense than seeing /var/log
and /usr/src
. Is one of your environment variables mounting other things in your docker container? Any other odd mounts?
I am at a bit of a loss, but my next thing would be try with just docker, away from the compose in case some additional variable is messing with things, and run:
docker run -d --name=telegraf \
-v $PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
-v /:/hostfs:ro \
-e HOST_ETC=/hostfs/etc \
-e HOST_PROC=/hostfs/proc \
-e HOST_SYS=/hostfs/sys \
-e HOST_VAR=/hostfs/var \
-e HOST_RUN=/hostfs/run \
-e HOST_MOUNT_PREFIX=/hostfs \
telegraf
docker run -d --name=telegraf2 \
-v $PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
-v /:/hostfs:ro \
-e HOST_ETC=/hostfs/etc \
-e HOST_PROC=/hostfs/proc \
-e HOST_SYS=/hostfs/sys \
-e HOST_VAR=/hostfs/var \
-e HOST_RUN=/hostfs/run \
-e HOST_MOUNT_PREFIX=/hostfs \
telegraf
Leads to the exact same results as using Docker Composer. Weird, right?
And then there's this:
$ docker exec -it telegraf2 cat /proc/1/mounts | grep mapper
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/etc ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/etc/#fstab# ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/etc/.pwd.lock ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/etc/NetworkManager ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/etc/PackageKit ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/etc/UPower ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/etc/X11 ext4 rw,relatime 0 0
/dev/mapper/ubuntu--vg-ubuntu--lv /hostfs/etc/adduser.conf ext4 rw,relatime 0 0
<snip>
Am I reading this right that it's essentially mounting every single inode individually? As someone not overly familiar with the guts of Docker volume mounts, that seems wrong.
I'm running Docker 20.10.12 btw, since it seems we're getting into "it works on my machine, why not yours?" territory 😄
Leads to the exact same results as using Docker Composer. Weird, right?
Interesting, I'm a little happy that it reproduced the issue as it means it is not some weird compose/services file setting causing this and instead possible some setting or configuration with Docker.
Am I reading this right that it's essentially mounting every single inode individually?
Wow, that does seem quite unexpected. Can you gather docker info
?
$ docker info
Client:
Context: default
Debug Mode: false
Server:
Containers: 22
Running: 21
Paused: 0
Stopped: 1
Images: 34
Server Version: 20.10.12
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
Default Runtime: runc
Init Binary: docker-init
containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
runc version:
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.4.0-105-generic
Operating System: Ubuntu Core 18
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.35GiB
Name: daenerys
ID: FUGJ:HZRT:PLGS:SGWD:XRKN:EVPE:U3ZH:XGQS:CLJM:W5HH:WGTC:W4LX
Docker Root Dir: /var/snap/docker/common/var-lib-docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Operating System: Ubuntu Core 18
Aaaaaaah you are running docker as a snap. Once I switch to that I can reproduce this! Let me look into this further now. Just FYI I am out for the next week, so it might take a bit of time to get back to you.
Thanks for the back and forth, hopefully, we can figure out a solution from here.
Telegraf as a snap
Docker as a snap, yep. This is not the first time that snaps have caused me trouble, maybe using it for Docker was a bad idea 🤕
Telegraf as a snap
Docker as a snap, yep. This is not the first time that snaps have caused me trouble, maybe using it for Docker was a bad idea face_with_head_bandage
Ah yes, I meant docker :)
Thanks for the help triaging this. I've since uninstalled the Docker Snap and switched to the Apt version and this is no longer an issue.
@powersj can you help me understand why docker snap causes an issue ?
@danepowell sorry for the delay, I do want to get back and figure out what is going on here but have been in and out. I am glad to hear the non-snap route works for you.
@aratik711 I have not been able to dig into this yet, but my hunch was that since the docker snap is limited to files under $HOME, some permissions issue or other issue causes problems with bind mounts. Which results in all sorts of entries in /proc/1/mounts
.
Hi,
Looking back and older issues. This was an issue with Docker when running as a snap. The disk input plugin will collect data from /proc/self/mounts
or the mounted version of that. It appears that when using the snap additional items are also included in that file. As a result, the disk input pluign will also report on everything found in there.
User's who come across this could use ignore_fs
to remove certain filesystem items or tagpass to only allow specific device/path/label/etc. from pass through.
I am going to close this as ultimately this is ultimately not an issue with Telegraf, but an affect of using the docker as a snap.
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.22.0 Docker 20.10.12 Ubuntu 20.04.4
Docker
Steps to reproduce
I have Telegraf running via Docker on an Ubuntu host. The host is running on an NVMe drive mounted at
/
and also has a SATA drive mounted at/home/dane/media
.Run
telegraf --input-filter=disk --test --debug
in the container.Expected behavior
See correct disk usage for both disks.
Actual behavior
Usage on the SATA drive is reported correctly, but the root drive is incorrectly shown as 100% full.
Additional info
No response