victronenergy / venus

Victron Energy Unix/Linux OS
https://github.com/victronenergy/venus/wiki

Add data and root partition full warning / notification. #1234

Open mpvader opened 4 months ago

mpvader commented 4 months ago

This relates mostly to people using the large image. A warning for a full data partition is the most valuable, since that affects our main use case (node-red); the full root partition warning is for all the tinkerers and people with mods.

(note that I'm only 99% sure that a full root partition is not normal - Jeroen will know for sure).

some data from the system that got me here:

root@einstein:~# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/root               972392    927788         0 100% /
devtmpfs                465376         4    465372   0% /dev
tmpfs                   515040      1016    514024   0% /run
tmpfs                   515040       576    514464   0% /var/volatile
/dev/mmcblk1p5         1134336   1059272         0 100% /data
/dev/mmcblk0p1        15538856      1280  15537576   0% /run/media/mmcblk0p1
tmpfs                   515040      1016    514024   0% /service
overlay                 515040       576    514464   0% /var/lib
root@einstein:~# cd /data
root@einstein:/data# ls
123SmartBMS-Venus     VeCanSetup            etc                   log                   reinstallScriptsList  var
GuiMods               conf                  home                  lost+found            setupOptions          venus
SetupHelper           db                    keys                  rcS.local             themes                vrmfilescache
root@einstein:/data# du -d 3 /data | sort -nr
1055788 /data
1039424 /data/conf
1039320 /data/conf/signalk
71048   /data/conf/signalk/node_modules
64224   /data/conf/signalk/.npm
7188    /data/conf/signalk/.cache
7032    /data/GuiMods
6348    /data/log
6208    /data/GuiMods/FileSets
1700    /data/GuiMods/FileSets/v3.20~36

kwindrem commented 4 months ago

Adding a warning is a great idea. Root being full is probably not catastrophic since very little is written to it (root is read-only in a stock system). But it looks like /data can fill up, which could cause serious/catastrophic issues with dbus Settings, logging and VRM caching. I don't know if it's appropriate, but would a garbage collector on /data be necessary? It looks like signalk is by far the largest consumer of /data space, so maybe there's something that can be done in its server code.

FYI, SetupHelper checks for a full root partition and prevents package installs if less than 3 MB is free. The threshold was based mainly on the CCGX as it has limited space on the root partition.

I could also add checks for the /data partition but didn't see my packages filling up /data.

These checks would not help once a package is installed however.

Happy to help where I can.

mpvader commented 4 months ago

Hi Kevin, thanks.

Root being full is not normal and is an indication of someone having made a mistake: either us, or someone modding a GX, with or without setuphelper/guimods.

pkkrusty commented 2 months ago

My un-modded Cerbo with Large OS was showing rootfs partition 100% full a few weeks after updating to 3.22. Using the script suggested at https://www.victronenergy.com/live/venus-os:large#disk_space_issues_data_partition_full got me a few hundred megabytes, and I assume things will be good. Sitting at 85% now.

SignalK is a major user of space, but only a very small percentage of users need/use it. Would it be possible to create an uninstall script for SignalK?

mpvader commented 2 months ago

Hi @pkkrusty, have a look at chapter 8.2 on this page: https://www.victronenergy.com/live/venus-os:large#disk_space_issues_data_partition_full.

That deletes all signalk created files from the data partition. To also remove the signalk installation itself, you have to install a normal image again, instead of the large one.

Lastly, it might be possible that signalk, as well as any other application or script on a GX device, also writes to other locations on the data partition. In that case a reset to factory defaults usually helps, and if not, then a full re-install. Both are documented in the GX manual: https://www.victronenergy.com/media/pg/Cerbo_GX/en/reset-to-factory-defaults-and-venus-os-reinstall.html

mpvader commented 2 months ago

Ps. ~since encountering above issue~ a long time ago already, ie somewhere in 2023, the signalk defaults have been modified, and any recent signalk install, as well as any recent Venus OS Large image, will no longer have Signal-K fill up the data partition.

That was just a bug, and has been solved. The issue was that signalk by default wrote large debug files to the data partition and/or didn't expire/delete them.

pkkrusty commented 2 months ago

Should clarify, I like having the Large image for node-red, but I don't need SignalK. Currently not possible to have just one and not the other?

My data partition is ok, I was worried about 100% usage on the rootfs partition.

mpvader commented 2 months ago

It's not possible to have one and not the other.

Note that normally, also for Venus OS Large, the rootfs is read-only and will never fill up.

Installing 3rd party add-ons like gui-mods or dbus-serialbms might change it to read-write. Run mount and look for rw or ro to see how your system is set up.
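
For anyone who would rather script that check than read the mount output by eye, here is a minimal Python sketch. It is not from Venus OS; the function names is_read_only and mount_mode are made up for illustration. The first asks the kernel via os.statvfs whether the filesystem holding a path is mounted read-only, the second looks for the rw/ro option in text formatted like /proc/mounts.

```python
import os


def is_read_only(path="/"):
    """Return True if the filesystem containing `path` is mounted read-only."""
    # f_flag carries the mount flags; ST_RDONLY is set for read-only mounts.
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def mount_mode(mounts_text, mount_point="/"):
    """Return "ro", "rw" or None for `mount_point`.

    `mounts_text` is expected in the /proc/mounts layout:
    device, mount point, fs type, comma-separated options, dump, pass.
    If a mount point is listed more than once, the last entry wins.
    """
    mode = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
            if "ro" in options:
                mode = "ro"
            elif "rw" in options:
                mode = "rw"
    return mode


if __name__ == "__main__":
    print("/ is read-only:", is_read_only("/"))
    # Made-up sample line, only to show the expected input format.
    sample = "/dev/root / ext4 ro,relatime 0 0\n"
    print("sample mode:", mount_mode(sample))
```

On a device you would pass the contents of /proc/mounts to mount_mode instead of the sample string.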

mpvader commented 3 weeks ago

Hey @mansr, as discussed: any preference on which system calls to use to check for free disk space?

Do we need to try writing a file? Or can some simple standard call checking for free disk space be used?

kwindrem commented 3 weeks ago

I use

availableSpace=$(df -m / | tail -1 | awk '{print $4}')

to check fullness in SetupHelper. -m returns available space in megabytes.

This is called from a shell script but could also be done via a procedure call in python.
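
A rough Python counterpart of that df check could look like the sketch below. It is not SetupHelper's code: available_mib and has_room are hypothetical names, and the 3 MiB default only mirrors the threshold mentioned earlier in this thread. The value may differ slightly from what df prints because of rounding.

```python
import os

# Assumption: mirrors the "less than 3 MB free" rule mentioned earlier in the thread.
MIN_FREE_MIB = 3


def available_mib(path="/"):
    """Space available to unprivileged users on the filesystem holding `path`, in MiB."""
    st = os.statvfs(path)
    # f_bavail counts free blocks usable by non-root users, in units of f_frsize bytes.
    return st.f_bavail * st.f_frsize // (1024 * 1024)


def has_room(path="/", min_free_mib=MIN_FREE_MIB):
    """Return True if at least `min_free_mib` MiB is available at `path`."""
    return available_mib(path) >= min_free_mib


if __name__ == "__main__":
    for mount_point in ("/", "/data"):
        if os.path.isdir(mount_point):
            print(mount_point, available_mib(mount_point), "MiB available")
```

No file needs to be written for this; the numbers come straight from the filesystem statistics.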

mansr commented 2 weeks ago

I would suggest checking both free space (blocks) and free inodes (files). In a shell script, stat -f will print these values. From C or C++, use the statvfs system call, also available in Python as os.statvfs. What thresholds to use for a warning will need some consideration. If we want to get fancy, we could even warn if the rate of change over an hour (or maybe day) is too high. That might alert the user to take action before it becomes urgent.
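
As a starting point for discussion, the suggestion above could be sketched in Python roughly as follows. Everything here is an assumption for illustration: the names FsUsage, read_usage, warnings_for and hours_until_full are invented, and the 10% threshold is a placeholder, not a proposed value.

```python
import os
from collections import namedtuple

# Snapshot of one filesystem: free/total bytes and free/total inodes.
FsUsage = namedtuple("FsUsage", "free_bytes total_bytes free_inodes total_inodes")


def read_usage(path):
    """Take a snapshot of the filesystem containing `path` via os.statvfs."""
    st = os.statvfs(path)
    return FsUsage(
        free_bytes=st.f_bavail * st.f_frsize,
        total_bytes=st.f_blocks * st.f_frsize,
        free_inodes=st.f_favail,
        total_inodes=st.f_files,
    )


def warnings_for(usage, min_free_fraction=0.10):
    """Return a list of warning strings; the threshold is a placeholder."""
    msgs = []
    if usage.total_bytes and usage.free_bytes / usage.total_bytes < min_free_fraction:
        msgs.append("low space")
    # Some filesystems report zero total inodes; skip the inode check for those.
    if usage.total_inodes and usage.free_inodes / usage.total_inodes < min_free_fraction:
        msgs.append("low inodes")
    return msgs


def hours_until_full(earlier, later, elapsed_seconds):
    """Estimate hours until free space hits zero at the observed fill rate.

    Returns None when usage did not grow between the two snapshots.
    """
    consumed = earlier.free_bytes - later.free_bytes
    if consumed <= 0 or elapsed_seconds <= 0:
        return None
    rate = consumed / elapsed_seconds  # bytes per second
    return later.free_bytes / rate / 3600


if __name__ == "__main__":
    usage = read_usage("/")
    print(usage, warnings_for(usage))
```

A periodic service could keep the previous snapshot and call hours_until_full on each run to implement the rate-of-change warning; how often to sample and when to alert is left open here, as in the comment above.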