Closed n0099 closed 3 months ago
Have you tested whether setting DatabaseWriteAheadLogging to 1 (and then restarting the daemon) solves the situation?
I'll try this. I'm also using ZFS on /var/lib; is this issue similar to https://github.com/atuinsh/atuin/issues/952?
I don't have any practical experience with ZFS myself. A quick search for sqlite and zfs suggests that enabling WAL (which is what DatabaseWriteAheadLogging does) can help. You should also check the daemon logs, as the daemon should produce a warning whenever a database write takes longer than 4 seconds. Comparing the frequency of those warnings before and after the DatabaseWriteAheadLogging configuration change should give some indication of whether it helps. The read timeout is set at compile time with DBREADTIMEOUTSECS to 5 seconds.
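For anyone wanting to script the change, here is a minimal sketch of a helper that sets a key in a vnstat-style config file. The function name `set_vnstat_option` is made up for illustration, and it assumes the file uses one `Key value` pair per line, with disabled defaults optionally prefixed by `;`. It also relies on GNU `sed -i`. Check your own config layout before using it.

```bash
# Hypothetical helper: set KEY to VALUE in a vnstat-style config FILE.
# Assumes "Key value" lines, optionally commented out with a leading ";".
set_vnstat_option() {
  # usage: set_vnstat_option FILE KEY VALUE
  if grep -Eq "^;?[[:space:]]*$2[[:space:]]" "$1"; then
    # Key already present (active or ";"-commented): rewrite that line.
    sed -i -E "s/^;?[[:space:]]*$2[[:space:]].*/$2 $3/" "$1"
  else
    # Key missing: append it.
    printf '%s %s\n' "$2" "$3" >> "$1"
  fi
}

# Example (path is an assumption, adjust to your install):
#   sudo sh -c '. ./helper.sh; set_vnstat_option /etc/vnstat.conf DatabaseWriteAheadLogging 1'
#   sudo systemctl restart vnstat
```

The daemon only reads the configuration at startup, which is why the restart is needed afterwards.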
$ sudo journalctl -u vnstat -g took
Apr 11 18:30:16 azure vnstatd[2169]: Warning: Writing cached data to database took 39.1 seconds.
Apr 11 18:30:16 azure vnstatd[2169]: Warning: Writing cached data to database took 7.5 seconds.
-- Boot c954ea177ebd4c759c98ef0e1fa04a87 --
Apr 13 03:20:11 azure vnstatd[1889]: Warning: Writing cached data to database took 4.3 seconds.
-- Boot c9c52571a5c7441f95967b2ca23cda41 --
Apr 13 08:10:06 azure vnstatd[1941]: Warning: Writing cached data to database took 6.3 seconds.
Apr 13 12:45:06 azure vnstatd[1941]: Warning: Writing cached data to database took 5.2 seconds.
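To compare the warning frequency before and after the configuration change, output like the above can be condensed into a count and a worst case. This is only a sketch: `summarize_slow_writes` is a made-up name, and it assumes the warning text keeps the "took N seconds" wording shown in these logs.

```bash
# Hypothetical helper: read journal lines on stdin and print how many
# slow-write warnings were seen and the longest duration among them.
summarize_slow_writes() {
  awk '
    /Writing cached data to database took/ {
      # The duration is the field right after the word "took".
      for (i = 1; i <= NF; i++) if ($i == "took") secs = $(i + 1) + 0
      count++
      if (secs > max) max = secs
    }
    END { printf "count=%d max=%.1f\n", count, max }
  '
}

# Example:
#   sudo journalctl -u vnstat -g took | summarize_slow_writes
```

Running it once on the journal from before the change and once on the journal from after gives two figures that are easy to compare.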
Let me know whether that setting helped with the read situation; I'm not exactly sure if having WAL enabled also avoids the read-only error while slow writes are in progress. With ZFS, the DatabaseSynchronous setting may also be worth investigating. Part of the slow-write issue with sqlite could come from sqlite trying to ensure that writes have completed while ZFS is doing the same internally, which results in redundant, multiplied checks.
As for making the source of the exit status easier to detect when --alert is used, I'll see whether the ideal solution is to add exit options 4 and 5, which would match the current 2 and 3 but use exit status 2 (instead of the 1 that all other errors also use), or whether some sort of --actual-errors-do-not-exit-1 parameter would be better.
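If the first variant were adopted, a calling script could tell the two cases apart roughly as follows. This is purely a sketch of the proposal discussed here, not of existing vnstat behaviour: the mapping of exit status 2 to "limit exceeded" is the assumption, and `classify_alert_status` is a made-up name.

```bash
# Hypothetical: interpret an exit status under the PROPOSED scheme, where
# 2 would mean "limit exceeded" and 1 would remain reserved for real errors.
classify_alert_status() {
  case "$1" in
    0) echo "ok" ;;
    2) echo "limit-exceeded" ;;
    *) echo "error" ;;
  esac
}

# Example, with your own alert command in place of the placeholder:
#   your_alert_command; classify_alert_status "$?"
```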
I'm using this plain bash snippet as a fuse for a monthly data limit:
but when the system is under high load, vnstat may occasionally run into https://github.com/vergoh/vnstat/issues/129 and exit with a non-zero exit code, causing a false-positive alert.
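Until the exit status can distinguish the two cases, one possible workaround is to treat a non-zero exit as an alert only if it persists across a few attempts, on the assumption that a transient database error under load will pass on a retry. A minimal sketch, where `fails_consistently` is a made-up name and the command passed to it stands in for whatever alert check is in use:

```bash
# Hypothetical helper: succeed (return 0) only if COMMAND fails on every
# one of ATTEMPTS tries, waiting DELAY_SECONDS between tries.
fails_consistently() {
  # usage: fails_consistently ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
  fc_left=$1
  fc_delay=$2
  shift 2
  while [ "$fc_left" -gt 0 ]; do
    if "$@"; then
      return 1   # one success is enough: not a persistent failure
    fi
    fc_left=$((fc_left - 1))
    if [ "$fc_left" -gt 0 ]; then sleep "$fc_delay"; fi
  done
  return 0
}

# Example, with your own alert command in place of the placeholder:
#   if fails_consistently 3 10 your_alert_command; then echo "limit reached"; fi
```

This does not remove the ambiguity; a persistent error would still look like an exceeded limit. It only filters out the rare one-off failures described above.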