bachandi closed this issue 2 years ago
I have seen this error only when the machine is running multiple instances of Monitorix, or when Monitorix takes too long to save data into the .rrd files, perhaps because these files reside on network storage.
I am pretty sure there is only one instance running and the RRD location is on a local SSD. I disabled parallelizing for the moment and will watch whether the error reappears.
My idea was a missing wait for the processes to finish, but this line should take care of all the processes: https://github.com/mikaku/Monitorix/blob/269b7ead91b338d28a0b74971f34db4eef7bd84a/monitorix.cgi#L646
Not sure what else could be the cause for this.
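For reference, a generic reaping loop of the kind that linked line implements looks roughly like this (just a sketch, not the actual Monitorix code):
# Block until every forked child has exited; waitpid returns -1
# once no children remain, which ends the loop.
while ((my $pid = waitpid(-1, 0)) > 0) {
	# child $pid finished
}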
I don't think parallelizing has problems, and I haven't seen such an error before. People who reported this message had one of the two possible causes I stated in my previous post.
Yes, and it is still happening without parallelizing, so this is not the issue. I listed the monitorix processes with ps aux and there are no other instances running. Will have to investigate further.
Are all your .rrd files in the local filesystem?
Are all your .rrd files in the local filesystem?
yes
OK.
Also, can you check if all timestamps in .rrd files are consistent?
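For example, something along these lines (a sketch using RRDs::info; last_update is the standard RRDtool field, and the directory is the one from your logs):
# Print the last_update timestamp of every .rrd file so the
# values can be compared at a glance.
use RRDs;

for my $rrd (glob "/var/lib/monitorix/*.rrd") {
	my $info = RRDs::info($rrd);
	printf "%s: last_update=%s\n", $rrd, scalar localtime($info->{last_update});
}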
I investigated with rrdinfo and they looked OK, but I doubt that is what you had in mind?
I noticed an uninitialized value error in process.pm but it is probably unrelated:
Thu Dec 9 18:24:06 2021 - ERROR: while updating /var/lib/monitorix/process.rrd: could not lock RRD
Use of uninitialized value in multiplication (*) at /usr/lib/monitorix/process.pm line 1120.
Use of uninitialized value in multiplication (*) at /usr/lib/monitorix/process.pm line 1121.
Fri Dec 10 11:20:07 2021 - ERROR: while updating /var/lib/monitorix/process.rrd: could not lock RRD
Any news on this?
It still happens on multiple systems from time to time, but I could not narrow down why. I suspect we can close this for the moment, until there is new information on it.
OK, let's close it and will reopen it if something new comes. Thanks.
Could this be related to the XXX_update and XXX_cgi routines being called by different processes? To my understanding, the update routines are called by the monitorix process, and the cgi ones by the process spawned for the browser request. If these two calls overlap by chance, the update routine could have problems locking the rrd file if it is being read at the same moment by the cgi routine?
The XXX_cgi routines only make read-only calls, so I don't think that this creates any locking problem.
I think it would have manifested before, since there is always a chance that monitorix.cgi reads data from the .rrd files at the same moment monitorix is saving data. I'm thinking of my server, which hosts the Monitorix demo and is used pretty often by people from everywhere, and I've never seen these locking errors.
I still believe these locking errors are caused by reading and writing at the same time. I tested it with a small shell script that writes to and reads from an nvme.rrd lookalike, and I ran into these locking errors as soon as reads and writes happened concurrently.
A quick bash script (test_rrd_locking.sh) which emulates reading and writing the way monitorix does, so we can test the rrd locking:
#!/usr/bin/bash
RRDFILE=$1         # path to the .rrd file to test
SLEEPTIME_IN_S=$2  # pause between iterations, in seconds
ACTION=$3          # one of: create, read, write
READ_COMMAND() {
rrdtool graphv test01z.1day.svg --title=Title --start=-1day --imgformat=SVG --vertical-label=label --width=800 --height=300 --full-size-mode --zoom=1 --slope-mode DEF:hd0_smv0=${RRDFILE}:nvme0_hd0_smv0:AVERAGE DEF:hd1_smv0=${RRDFILE}:nvme0_hd1_smv0:AVERAGE DEF:hd2_smv0=${RRDFILE}:nvme0_hd2_smv0:AVERAGE DEF:hd3_smv0=${RRDFILE}:nvme0_hd3_smv0:AVERAGE DEF:hd4_smv0=${RRDFILE}:nvme0_hd4_smv0:AVERAGE DEF:hd5_smv0=${RRDFILE}:nvme0_hd5_smv0:AVERAGE DEF:hd6_smv0=${RRDFILE}:nvme0_hd6_smv0:AVERAGE DEF:hd7_smv0=${RRDFILE}:nvme0_hd7_smv0:AVERAGE CDEF:allvalues=hd0_smv0,UN,0,1,IF,hd1_smv0,UN,0,1,IF,hd2_smv0,UN,0,1,IF,hd3_smv0,UN,0,1,IF,hd4_smv0,UN,0,1,IF,hd5_smv0,UN,0,1,IF,hd6_smv0,UN,0,1,IF,hd7_smv0,UN,0,1,IF,+,+,+,+,+,+,+,0,GT,1,UNKN,IF CDEF:trans_hd0_smv0=hd0_smv0 CDEF:trans_hd1_smv0=hd1_smv0 CDEF:trans_hd2_smv0=hd2_smv0 CDEF:trans_hd3_smv0=hd3_smv0 CDEF:trans_hd4_smv0=hd4_smv0 CDEF:trans_hd5_smv0=hd5_smv0 CDEF:trans_hd6_smv0=hd6_smv0 CDEF:trans_hd7_smv0=hd7_smv0 CDEF:wrongdata=allvalues,UN,INF,UNKN,IF LINE2:trans_hd0_smv0#FFA500:/dev/nvme0 AREA:wrongdata#FFFFFF:
}
WRITE_COMMAND() {
rrdtool update ${RRDFILE} N:38:100:0:16738106:0:20:32282126:0.0569779319774506:0:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN
}
CREATE_COMMAND() {
rrdtool create ${RRDFILE} --step=60 DS:nvme0_hd0_smv0:GAUGE:120:0:U DS:nvme0_hd0_smv1:GAUGE:120:0:U DS:nvme0_hd0_smv2:GAUGE:120:0:U DS:nvme0_hd0_smv3:GAUGE:120:0:U DS:nvme0_hd0_smv4:GAUGE:120:0:U DS:nvme0_hd0_smv5:GAUGE:120:0:U DS:nvme0_hd0_smv6:GAUGE:120:0:U DS:nvme0_hd0_smv7:GAUGE:120:0:U DS:nvme0_hd0_smv8:GAUGE:120:0:U DS:nvme0_hd1_smv0:GAUGE:120:0:U DS:nvme0_hd1_smv1:GAUGE:120:0:U DS:nvme0_hd1_smv2:GAUGE:120:0:U DS:nvme0_hd1_smv3:GAUGE:120:0:U DS:nvme0_hd1_smv4:GAUGE:120:0:U DS:nvme0_hd1_smv5:GAUGE:120:0:U DS:nvme0_hd1_smv6:GAUGE:120:0:U DS:nvme0_hd1_smv7:GAUGE:120:0:U DS:nvme0_hd1_smv8:GAUGE:120:0:U DS:nvme0_hd2_smv0:GAUGE:120:0:U DS:nvme0_hd2_smv1:GAUGE:120:0:U DS:nvme0_hd2_smv2:GAUGE:120:0:U DS:nvme0_hd2_smv3:GAUGE:120:0:U DS:nvme0_hd2_smv4:GAUGE:120:0:U DS:nvme0_hd2_smv5:GAUGE:120:0:U DS:nvme0_hd2_smv6:GAUGE:120:0:U DS:nvme0_hd2_smv7:GAUGE:120:0:U DS:nvme0_hd2_smv8:GAUGE:120:0:U DS:nvme0_hd3_smv0:GAUGE:120:0:U DS:nvme0_hd3_smv1:GAUGE:120:0:U DS:nvme0_hd3_smv2:GAUGE:120:0:U DS:nvme0_hd3_smv3:GAUGE:120:0:U DS:nvme0_hd3_smv4:GAUGE:120:0:U DS:nvme0_hd3_smv5:GAUGE:120:0:U DS:nvme0_hd3_smv6:GAUGE:120:0:U DS:nvme0_hd3_smv7:GAUGE:120:0:U DS:nvme0_hd3_smv8:GAUGE:120:0:U DS:nvme0_hd4_smv0:GAUGE:120:0:U DS:nvme0_hd4_smv1:GAUGE:120:0:U DS:nvme0_hd4_smv2:GAUGE:120:0:U DS:nvme0_hd4_smv3:GAUGE:120:0:U DS:nvme0_hd4_smv4:GAUGE:120:0:U DS:nvme0_hd4_smv5:GAUGE:120:0:U DS:nvme0_hd4_smv6:GAUGE:120:0:U DS:nvme0_hd4_smv7:GAUGE:120:0:U DS:nvme0_hd4_smv8:GAUGE:120:0:U DS:nvme0_hd5_smv0:GAUGE:120:0:U DS:nvme0_hd5_smv1:GAUGE:120:0:U DS:nvme0_hd5_smv2:GAUGE:120:0:U DS:nvme0_hd5_smv3:GAUGE:120:0:U DS:nvme0_hd5_smv4:GAUGE:120:0:U DS:nvme0_hd5_smv5:GAUGE:120:0:U DS:nvme0_hd5_smv6:GAUGE:120:0:U DS:nvme0_hd5_smv7:GAUGE:120:0:U DS:nvme0_hd5_smv8:GAUGE:120:0:U DS:nvme0_hd6_smv0:GAUGE:120:0:U DS:nvme0_hd6_smv1:GAUGE:120:0:U DS:nvme0_hd6_smv2:GAUGE:120:0:U DS:nvme0_hd6_smv3:GAUGE:120:0:U DS:nvme0_hd6_smv4:GAUGE:120:0:U DS:nvme0_hd6_smv5:GAUGE:120:0:U DS:nvme0_hd6_smv6:GAUGE:120:0:U DS:nvme0_hd6_smv7:GAUGE:120:0:U DS:nvme0_hd6_smv8:GAUGE:120:0:U DS:nvme0_hd7_smv0:GAUGE:120:0:U DS:nvme0_hd7_smv1:GAUGE:120:0:U DS:nvme0_hd7_smv2:GAUGE:120:0:U DS:nvme0_hd7_smv3:GAUGE:120:0:U DS:nvme0_hd7_smv4:GAUGE:120:0:U DS:nvme0_hd7_smv5:GAUGE:120:0:U DS:nvme0_hd7_smv6:GAUGE:120:0:U DS:nvme0_hd7_smv7:GAUGE:120:0:U DS:nvme0_hd7_smv8:GAUGE:120:0:U RRA:AVERAGE:0.5:1:1440 RRA:AVERAGE:0.5:30:336 RRA:AVERAGE:0.5:60:744 RRA:AVERAGE:0.5:1440:365 RRA:MIN:0.5:1:1440 RRA:MIN:0.5:30:336 RRA:MIN:0.5:60:744 RRA:MIN:0.5:1440:365 RRA:MAX:0.5:1:1440 RRA:MAX:0.5:30:336 RRA:MAX:0.5:60:744 RRA:MAX:0.5:1440:365 RRA:LAST:0.5:1:1440 RRA:LAST:0.5:30:336 RRA:LAST:0.5:60:744 RRA:LAST:0.5:1440:365
}
# Repeat the given command ($1) until it fails, sleeping between runs.
ACTION_LOOP() {
    echo "performing ${ACTION} action to ${RRDFILE} with sleeptime of ${SLEEPTIME_IN_S}s"
    INDEX=0
    echo "testing ${ACTION} ${INDEX}"
    ((++INDEX))
    ($1)
    while [ $? -eq 0 ]; do
        sleep ${SLEEPTIME_IN_S}
        echo "testing ${ACTION} ${INDEX}"
        ((++INDEX))
        ($1)
    done
    # The loop only exits when rrdtool returns non-zero, e.g. on a locking error.
    echo "aborted ${ACTION} action at $INDEX"
}
case "${ACTION}" in
create)
echo "creating ${RRDFILE}"
CREATE_COMMAND
echo "Done"
;;
read)
ACTION_LOOP READ_COMMAND
;;
write)
ACTION_LOOP WRITE_COMMAND
;;
*)
echo "usage: $0 <filename> <sleeptime> {create|read|write}"
exit 1
;;
esac
exit 0
Reproducing the locking error (run the write and read loops concurrently, in separate terminals):
$ ./test_rrd_locking.sh test.rrd 0 create
$ ./test_rrd_locking.sh test.rrd 0.051 write
$ ./test_rrd_locking.sh test.rrd 0.05 read
After roughly 10-200 iterations I end up getting the locking error on either the read or the write side. Running only reads or only writes worked flawlessly for millions of iterations, but when doing both at the same time the locking errors pop up again.
@mikaku maybe this way you are able to reproduce the rrd locking error?
Yes, I've been running the script and I was able to reproduce the locking error. I think, though, that the way this script stresses the RRD read/write mechanism is far from what I've seen in normal Monitorix installations.
There is a mail thread where Tobias Oetiker, the author of RRDtool, comments that the file locking mechanism in RRDtool handles concurrent reads and writes.
Nevertheless, if for an unknown reason your system is accessing .rrd files at such a high rate, maybe you could use rrdcached. I've never used it before, but I think that Monitorix would need some tweaking to include support for it. This might well be worth a try.
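From the rrdtool documentation, routing an update through rrdcached would look roughly like this (just a sketch; the socket path and the file/values are made up for illustration):
# Sketch: let rrdcached serialize the writes instead of touching the
# file directly. Socket path and .rrd file below are assumptions.
use RRDs;

my $daemon = "unix:/var/run/rrdcached.sock";
my $rrd    = "/var/lib/monitorix/system.rrd";
RRDs::update($rrd, "--daemon", $daemon, "N:0.15:0.10:0.05");
my $err = RRDs::error;
print "ERROR: while updating $rrd: $err\n" if $err;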
Yes, I've been running the script and I was able to reproduce the locking error. I think, though, that the way this script stresses the RRD read/write mechanism is far from what I've seen in normal Monitorix installations.
Yes, Monitorix will not usually encounter these frequencies; it was just a test to check whether reading from and writing to the rrd files the way Monitorix currently does can cause this issue.
My assumption at the moment is that it occurs by chance when the browser-initiated cgi call and the monitorix-initiated update call overlap.
I have the monitorix page open continuously, which increases the chance of overlapping calls to the rrd. I experience these locking errors from Monitorix roughly every other day, and most of the time when this happens some plots get dips which destroy the axis scaling until the dip moves out of range again. One then has to check whether it is an actual indication of a hardware/software problem or just an artefact of the locking error.
There is a mail thread where Tobias Oetiker, the author of RRDtool, comments that the file locking mechanism in RRDtool handles concurrent reads and writes.
I have read it, but I am not sure whether he is referring to rrdcached, to rrd files in general, or whether the user has to handle the locks themselves and rrdtool just provides the check that the file is locked.
Nevertheless, if for an unknown reason your system is accessing .rrd files at such a high rate, maybe you could use rrdcached. I've never used it before, but I think that Monitorix would need some tweaking to include support for it.
I am not concerned about this happening at high rates, but I see it also happening at "normal" rates (just having the page open in a browser), and it is just a matter of time until anyone encounters this error, as it happens by chance because of the unprotected read/write access to the rrd files.
In principle, if the error occurs during reading it is not that big of a problem, as one can just refresh the page and get a new plot which is now probably correct. But if the error occurs while writing the rrd, it might introduce wrongly missing data that will stay there forever.
rrdcached would be one fix, but I don't have experience with it up to now either; I would have to look into it.
Another fix would be to check, when reading or writing, whether the file is locked; if it is locked, wait a bit and try again.
Something like this:
use Time::HiRes qw(usleep);  # needed for the sub-second sleep below

my $number_of_rrd_retries = 3;
for (my $i_retry = 0; $i_retry < $number_of_rrd_retries; $i_retry++) {
	RRDs::update($rrd, $rrdata);
	logger("$myself: $rrdata") if $debug;
	my $err = RRDs::error;
	if ($err) {
		if (index($err, "could not lock RRD") != -1) {
			if ($i_retry + 1 < $number_of_rrd_retries) {
				logger("WARNING: will retry locking because while updating $rrd: $err");
				usleep(1000);
			} else {
				logger("ERROR: reached maximum retries for locking $rrd: $err");
				return;
			}
		} else {
			logger("ERROR: while updating $rrd: $err");
			return;
		}
	} else {
		return;
	}
}
Maybe there is a better way to check for a locking error besides parsing for could not lock RRD.
I have the monitorix page open continuously, which increases the chance of overlapping calls to the rrd. I experience these locking errors from Monitorix roughly every other day, and most of the time when this happens some plots get dips which destroy the axis scaling until the dip moves out of range again. One then has to check whether it is an actual indication of a hardware/software problem or just an artefact of the locking error.
Yes, and if someone reduces the value of refresh_rate, which is 150 (seconds) by default, then there will be even more chances of hitting that locking issue.
Maybe there is a better way to check for a locking error besides parsing for could not lock RRD.
I was just thinking of flock. We could introduce a flock($fh, LOCK_EX) (exclusive lock) right before calling the update functions of all graphs, here:
https://github.com/mikaku/Monitorix/blob/ff9b04d8e4bcbf1c9d58dd99b1f9e2e33b004b08/monitorix#L768
and a flock($fh, LOCK_UN) (unlock) here:
https://github.com/mikaku/Monitorix/blob/ff9b04d8e4bcbf1c9d58dd99b1f9e2e33b004b08/monitorix#L788
Also, we will have to include the same two flocks in monitorix.cgi, right before calling the cgi functions of all graphs:
here: https://github.com/mikaku/Monitorix/blob/ff9b04d8e4bcbf1c9d58dd99b1f9e2e33b004b08/monitorix.cgi#L580
and here: https://github.com/mikaku/Monitorix/blob/ff9b04d8e4bcbf1c9d58dd99b1f9e2e33b004b08/monitorix.cgi#L655
The advantage of flock(2) over a sleep time is that a process trying to acquire the exclusive lock will block until it's unlocked by the other process.
Forgot to say that the file that will be used for flock can be a temporary file in /tmp/.
Also, the emailreports module might need the same flock, as it accesses the .rrd files when sending the reports, although this has a very low access rate.
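A minimal sketch of the idea, assuming a hypothetical lock file name (the real change would wrap the two per-graph loops linked above):
# Sketch of the proposed global lock; the file name is an assumption.
use Fcntl qw(:flock);

my $lock_file = "/tmp/monitorix.lock";
open(my $lock_fh, ">", $lock_file) or die "can't open $lock_file: $!";
flock($lock_fh, LOCK_EX) or die "can't lock $lock_file: $!";  # blocks until free

# ... call the XXX_update (or XXX_cgi) functions of all graphs here ...

flock($lock_fh, LOCK_UN);
close($lock_fh);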
I was just thinking of flock. We could introduce a flock($fh, LOCK_EX) (exclusive lock) right before calling the update functions of all graphs, here:
Yes, this would be better than sleeping.
It seems rrdtool has some kind of flock, but somehow it does not lock in the monitorix use case: https://github.com/oetiker/rrdtool-1.x/blob/6b05d94486a478b91724c6babfbcc565e362fe46/src/rrd_open.c#L796
We could also use the rrd file itself for flock. I tested this in a slightly modified version of the above test script by adding flock to the read and write commands.
Replace the above read and write commands with:
READ_COMMAND() {
flock -s -F ${RRDFILE} -c "rrdtool graphv test01z.1day.svg --title=Title --start=-1day --imgformat=SVG --vertical-label=label --width=800 --height=300 --full-size-mode --zoom=1 --slope-mode DEF:hd0_smv0=${RRDFILE}:nvme0_hd0_smv0:AVERAGE DEF:hd1_smv0=${RRDFILE}:nvme0_hd1_smv0:AVERAGE DEF:hd2_smv0=${RRDFILE}:nvme0_hd2_smv0:AVERAGE DEF:hd3_smv0=${RRDFILE}:nvme0_hd3_smv0:AVERAGE DEF:hd4_smv0=${RRDFILE}:nvme0_hd4_smv0:AVERAGE DEF:hd5_smv0=${RRDFILE}:nvme0_hd5_smv0:AVERAGE DEF:hd6_smv0=${RRDFILE}:nvme0_hd6_smv0:AVERAGE DEF:hd7_smv0=${RRDFILE}:nvme0_hd7_smv0:AVERAGE CDEF:allvalues=hd0_smv0,UN,0,1,IF,hd1_smv0,UN,0,1,IF,hd2_smv0,UN,0,1,IF,hd3_smv0,UN,0,1,IF,hd4_smv0,UN,0,1,IF,hd5_smv0,UN,0,1,IF,hd6_smv0,UN,0,1,IF,hd7_smv0,UN,0,1,IF,+,+,+,+,+,+,+,0,GT,1,UNKN,IF CDEF:trans_hd0_smv0=hd0_smv0 CDEF:trans_hd1_smv0=hd1_smv0 CDEF:trans_hd2_smv0=hd2_smv0 CDEF:trans_hd3_smv0=hd3_smv0 CDEF:trans_hd4_smv0=hd4_smv0 CDEF:trans_hd5_smv0=hd5_smv0 CDEF:trans_hd6_smv0=hd6_smv0 CDEF:trans_hd7_smv0=hd7_smv0 CDEF:wrongdata=allvalues,UN,INF,UNKN,IF LINE2:trans_hd0_smv0#FFA500:/dev/nvme0 AREA:wrongdata#FFFFFF:"
}
WRITE_COMMAND() {
flock -F ${RRDFILE} -c "rrdtool update ${RRDFILE} N:38:100:0:16738106:0:20:32282126:0.0569779319774506:0:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN:NaN"
}
This way the test script also works with high rates.
We could add these routines to Monitorix.pm:
use Fcntl qw(:flock);  # for LOCK_SH, LOCK_EX, LOCK_UN

sub flocked_rrd_update {
	my ($myself, $debug, $rrd, $rrdata) = @_;
	open(my $rrd_fh, ">>", $rrd) or die "Can't open $rrd: $!";
	flock($rrd_fh, LOCK_EX) or die "Cannot lock rrd: $!\n";
	RRDs::update($rrd, $rrdata);
	logger("$myself: $rrdata") if $debug;
	my $err = RRDs::error;
	logger("ERROR: while updating $rrd: $err") if $err;
	flock($rrd_fh, LOCK_UN) or die "Cannot unlock rrd: $!\n";
	close($rrd_fh);
}

sub flocked_rrd_fetch {
	my ($rrd, @fetch_args) = @_;	# slurp the argument list (it arrives flattened)
	open(my $rrd_fh, ">>", $rrd) or die "Can't open $rrd: $!";
	flock($rrd_fh, LOCK_SH) or die "Cannot lock rrd: $!\n";
	my (undef, undef, undef, $data) = RRDs::fetch($rrd, @fetch_args);
	my $err = RRDs::error;
	flock($rrd_fh, LOCK_UN) or die "Cannot unlock rrd: $!\n";
	close($rrd_fh);
	return ($data, $err);
}

sub flocked_rrd_call {
	my ($rrd, $rrd_call, @call_args) = @_;
	open(my $rrd_fh, ">>", $rrd) or die "Can't open $rrd: $!";
	flock($rrd_fh, LOCK_SH) or die "Cannot lock rrd: $!\n";
	my $rrd_return_data = $rrd_call->(@call_args);
	my $err = RRDs::error;
	flock($rrd_fh, LOCK_UN) or die "Cannot unlock rrd: $!\n";
	close($rrd_fh);
	return ($rrd_return_data, $err);
}
And replace the RRDs update calls:
RRDs::update($rrd, $rrdata);
logger("$myself: $rrdata") if $debug;
my $err = RRDs::error;
logger("ERROR: while updating $rrd: $err") if $err;
to:
flocked_rrd_update($myself, $debug, $rrd, $rrdata);
The fetch call:
my (undef, undef, undef, $data) = RRDs::fetch("$rrd",
	"--resolution=$tf->{res}",
	"--start=-$tf->{nwhen}$tf->{twhen}",
	"AVERAGE");
$err = RRDs::error;
would be replaced by:
my ($data, $err) = flocked_rrd_fetch($rrd, ("--resolution=$tf->{res}", "--start=-$tf->{nwhen}$tf->{twhen}", "AVERAGE"));
and the graphv call would be replaced by:
my $err;
($pic, $err) = flocked_rrd_call($rrd, $rrd{$version}, ("$IMG_DIR" . $IMG[$e * $plots_per_list_item + $n_plot],
	"--title=$plot_title ($tf->{nwhen}$tf->{twhen})",
	"--start=-$tf->{nwhen}$tf->{twhen}",
	"--imgformat=$imgfmt_uc",
	"--vertical-label=" . $y_axis_title,
	"--width=$width",
	"--height=$height",
	@extra,
	@riglim,
	$zoom,
	@{$cgi->{version12}},
	$large_plot ? () : @{$cgi->{version12_small}},
	@{$colors->{graph_colors}},
	@def_sensor_average,
	$cdef_sensor_allvalues,
	@CDEF,
	@tmp)
);
But using the rrd file itself for flock requires that the cgi routine has write access to the rrd file to be able to lock it. Not sure if this is desired.
Maybe one global lock file in /tmp/ like you proposed, with write access for both callers, is better?
But using the rrd file itself for flock requires that the cgi routine has write access to the rrd file to be able to lock it. Not sure if this is desired.
That's not possible. The .rrd files are owned by the root user, and the cgi program runs under a regular user (usually nobody) for security reasons. The owner of the cgi program might change if the user switches from the built-in Monitorix HTTP server to an Apache web server. It would be complicated to also manage these user changes in the .rrd files.
Maybe one global lock file in /tmp/ like you proposed, with write access for both callers, is better?
Both solutions have advantages and disadvantages. If we choose the global lock path, the changes in the code are minimal and the .rrd files and cgi continue as now. In the other case, if we choose your suggestion, we get better lock granularity, but this is incompatible with the owner of the .rrd files and the user of the cgi.
Since this is something that almost no user is experiencing, I'd choose the global lock path, and it should rely on a new option which would be disabled by default. Something like enable_rrd_lock?
Since this is something that almost no user is experiencing, I'd choose the global lock path, and it should rely on a new option which would be disabled by default. Something like enable_rrd_lock?
Agreed, that would be the best.
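A rough sketch of how the gated lock could look (the option name is from this discussion; the config accessor, lock path, and permissions are assumptions):
# Sketch only: everything except the option name is assumed.
use Fcntl qw(:flock);

my $config    = { enable_rrd_lock => "y" };  # placeholder for the real config
my $lock_file = "/tmp/monitorix.lock";       # hypothetical global lock file

if (lc($config->{enable_rrd_lock} || "n") eq "y") {
	open(my $lock_fh, ">>", $lock_file) or die "can't open $lock_file: $!";
	chmod(0666, $lock_file);   # both root (daemon) and nobody (cgi) must open it
	flock($lock_fh, LOCK_EX);  # or LOCK_SH on the cgi/read side
	# ... update or read the .rrd files ...
	flock($lock_fh, LOCK_UN);
	close($lock_fh);
}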
In rare instances (roughly once every few days) I noticed an error in the monitorix log files about not being able to lock the RRD. The exact file is quite random up to now.
ERROR: while updating /var/lib/monitorix/XXX.rrd: could not lock RRD
I have only one instance of monitorix running per machine, but have enable_parallelizing = y. Looking at the source, it only parallelizes per plot collection, so this should also not be the issue, or is there another process also accessing the RRDs? Any idea what could cause this error?