naemon / naemon-core

Networks, Applications and Event Monitor
http://www.naemon.io/
GNU General Public License v2.0

Invalid precache on reload #437

Closed Bryce-Souers closed 11 months ago

Bryce-Souers commented 1 year ago

Hi,

There is a rare issue that I cannot seem to replicate.

I have automation that generates configurations and reloads every 8 hours by calling "omd reload naemon". The command returned OK, but when I checked the logs I found this:

[1692979703] Naemon 1.4.0 starting... (PID=3900326)
[1692979703] Local time is Fri Aug 25 16:08:23 UTC 2023
[1692979703] LOG VERSION: 2.0
[1692979703] qh: Socket '/omd/sites/ssnomd/var/naemon/naemon.qh' successfully initialized
[1692979703] nerd: Channel hostchecks registered successfully
[1692979703] nerd: Channel servicechecks registered successfully
[1692979703] nerd: Fully initialized and ready to rock!
[1692979705] Error: Unexpected EOF in file '/omd/sites/ssnomd/var/naemon/objects.precache' on line 1786043 - check for a missing closing bracket.
[1692979706] npcdmod: Copyright (c) 2008-2009 Hendrik Baecker (andurin@process-zero.de) - http://www.pnp4nagios.org
[1692979706] npcdmod: /omd/sites/ssnomd/etc/pnp4nagios/npcd.cfg initialized
[1692979706] npcdmod: spool_dir = '/omd/sites/ssnomd/var/pnp4nagios/spool/'.
[1692979706] npcdmod: perfdata file '/omd/sites/ssnomd/var/pnp4nagios/perfdata.dump'.
[1692979706] npcdmod: Ready to run to have some fun!
[1692979706] Event broker module '/omd/sites/ssnomd/lib/npcdmod4.o' initialized successfully.
[1692979706] livestatus: Setting maximum response size to 419430400 bytes (400.0 MB)
[1692979706] livestatus: Naemon Livestatus 1.4.0, Socket: '/omd/sites/ssnomd/tmp/run/live'
[1692979706] livestatus: Cannot open log archive '/omd/sites/ssnomd/var/naemon/archive'
[1692979706] livestatus: Finished initialization. Further log messages go to /omd/sites/ssnomd/var/naemon/livestatus.log
[1692979706] Event broker module '/omd/sites/ssnomd/lib/naemon/livestatus.o' initialized successfully.
[1692979706] Bailing out due to one or more errors encountered in the configuration files. Run Naemon from the command line with the -v option to verify your config before restarting. (PID=3900326)
[1692979706] Event broker module 'NERD' deinitialized successfully.
[1692979706] npcdmod: If you don't like me, I will go out! Bye.
[1692979706] Event broker module '/omd/sites/ssnomd/lib/npcdmod4.o' deinitialized successfully.
[1692979706] livestatus: deinitializing
[1692979706] livestatus: Logfile cache: flushing complete cache.
[1692979706] Event broker module '/omd/sites/ssnomd/lib/naemon/livestatus.o' deinitialized successfully.

It starts up, and then fails when it tries to read the objects.precache file.

If it's important, I am running this with the docker image: https://hub.docker.com/r/consol/omd-labs-centos

Any ideas why the objects.precache file would be invalid and how I can avoid this?

sni commented 1 year ago

The precached objects file is one of the few files that is written in place, rather than written to a tmp file and moved into place on success like most other files. Maybe this could be changed... But until then, if you start a precaching job twice, the two processes open the same file and write into it, which can corrupt it. Could it be that, for some reason, you are starting the precaching multiple times?
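
To make the two ideas in this comment concrete, here is a minimal Python sketch of the general patterns, not Naemon's actual code. `atomic_write` shows the "write to a tmp file, then move on success" approach, and `exclusive_lock` shows one way automation could refuse to start a second reload while one is still running. Both function names and the lock path in the usage comment are hypothetical, and the sketch assumes a POSIX system where `flock` is available.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


def atomic_write(path, data):
    """Write data to a temp file next to path, then rename it over path.

    Readers see either the old file or the complete new one, never a
    partially written file. Note: mkstemp creates the file with mode 0600.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Atomic on POSIX as long as both paths are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


@contextmanager
def exclusive_lock(lock_path):
    """Hold a non-blocking exclusive flock for the duration of the block.

    Raises BlockingIOError if another holder already has the lock.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


# Hypothetical usage in the automation (lock path is an assumption):
#
#   import subprocess
#   with exclusive_lock("/tmp/naemon-reload.lock"):
#       subprocess.run(["omd", "reload", "naemon"], check=True)
```

With a guard like this, a second reload that overlaps the first fails fast with `BlockingIOError` instead of writing into the same precache file.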

sni commented 11 months ago

I'd say this is fixed with #439