naemon / naemon-core

Networks, Applications and Event Monitor
http://www.naemon.io/
GNU General Public License v2.0

Invalid precache on reload #437

Closed Bryce-Souers closed 8 months ago

Bryce-Souers commented 1 year ago

Hi,

There is a rare issue that I cannot seem to replicate.

I have automation that generates configurations and reloads Naemon every 8 hours by calling "omd reload naemon". The command returned OK, but when I checked the logs I found this:

[1692979703] Naemon 1.4.0 starting... (PID=3900326)
[1692979703] Local time is Fri Aug 25 16:08:23 UTC 2023
[1692979703] LOG VERSION: 2.0
[1692979703] qh: Socket '/omd/sites/ssnomd/var/naemon/naemon.qh' successfully initialized
[1692979703] nerd: Channel hostchecks registered successfully
[1692979703] nerd: Channel servicechecks registered successfully
[1692979703] nerd: Fully initialized and ready to rock!
[1692979705] Error: Unexpected EOF in file '/omd/sites/ssnomd/var/naemon/objects.precache' on line 1786043 - check for a missing closing bracket.
[1692979706] npcdmod: Copyright (c) 2008-2009 Hendrik Baecker (andurin@process-zero.de) - http://www.pnp4nagios.org
[1692979706] npcdmod: /omd/sites/ssnomd/etc/pnp4nagios/npcd.cfg initialized
[1692979706] npcdmod: spool_dir = '/omd/sites/ssnomd/var/pnp4nagios/spool/'.
[1692979706] npcdmod: perfdata file '/omd/sites/ssnomd/var/pnp4nagios/perfdata.dump'.
[1692979706] npcdmod: Ready to run to have some fun!
[1692979706] Event broker module '/omd/sites/ssnomd/lib/npcdmod4.o' initialized successfully.
[1692979706] livestatus: Setting maximum response size to 419430400 bytes (400.0 MB)
[1692979706] livestatus: Naemon Livestatus 1.4.0, Socket: '/omd/sites/ssnomd/tmp/run/live'
[1692979706] livestatus: Cannot open log archive '/omd/sites/ssnomd/var/naemon/archive'
[1692979706] livestatus: Finished initialization. Further log messages go to /omd/sites/ssnomd/var/naemon/livestatus.log
[1692979706] Event broker module '/omd/sites/ssnomd/lib/naemon/livestatus.o' initialized successfully.
[1692979706] Bailing out due to one or more errors encountered in the configuration files. Run Naemon from the command line with the -v option to verify your config before restarting. (PID=3900326)
[1692979706] Event broker module 'NERD' deinitialized successfully.
[1692979706] npcdmod: If you don't like me, I will go out! Bye.
[1692979706] Event broker module '/omd/sites/ssnomd/lib/npcdmod4.o' deinitialized successfully.
[1692979706] livestatus: deinitializing
[1692979706] livestatus: Logfile cache: flushing complete cache.
[1692979706] Event broker module '/omd/sites/ssnomd/lib/naemon/livestatus.o' deinitialized successfully.

It starts up, and then fails when it tries to read the objects.precache file.

If it's important, I am running this with the docker image: https://hub.docker.com/r/consol/omd-labs-centos

Any ideas why the precache object would be invalid and how I can avoid this?

sni commented 12 months ago

The precached objects file is one of the few files that are written in place, rather than written to a tmp file and moved into place on success like most other files. Maybe this could be changed. Until then, if you start a precaching job twice, it can corrupt the file, because two processes open the same file and write into it. Could that be the case here, that for some reason you are starting the precaching multiple times?
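
For illustration only, here is a minimal Python sketch of the two ideas in the comment above: writing a file via a temp file plus rename, and making sure only one reload job runs at a time. It is not Naemon's code (Naemon is written in C), and the names `atomic_write`, `single_instance`, and the lock path are hypothetical, chosen for this example.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


def atomic_write(path, data):
    """Write data to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so the final rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Readers see either the old file or the complete new one
        os.replace(tmp_path, path)
    except BaseException:
        # Drop the partial temp file; the original file is untouched
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive lock; raise BlockingIOError if another job holds it."""
    lock_file = open(lock_path, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        lock_file.close()  # closing the file releases the lock


# Hypothetical use in the automation (lock path is an assumption):
#
#     import subprocess
#     with single_instance("/tmp/naemon-reload.lock"):
#         subprocess.run(["omd", "reload", "naemon"], check=True)
```

With a guard like `single_instance` around the reload call, a second job that starts while the first is still running fails fast instead of writing into the same file. The `fcntl` module is Unix-only, which should be fine inside a Linux container.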

sni commented 8 months ago

I'd say this is fixed with #439