sni / lmd

Livestatus Multitool Daemon - Create livestatus federation from multiple sources
https://labs.consol.de/omd/packages/lmd/
GNU General Public License v3.0

config loading errors should be reported loudly instead of failing silently with exit code 2 #135

Closed: klaernie closed this issue 1 year ago

klaernie commented 1 year ago

I spent yesterday trying to figure out why lmd was not starting up. It took quite a few hours to realize that the config format is not INI but TOML, and TOML always requires string values to be quoted. (My inner monk now needs to rename the config to lmd-local.toml.)

The confusing part is that when I ran the binary (./lmd -c repro.ini), it did not print any error message and just exited silently. Next I checked strace and could see that it exited right after reading the config file. Without an error message to guide me, the search space was far too large; I started reading through the code to find where it could have called exit(2), but didn't find anything. I read over the config-loading part a couple of times, and only hours later did it occur to me that there was no real INI reader, only a TOML reader. So I read the Wikipedia page on TOML, and it mentioned that every string has to be quoted.

Here is a small reproduction:

~/repro % ll
total 11376
-rwxr-xr-x 1 kandre kandre 11567104 Mar 17 09:03 lmd
-rw-r--r-- 1 kandre kandre      105 Mar 17 09:03 repro.ini
-rw-r--r-- 1 kandre kandre      107 Mar 17 09:03 repro.toml
~/repro % tail repro*
==> repro.ini <==
Listen = [ '127.0.0.1:9999' ]
[[Connections]]
name = localhost
Source = [ '/var/run/nagios/livestatus' ]

==> repro.toml <==
Listen = [ '127.0.0.1:9999' ]
[[Connections]]
name = 'localhost'
Source = [ '/var/run/nagios/livestatus' ]
~/repro % ./lmd -c repro.ini; echo $?
2
~/repro % ./lmd -c repro.toml; echo $?
lmd - version 2.1.4 (Build: 15be67e, go1.19.6) started with config [repro.toml]
[2023-03-17 09:04:47.641][Info][pid:17184][main:293] lmd - version 2.1.4 (Build: 15be67e, go1.19.6) started with config [repro.toml]
[2023-03-17 09:04:47.685][Info][pid:17184][listener:136] listening for incoming queries on tcp 127.0.0.1:9999
[2023-03-17 09:04:47.685][Info][pid:17184][peer:300] [localhost] starting connection
[2023-03-17 09:04:47.688][Info][pid:17184][peer:1508] [localhost][r:be8c09] site went offline: dial unix /var/run/nagios/livestatus: connect: no such file or directory
[2023-03-17 09:04:47.688][Warn][pid:17184][peer:356] [localhost] initializing objects failed: dial unix /var/run/nagios/livestatus: connect: no such file or directory
^C
[2023-03-17 09:04:50.527][Info][pid:17184][main:736] got sigint, quitting
[2023-03-17 09:04:50.528][Info][pid:17184][listener:146] stopping tcp listener on 127.0.0.1:9999
[2023-03-17 09:04:50.528][Info][pid:17184][listener:147] tcp listener 127.0.0.1:9999 shutdown complete
[2023-03-17 09:04:50.528][Info][pid:17184][main:246] lmd shutdown complete
1
~/repro %

Now I have a few small wishes:

Thanks and best regards, Andre

klaernie commented 1 year ago

@sni Would you consider adding a --test-config flag that checks the config file and exits with an appropriate status?

sni commented 1 year ago

Well, usually lmd.ini is pretty static and not changed all the time, but why not...

klaernie commented 1 year ago

Well, I'm just asking for protection against stupid mistakes, with me usually being the one who makes them ;)