sni / lmd

Livestatus Multitool Daemon - Create livestatus federation from multiple sources
https://labs.consol.de/omd/packages/lmd/
GNU General Public License v3.0

Accept "python3" OutputFormat #128

Closed tmuncks closed 9 months ago

tmuncks commented 1 year ago

I needed to make a few minor changes to allow CheckMK to use lmd as a livestatus source.

CheckMK uses OutputFormat: python3, and with these tiny changes applied it seems to work.

That said, I am obviously slightly curious as to why I didn't need to perform any actual modifications to the data, but there you have it.

Please let me know if I'm missing something glaringly obvious. :)

tmuncks commented 1 year ago

Okay. While this may be a start, other things need work for this to fully function with CheckMK. I get errors in certain situations, which I guess may be due to some datatype issue.

In lmd:

lmd    | [2022-10-31 18:14:15.868][Warn][pid:1][filter:811] not implemented stringlist op: <

In CheckMK:

TypeError ('>' not supported between instances of 'str' and 'int')

I will try to see if I can figure this out, but really don't have a clue where to start at this point.

sni commented 1 year ago

Well, I have no objections to merging this. But yes, the python output format is in fact simple JSON. I am not familiar enough with Python to decide whether that's OK or not.
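
To illustrate the point about the python output format being simple JSON: for rows that hold only lists, strings, and numbers, JSON text is also a valid Python literal, which may be why no changes to the data were needed. The sketch below assumes the client evaluates the response as a Python literal (here via `ast.literal_eval`); the sample rows and the `parses_as_python` helper are made up for illustration. JSON `null`, `true`, and `false` are not Python literals, so such values would not survive this route.

```python
import ast
import json

# Hypothetical response body: one row per host,
# [name, state, custom_variable_names].
body = '[["web01", 0, ["_REALNAME", "_TAGS"]], ["db01", 2, []]]'

as_json = json.loads(body)
as_python = ast.literal_eval(body)

# Lists, strings and numbers parse to the same Python objects either way.
same_rows = as_json == as_python


def parses_as_python(text):
    """Return True if the text can be evaluated as a Python literal."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        return False


# JSON-only keywords are where the two formats diverge.
null_ok = parses_as_python('[["web01", null]]')
```

Here `same_rows` is true, while `null_ok` is false because `null` is not a Python literal.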

What does the < operator do in list context?

tmuncks commented 1 year ago

This patch does not fully solve the issue at this point. I was hopeful because all my devices showed up, but strange stuff is happening once CheckMK starts performing its queries.

In the case of the < operator, it appears that a state entry in the result is returned as an empty string for whatever reason. I'm trying to figure out how to catch this and force-convert it into an int, but so far no luck (still figuring out golang and lmd).

As you mention, the python3 outputformat should just be json, so what we need is basically a python3 alias for the json outputformat. But on top of that, it appears I need a way to filter or force-convert certain fields.

If I could figure out a way to make this work, we would learn exactly what it takes to be compatible with CheckMK. And then we will probably better be able to address it properly.

tmuncks commented 1 year ago

Okay, so while I'm no closer to actually solving any of the remaining issues with CheckMK here, I believe I see what causes the problems.

For whatever reason, Livestatus data returned from LMD is sometimes the wrong type. In the case of the < operator error, LMD returns an integer field as an empty string, which causes the CheckMK code to fail. CheckMK will handle a missing value just fine, but it does not expect the int field to contain a string.
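
The failure mode described here is easy to reproduce in plain Python 3: comparing an empty string with an integer raises the same TypeError that CheckMK reported. The row below is made up, and `coerce_int` is a hypothetical client-side helper shown only to make the type problem concrete; it is not part of CheckMK or LMD.

```python
# Hypothetical row as a client might receive it:
# the integer field came back as "".
row = {"name": "web01", "state": ""}


def compare_state(value):
    """Return the comparison result, or the error text if the types clash."""
    try:
        return value > 0
    except TypeError as exc:
        return str(exc)


error = compare_state(row["state"])


def coerce_int(value, default=None):
    """Illustrative helper: map "" (or other non-numeric input) to a default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


state = coerce_int(row["state"])
```

With this row, `error` holds the message `'>' not supported between instances of 'str' and 'int'`, and `state` falls back to `None`, which corresponds to the "missing value" case that CheckMK is said to handle.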

I have run a bunch of tests to try to figure out what is happening. The closest I've gotten is that BuildLocalResponseData perhaps converts certain empty data structures to an empty string?

Test setup

For anyone willing to have a look, I have set up a dual CheckMK + LMD test system.

I can provide access to these systems as needed.

CheckMK Livestatus info

sni commented 1 year ago

LMD is strongly typed, but of course only for known columns. The empty value is defined here: https://github.com/sni/lmd/blob/9cfc2f589c8a31cd7f7238d5d196a8f2cb0b5fd6/lmd/column.go#L303 So for string columns, the "empty" value is an empty string, and for integers it is -1. If lmd does not know anything about a column, it will assume a string column, which has "" as its empty value. So I'd assume cmk requests an unknown integer column and gets confused by the response.
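
The rule described in this comment can be modelled in a few lines. This is a Python sketch of the behaviour as stated above, not LMD's Go code; the `KNOWN_COLUMNS` table, the `empty_value` helper, and the column names are made up for illustration.

```python
# Illustrative column registry; LMD's real definitions live in its Go source.
KNOWN_COLUMNS = {
    "name": "string",
    "state": "int",
}

# Placeholder per column type, as described in the comment above.
EMPTY_BY_TYPE = {
    "string": "",
    "int": -1,
}


def empty_value(column):
    """Return the placeholder used when no value is available for a column."""
    # Columns that are not registered are treated as string columns.
    column_type = KNOWN_COLUMNS.get(column, "string")
    return EMPTY_BY_TYPE[column_type]


known_int = empty_value("state")
unknown = empty_value("some_cmk_only_column")
```

In this model a registered integer column falls back to -1, while an unregistered one falls back to "", which is the value a client would then try to compare against an int.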

tmuncks commented 1 year ago

Cool, thanks... I will try to figure out where these known columns are defined, and see if I can generate something that works well with CheckMK...

tmuncks commented 1 year ago

@sni I have found the places to add the supported options, and I can see the defaults changing, which is awesome. Some of the fields do not seem to be picked up properly from the source site, however, and I was wondering why that might be?

Previously, for an unknown field where the source livestatus returns 0, LMD would return "".

After adding that particular field, source livestatus still returns 0, but now LMD returns -1. What might be the reason for this?

It obviously needs to reflect the value from the source site. Ideas?

tmuncks commented 1 year ago

CheckMK appears to do some filtering that requires that < string operator after all. I have no idea how to implement that.

lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:306] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: GET hosts
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: Stats: state = 0
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: Stats: scheduled_downtime_depth = 0
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: StatsAnd: 2
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: Stats: scheduled_downtime_depth > 0
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: Stats: state = 2
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: Stats: scheduled_downtime_depth = 0
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: StatsAnd: 2
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: Stats: state = 1
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: Stats: scheduled_downtime_depth = 0
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: StatsAnd: 2
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: Filter: custom_variable_names < _REALNAME
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: Localtime: 1669311551
lmd    | [2022-11-24 17:39:11.315][Debug][pid:1][request:779] [r:1b54b6] Ignoring Localtime as LMD works on unix timestamps only.
lmd    | [2022-11-24 17:39:11.316][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: OutputFormat: python3
lmd    | [2022-11-24 17:39:11.316][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: KeepAlive: on
lmd    | [2022-11-24 17:39:11.316][Debug][pid:1][request:327] [172.18.10.1:40208->172.18.10.2:3333][r:1b54b6] request: ResponseHeader: fixed16
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <
lmd    | [2022-11-24 17:39:11.316][Warn][pid:1][filter:811] not implemented stringlist op: <

sni commented 1 year ago

The warning is about string lists: https://github.com/sni/lmd/blob/master/lmd/filter.go#L770 Basically, 'column < string' is not defined for string lists. What should that operator do? If it's an alias for "contains not", then it could simply be added here: https://github.com/sni/lmd/blob/master/lmd/filter.go#L787
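
If `<` on a list column is taken to mean "contains not", as this comment suggests, the filter `custom_variable_names < _REALNAME` from the log would keep the hosts whose list lacks `_REALNAME`. A minimal Python sketch under that assumption follows. The function name and the sample hosts are hypothetical, treating `>=` as "contains" is a further assumption, and none of this is LMD's implementation.

```python
def stringlist_matches(values, op, needle):
    """Evaluate a list filter, assuming ">=" is "contains" and "<" is "contains not"."""
    if op == ">=":
        return needle in values
    if op == "<":
        return needle not in values
    raise NotImplementedError(f"not implemented stringlist op: {op}")


# Made-up sample data for illustration.
hosts = [
    {"name": "web01", "custom_variable_names": ["_REALNAME", "_TAGS"]},
    {"name": "db01", "custom_variable_names": ["_TAGS"]},
]

# Filter: custom_variable_names < _REALNAME
kept = [
    host["name"]
    for host in hosts
    if stringlist_matches(host["custom_variable_names"], "<", "_REALNAME")
]
```

Under this reading only `db01` is kept, because `web01` carries the `_REALNAME` custom variable.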

hweidner commented 9 months ago

This pull request is still open, but the discussion has drifted to another topic. @sni will you accept the pull request? I would strongly support that.

sni commented 9 months ago

sure, sorry for the delay.