nobody43 / zabbix-smartmontools

Disk SMART monitoring for Linux, FreeBSD and Windows. LLD, trapper.
The Unlicense

Don't sleep in sender_wrapper.py #28

Closed: asomers closed this issue 4 years ago

asomers commented 4 years ago

I'm struggling to understand the purpose of the configurable "timeout". It doesn't function as a timeout at all. Instead, it's a fixed-length sleep that gets added before sending results in get mode, but not getverb. The comment says "wait for LLD to be processed by server". The only two motivations I can think of are:

1) The server processes multiple responses from the same agent out-of-order. That seems unlikely. It would be a server bug if true.

2) The server performs LLD using a getverb operation, which sender_wrapper handles by forking and running in the background. That might allow it to process a subsequent get operation before the original getverb was complete. If this is true, then the correct solution is to block while sending a response rather than running it in the background.

Unless you tell me otherwise, I'm going to assume that 2 is the case. I'll fix it and remove the timeout setting.

nobody43 commented 4 years ago

Current situation

When the Zabbix server performs the check ('get'), the LLD is composed and returned. There is no need to run the same commands a second time because all the data has already been gathered. But sending to the trapper immediately is inadvisable because the server processes LLD within ~60 seconds (dependent on server load?), so the items will not be there yet. Therefore, the timeout was introduced. It's 0 on Windows because I was unable to pipe and fork at the same time. The fork is needed so that the LLD can respond first. I'm strongly against cache files. sender_wrapper.py was invented when no Dependent Items were available (which is still the case for Zabbix 3.0; the script must be compatible with the oldest LTS). And it might be the solution for multiple-server support when the latest Zabbix is not available.
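The respond-first, send-later flow described here can be sketched roughly as follows. This is only an illustration, not the code in sender_wrapper.py: the function name `respond_then_send` and its parameters are hypothetical, and a real child process would also have to detach from the agent's pipe, which is left out.

```python
import os
import sys
import time


def respond_then_send(lld_json, send, delay_seconds, out=None,
                      fork=None, sleep=time.sleep):
    """Write the LLD reply at once, then send values from a child after a delay.

    All names here are hypothetical. `send` is any callable that pushes the
    already-gathered values to the Zabbix trapper.
    """
    out = out or sys.stdout
    fork = fork or os.fork  # os.fork does not exist on Windows

    # Answer the 'get' check immediately with the discovery JSON.
    out.write(lld_json + "\n")
    out.flush()

    pid = fork()
    if pid != 0:
        # Parent: nothing more to do, the reply has been written.
        return pid

    # Child: wait for the server to create items from the prototypes,
    # then send the values gathered during the same run.
    # (Detaching from the agent's stdout pipe is omitted in this sketch.)
    sleep(delay_seconds)
    send()
    return 0
```

With `delay_seconds` set to 0 the child sends right away, which corresponds to the Windows setting mentioned above, although this sketch itself relies on `os.fork` and so would not run on Windows unchanged.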

Thinking about the solution for 500 drives

The structure of the script must be redone completely. I'm theorizing about cascading LLD, if it's even possible. First, an LLD must gather all available disks. Then each disk, with a separate LLD (trapper?), will discover its own SMART names and send the gathered values. The timeout is still needed. I'm still unsure whether a duplicate-disk check is possible with this approach.

nobody43 commented 4 years ago

getverb is for debug only.

asomers commented 4 years ago

So LLD uses get too? In that case, LLD gets the sleep. What's the point of sleeping if LLD and regular polling both get the same sleep? And what do you mean by "trapper"? And what does any of this have to do with cache files? I'm confused.

nobody43 commented 4 years ago

get is the regular LLD polling, which returns JSON (smartctl.discovery[get,{HOST.HOST}]). It sleeps while the items are created from the prototypes. Then the items with values are sent in bulk with Zabbix trapper. It's a confusing scheme, but I only acted within the boundaries of Zabbix's capabilities. :) One way all of this could be achieved is with cache files for disk output. I'm not going that way.
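To make the two payloads in this scheme concrete, here is a minimal sketch of the discovery JSON and the bulk trapper input. The macro name `{#DISK}`, the item key `smartctl.value[...]`, and both function names are assumptions made for illustration and are not taken from the template in this repository. The line format is the whitespace-separated `<hostname> <key> <value>` layout that `zabbix_sender` reads from an input file, and the sketch assumes none of the fields contain spaces.

```python
import json


def build_lld(disks):
    """Return discovery JSON in the {"data": [...]} form.

    {#DISK} is a hypothetical macro name chosen for this sketch.
    """
    return json.dumps({"data": [{"{#DISK}": disk} for disk in disks]})


def build_sender_lines(host, values):
    """Format values as '<hostname> <key> <value>' lines for zabbix_sender.

    `values` maps (disk, attribute) pairs to readings. The key name
    smartctl.value[...] is hypothetical.
    """
    lines = []
    for (disk, attribute), reading in sorted(values.items()):
        key = "smartctl.value[{},{}]".format(disk, attribute)
        lines.append("{} {} {}".format(host, key, reading))
    return "\n".join(lines)
```

The output of `build_lld` would be the reply to the get check, and the output of `build_sender_lines` would be piped to `zabbix_sender` once the sleep has elapsed.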

asomers commented 4 years ago

Ok, so after a few days of study I understand better how Zabbix works and why you do what you do:

So my plan is:

nobody43 commented 4 years ago

I'm good with the plan, as long as the daemon will not have polling capabilities (on its own).