asomers closed this issue 4 years ago.
Current situation

When the Zabbix server performs the check (`get`), the LLD is composed and returned. There is no need to run the same commands a second time, because all the data has already been gathered. But sending to the trapper immediately is inadvisable, because the server processes LLD within ~60 seconds, so the items will not exist yet (this may depend on server load). Therefore, a timeout is introduced. It's 0 on Windows because I was unable to pipe and fork at the same time there. The fork is needed so that the LLD can respond initially. I'm strongly against cache files. sender_wrapper.py was written when Dependent Items were not yet available (which is still the case for Zabbix 3.0, and the script must stay compatible with the oldest LTS). It might also be the solution for multiple-server support when the latest Zabbix is not available.
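A minimal sketch of the "gather once, use twice" idea, assuming the SMART values have already been collected into a `{disk: {attribute: value}}` dict. The same data produces both the LLD response (in the Zabbix 3.0 `{"data": [...]}` format) and the input file for `zabbix_sender -i`, whose lines have the form `<hostname> <key> <value>`. The macro names `{#DISKNAME}`/`{#SMARTNAME}`, the key `smartctl.value[...]`, and both function names are hypothetical, not taken from sender_wrapper.py.

```python
import json

# Hypothetical item key prototype; the real sender_wrapper.py keys may differ.
VALUE_KEY = "smartctl.value[{disk},{attr}]"


def build_lld(smart_data):
    """LLD response built from already-gathered data {disk: {attr: value}}.

    Zabbix 3.0 expects the discovered rows wrapped in a "data" list."""
    rows = [
        {"{#DISKNAME}": disk, "{#SMARTNAME}": attr}
        for disk, attrs in sorted(smart_data.items())
        for attr in sorted(attrs)
    ]
    return json.dumps({"data": rows})


def build_sender_lines(host, smart_data):
    """Reuse the same data as a zabbix_sender input file:
    one '<hostname> <key> <value>' line per item."""
    lines = []
    for disk, attrs in sorted(smart_data.items()):
        for attr, value in sorted(attrs.items()):
            key = VALUE_KEY.format(disk=disk, attr=attr)
            lines.append(f"{host} {key} {value}")
    return "\n".join(lines) + "\n"
```

Attribute names are assumed to contain no spaces; a key with spaces would need quoting in the sender input file.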
Thinking about the solution for 500 drives

The structure of the script must be redone completely. I'm theorizing about cascading LLD, if that's even possible. First, one LLD gathers all available disks. Then each disk, with a separate LLD (trapper?), discovers its own SMART attribute names and sends the gathered values. The timeout is still needed. I'm still unsure whether a duplicate-disk check is possible with this approach.
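The cascading idea could look roughly like this: a first-level discovery that lists disks, and a second-level discovery per disk that lists that disk's SMART attribute names. The duplicate check shown here, treating two device paths with the same serial number as one drive, is only one assumed approach; the function names and the `{#DISKSN}` macro are hypothetical.

```python
import json


def discover_disks(disks):
    """First-level LLD: one row per physical disk.

    `disks` is a list of (device, serial) pairs. Hypothetical duplicate
    check: a drive reachable under several device paths (same serial)
    is reported only once, under the first path seen."""
    seen = set()
    rows = []
    for device, serial in disks:
        if serial in seen:
            continue  # same drive seen again via another path
        seen.add(serial)
        rows.append({"{#DISKNAME}": device, "{#DISKSN}": serial})
    return json.dumps({"data": rows})


def discover_attributes(device, attr_names):
    """Second-level LLD for a single disk: that disk's SMART attribute names.

    In the cascading idea this would be sent per disk (possibly to a
    trapper-type discovery rule), not returned by the agent poll."""
    rows = [{"{#DISKNAME}": device, "{#SMARTNAME}": name}
            for name in sorted(attr_names)]
    return json.dumps({"data": rows})
```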
`getverb` is for debug only.
So LLD uses `get` too? In that case, LLD gets the sleep. What's the point of sleeping if LLD and regular polling both get the same sleep? And what do you mean by "trapper"? And what does any of this have to do with cache files? I'm confused.
`get` is the regular LLD poll, which returns JSON (`smartctl.discovery[get,{HOST.HOST}]`). It sleeps while the prototype items are created. Then the items with their values are sent in bulk via the Zabbix trapper.
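A sketch of the `get` sequence as described above, assuming a POSIX system and that the values were already written to a `zabbix_sender` input file. The parent prints the LLD JSON and exits at once; the forked child detaches from the agent's pipe, sleeps for the configured delay, and then sends everything in one `zabbix_sender -c <agent config> -i <file>` call. `get_mode` and `sender_command` are hypothetical names, not sender_wrapper.py's actual functions.

```python
import os
import subprocess
import sys
import time


def sender_command(config_path, input_path):
    """Bulk zabbix_sender call: -c reads server/host settings from the
    agent config, -i reads '<hostname> <key> <value>' lines from a file."""
    return ["zabbix_sender", "-c", config_path, "-i", input_path]


def get_mode(lld_json, config_path, input_path, delay):
    """Return the LLD JSON now and send the item values `delay` seconds later."""
    print(lld_json)
    # Flush before forking, otherwise buffered output could be written twice.
    sys.stdout.flush()

    if not hasattr(os, "fork"):
        # No fork available (e.g. Windows): send immediately, i.e. delay 0.
        subprocess.run(sender_command(config_path, input_path),
                       stdout=subprocess.DEVNULL)
        return

    if os.fork() > 0:
        return  # parent: exits normally, so the agent gets the LLD response

    # Child: point stdin/stdout/stderr at /dev/null so the agent's pipe
    # reaches EOF and the agent does not wait for this process.
    try:
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        # Give the server time to create items from the prototypes.
        time.sleep(delay)
        subprocess.run(sender_command(config_path, input_path))
    finally:
        os._exit(0)  # never fall back into the parent's code path
```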
It's a confusing scheme, but I only worked within the boundaries of Zabbix's capabilities. :)
One way all of this could be achieved is with cache files for disk output. I'm not going that way.
Ok, so after a few days of study I understand better how Zabbix works and why you do what you do:
So my plan is:
I'm good with the plan, as long as the daemon does not have polling capabilities of its own.
I'm struggling to understand the purpose of the configurable "timeout". It doesn't function as a timeout at all. Instead, it's a fixed-length sleep that gets added before sending results in `get` mode, but not in `getverb` mode. The comment says "wait for LLD to be processed by server". The only two motivations I can think of are:

1) The server processes multiple responses from the same agent out of order. That seems unlikely. It would be a server bug if true.
2) The server performs LLD using a `getverb` operation, which `sender_wrapper` handles by forking and running in the background. That might allow it to process a subsequent `get` operation before the original `getverb` was complete. If this is true, then the correct solution is to block while sending a response rather than running it in the background.

Unless you tell me otherwise, I'm going to assume that 2 is the case. I'll fix it and remove the timeout setting.