nobody43 / zabbix-smartmontools

Disk SMART monitoring for Linux, FreeBSD and Windows. LLD, trapper.
The Unlicense

Ideas #17

Closed CheckB closed 1 year ago

CheckB commented 5 years ago
  1. I suggest borrowing some functionality from these repos: https://github.com/v-zhuravlev/zbx-smartctl https://github.com/rmalenko/zabbix

  2. SMART support for SATA SSDs

nobody43 commented 5 years ago

Yeah, I'm aware of these repos, but I don't want to just copy the functionality straight away.

NVMe support is being worked on, SATA SSD support is planned.

v-zhuravlev commented 5 years ago

Hi, @nobodysu, I'm planning to rewrite https://github.com/v-zhuravlev/zbx-smartctl in Python to let more contributors in, and also to add some Python unit tests for better script stability. Maybe you are interested in joining forces for better SMART monitoring in Zabbix?

nobody43 commented 5 years ago

@v-zhuravlev, hi! I'm all in for cooperation, that's for sure! However, I believe public Zabbix additions must have certain features, with which you may disagree:

I mostly dislike my current approach of running a second script in the background, but I'm not really sure there is a way to improve it without much overhead. Maybe separate LLD from an individual per-disk trapper and use a cache file as a last resort. I'm also not sure whether LLD can be properly grouped (https://github.com/nobodysu/zabbix-smartmontools/issues/8).

Let me know what you think about my view.

nobody43 commented 5 years ago

@v-zhuravlev Are those restrictions too much for you? :) It's open to discussion, so state your point.

v-zhuravlev commented 5 years ago

@nobodysu, I'm for sure against trapper here.

I personally prefer to work with this in mind first:

I suggest another approach. There is already a working smartctl solution for Zabbix 3.0 (like yours). So maybe a simpler solution can be developed for 4.0 and 4.2 now, with the current smartctl solutions gradually phased out later on.

As for dependent items, they keep the overhead reasonable compared to running smartctl for each metric, as was done before 3.4. So I believe overhead is not a problem here anymore.

nobody43 commented 5 years ago

@v-zhuravlev I'm OK to ditch trapper in favor of a per-disk text cache file, if that's the case.

Also, I'm a little concerned about DI, can you clear some things up?

A thought: maybe wrap the per-disk smartctl call in a wrapper and return the output plus the logic result for DI? If that's not possible for one reason or another, I'm afraid I can't go with DI.

Also, I can't write PowerShell.
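
A minimal sketch of what such a wrapper could look like, assuming Python 3.5+ and `smartctl` on the PATH. The function names, flag names and JSON layout are hypothetical and not taken from either repo; only the bit meanings follow the exit status description in the smartctl(8) man page. The idea is to return one JSON document per disk that carries both the raw output and the decoded return code, so a master item can feed dependent items.

```python
import json
import subprocess

# smartctl exit status is a bitmask; meanings per the smartctl(8) man page.
# The short names on the right are made up for this sketch.
EXIT_BITS = (
    (0, "command_line_error"),
    (1, "device_open_failed"),
    (2, "smart_command_failed"),
    (3, "disk_failing"),
    (4, "prefail_below_threshold"),
    (5, "attr_below_threshold_in_past"),
    (6, "error_log_has_errors"),
    (7, "selftest_log_has_errors"),
)


def decode_exit_status(code):
    """Translate the smartctl exit status bitmask into a list of flag names."""
    return [name for bit, name in EXIT_BITS if code & (1 << bit)]


def build_payload(disk, returncode, output, timed_out=False):
    """Bundle raw output and the decoded return code into one JSON string."""
    flags = [] if returncode is None else decode_exit_status(returncode)
    return json.dumps({
        "disk": disk,
        "returncode": returncode,
        "flags": flags,
        "timed_out": timed_out,
        "output": output,
    })


def run_smartctl(disk, timeout=30):
    """Run 'smartctl -a' for one disk and return the JSON payload."""
    try:
        proc = subprocess.run(
            ["smartctl", "-a", disk],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # An unresponsive disk is reported explicitly instead of hanging.
        return build_payload(disk, None, "", timed_out=True)
    return build_payload(disk, proc.returncode, proc.stdout)
```

For example, `decode_exit_status(8)` yields `["disk_failing"]`, and `run_smartctl("/dev/sda")` would be the value returned to the agent for that disk.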

v-zhuravlev commented 5 years ago

@nobodysu Sorry for the long delays. I like your questions; they show that you know what you are doing.

Does DI have a timeout trigger in case of unresponsive disk failure?

  • For a start, a simple nodata(30m) trigger would help here.

Does it handle process return code? (DRIVESTATUS)

Does it have some sort of post-processing e.g. if output contains A and B but not C then ? (1, 2, 3)

This can be achieved using JavaScript preprocessing in 4.2
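
For comparison, the same kind of rule ("contains A and B but not C") is small enough to express on the script side as well. A minimal Python sketch; the `classify` helper is hypothetical, and the marker strings below are only illustrative lines as printed by `smartctl -i`:

```python
def classify(output, required, forbidden):
    """Return 1 if every required marker is present and no forbidden one is, else 0."""
    has_all = all(marker in output for marker in required)
    has_none = not any(marker in output for marker in forbidden)
    return 1 if has_all and has_none else 0


# Illustrative rule: SMART is available on the device but not enabled.
sample = (
    "SMART support is: Available - device has SMART capability.\n"
    "SMART support is: Disabled\n"
)
smart_disabled = classify(
    sample,
    required=["SMART support is: Available", "SMART support is: Disabled"],
    forbidden=["SMART support is: Enabled"],
)
```

Whether such logic belongs in JavaScript preprocessing or in the script is exactly the design question under discussion here; the sketch only shows that the rule itself is the same either way.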

nobody43 commented 5 years ago

@v-zhuravlev Alright. What about Python for Windows, can you go with it? I think different implementation languages and features will cause confusion for the end user. Also, I'm not going to learn or endorse proprietary solutions like PowerShell. I will learn some JS basics for DI, however. I'm really looking forward to the cooperation happening, and it will be really sad if I'm not able to work on the Windows version.

v-zhuravlev commented 5 years ago

@nobodysu, I believe that the agent side should require discovery scripts only. And I suggest that it would be really great if this discovery could happen without installing a Python interpreter on each Windows host whose disks you might need to monitor, so:

I think different implementation languages and features will cause confusion to the end user.

Not really, if the input and output of the script are the same for the Windows and Nix versions.

nobody43 commented 5 years ago

@v-zhuravlev I thought it over many times and realized I can't drop 3.0 LTS. I have many Zabbix additions already and can't change my infrastructure entirely. Therefore, I will not be able to test 4.2 solutions extensively, and that defeats the entire purpose.

For 3.0+, I'm still proposing a per-disk cache file (DI substitute), no sender, and a Python-only (JS PP substitute) solution. But I'm afraid you won't like it.
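
A rough sketch of the per-disk cache file idea, assuming Python 3 and a cache directory that both the collector and the agent user can access. The file naming, JSON layout and `max_age` default are all hypothetical and not what the repo actually implements:

```python
import json
import os
import tempfile
import time


def cache_path(cache_dir, disk):
    """Map a device path such as /dev/sda to one cache file per disk."""
    safe = disk.strip("/").replace("/", "_")
    return os.path.join(cache_dir, "smartctl_" + safe + ".json")


def write_cache(cache_dir, disk, output, returncode):
    """Store smartctl output for one disk, replacing the old file atomically."""
    record = {"timestamp": time.time(), "returncode": returncode, "output": output}
    # Write to a temporary file in the same directory, then rename it, so a
    # reader never sees a half-written cache file.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, prefix=".smartctl_")
    with os.fdopen(fd, "w") as handle:
        json.dump(record, handle)
    os.replace(tmp_name, cache_path(cache_dir, disk))


def read_cache(cache_dir, disk, max_age=1800):
    """Return the cached record, or None if it is missing, corrupt or stale."""
    try:
        with open(cache_path(cache_dir, disk)) as handle:
            record = json.load(handle)
    except (OSError, ValueError):
        return None
    if time.time() - record["timestamp"] > max_age:
        return None
    return record
```

Each item key would then read from `read_cache(...)` instead of calling smartctl itself, and a `None` result plays the role of the "no data" condition. Note that `tempfile.mkstemp` creates the file readable by its owner only, so the collector and the agent would need to run as the same user or the permissions would need adjusting.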