sensu-plugins / sensu-plugins-elasticsearch

This plugin provides native ElasticSearch instrumentation for monitoring and metrics collection, including: service health and metrics for cluster, node, and more.
http://sensu-plugins.io
MIT License
32 stars 77 forks source link

Strip invalid lines #137

Open boutetnico opened 5 years ago

boutetnico commented 5 years ago

Pull Request Checklist

Is this in reference to an existing issue? No

General

Purpose

This PR removes the following invalid lines from the output of the plugin:

elk01.elasticsearch.os.load_average
elk01.elasticsearch.indices.indexing.is_throttled false 1548954513
elk01.elasticsearch.indices.segments.file_sizes {} 1548954513
elk01.elasticsearch.fs.least_usage_estimate.path /data/elasticsearch/data/elk01-elk01/nodes/0 1548954513
elk01.elasticsearch.fs.most_usage_estimate.path /data/elasticsearch/data/elk01-elk01/nodes/0 1548954513
elk01.elasticsearch.fs.io_stats.devices {"device_name"=>"nvme1n1p1", "operations"=>157872407, "read_operations"=>691872, "write_operations"=>157180535, "read_kilobytes"=>50042456, "write_kilobytes"=>2478783444} 1548954513
elk01.elasticsearch.fs.io_stats.total {"operations"=>157872407, "read_operations"=>691872, "write_operations"=>157180535, "read_kilobytes"=>50042456, "write_kilobytes"=>2478783444} 1548954513

These lines are ignored by carbon and cause warning. This PR allows to remove these warnings.

Feb  1 15:59:49 sensu01 carbon-cache.py[480]: 01/02/2019 15:59:49 :: [listener] invalid line received from client 127.0.0.1:42532, ignoring [elk01.elasticsearch.indices.indexing.is_throttled false 1549036789]
Feb  1 15:59:49 sensu01 carbon-cache.py[480]: 01/02/2019 15:59:49 :: [listener] invalid line received from client 127.0.0.1:42532, ignoring [elk01.elasticsearch.indices.segments.file_sizes {} 1549036789]
Feb  1 15:59:49 sensu01 carbon-cache.py[480]: 01/02/2019 15:59:49 :: [listener] invalid line received from client 127.0.0.1:42532, ignoring [elk01.elasticsearch.fs.least_usage_estimate.path /data/elasticsearch/data/elk01-elk01/nodes/0 1549036789]
Feb  1 15:59:49 sensu01 carbon-cache.py[480]: 01/02/2019 15:59:49 :: [listener] invalid line received from client 127.0.0.1:42532, ignoring [elk01.elasticsearch.fs.most_usage_estimate.path /data/elasticsearch/data/elk01-elk01/nodes/0 1549036789]
Feb  1 15:59:49 sensu01 carbon-cache.py[480]: 01/02/2019 15:59:49 :: [listener] invalid line received from client 127.0.0.1:42532, ignoring [elk01.elasticsearch.fs.io_stats.devices {"device_name"=>"nvme1n1p1", "operations"=>160317981, "read_operations"=>705794, "write_operations"=>159612187, "read_kilobytes"=>51039824, "write_kilobytes"=>2539726412} 1549036789]
Feb  1 15:59:49 sensu01 carbon-cache.py[480]: 01/02/2019 15:59:49 :: [listener] invalid line received from client 127.0.0.1:42532, ignoring [elk01.elasticsearch.fs.io_stats.total {"operations"=>160317981, "read_operations"=>705794, "write_operations"=>159612187, "read_kilobytes"=>51039824, "write_kilobytes"=>2539726412} 1549036789]

Known Compatibility Issues

None

majormoses commented 5 years ago

Hmm the change makes sense but in some of the cases I am wondering if we should do some extra manipulation for those cases so that we can try to include those. For example:

elk01.elasticsearch.indices.indexing.is_throttled false 1548954513

It seems like we can in that case probably just drop the false and send it.

Maybe in these cases:

elk01.elasticsearch.fs.least_usage_estimate.path /data/elasticsearch/data/elk01-elk01/nodes/0 1548954513
elk01.elasticsearch.fs.most_usage_estimate.path /data/elasticsearch/data/elk01-elk01/nodes/0 1548954513
elk01.elasticsearch.fs.io_stats.devices {"device_name"=>"nvme1n1p1", "operations"=>157872407, "read_operations"=>691872, "write_operations"=>157180535, "read_kilobytes"=>50042456, "write_kilobytes"=>2478783444} 1548954513

We can probably walk the additionally nested key values and return that. I am certainly OK with having some fallback to discard but I would like to handle and return as much data as possible. WDYT?

majormoses commented 5 years ago

@boutetnico checking back and seeing what your thoughts were on my feedback.

boutetnico commented 5 years ago

@majormoses I agree and will try to implement your suggestions.

boutetnico commented 5 years ago

@majormoses I have made some progress, here is a diff of the keys from the ouput compared before and after this PR. I have not included keys that contain a path because it could lead to very long metrics keys. What do you think?

1d0
< elk01.elasticsearch.os.load_average
91d89
< elk01.elasticsearch.indices.segments.file_sizes
222d219
< elk01.elasticsearch.fs.least_usage_estimate.path
226d222
< elk01.elasticsearch.fs.most_usage_estimate.path
230,231c226,235
< elk01.elasticsearch.fs.io_stats.devices
< elk01.elasticsearch.fs.io_stats.total
---
> elk01.elasticsearch.fs.io_stats.devices.nvme1n1p1.operations
> elk01.elasticsearch.fs.io_stats.devices.nvme1n1p1.read_operations
> elk01.elasticsearch.fs.io_stats.devices.nvme1n1p1.write_operations
> elk01.elasticsearch.fs.io_stats.devices.nvme1n1p1.read_kilobytes
> elk01.elasticsearch.fs.io_stats.devices.nvme1n1p1.write_kilobytes
> elk01.elasticsearch.fs.io_stats.total.operations
> elk01.elasticsearch.fs.io_stats.total.read_operations
> elk01.elasticsearch.fs.io_stats.total.write_operations
> elk01.elasticsearch.fs.io_stats.total.read_kilobytes
> elk01.elasticsearch.fs.io_stats.total.write_kilobytes
boutetnico commented 5 years ago

@majormoses going back to the initial idea behind this PR, can you advise on what I should change so that it can be merged?

majormoses commented 5 years ago

@boutetnico sounds like we are supposed to silently swallow errors which does not make me feel good. IMHO when its a check type of metric while sensu should capture the stdout and stderr I think the stderr should be reserved for reporting an error back to the operator and the handlers when shipping data should ignore stderr. @jspaleta I think this warrants more discussion but for now that seems like the appropriate path.