neuvector / prometheus-exporter

Prometheus exporter and Grafana template for NeuVector container security platform
Apache License 2.0
exporter stops responding properly to scraping after a while #9

Closed dadicool closed 5 years ago

dadicool commented 5 years ago

Context is the same as #8

After a while, the exporter starts raising an exception on every scrape:

 ----------------------------------------
 Exception happened during processing of request from ('10.42.6.76', 51088)
 Traceback (most recent call last):
   File "/usr/lib/python3.7/socketserver.py", line 650, in process_request_thread
     self.finish_request(request, client_address)
   File "/usr/lib/python3.7/socketserver.py", line 360, in finish_request
     self.RequestHandlerClass(request, client_address, self)
   File "/usr/lib/python3.7/socketserver.py", line 720, in __init__
     self.handle()
   File "/usr/lib/python3.7/http/server.py", line 426, in handle
     self.handle_one_request()
   File "/usr/lib/python3.7/http/server.py", line 414, in handle_one_request
     method()
   File "/usr/lib/python3.7/site-packages/prometheus_client/exposition.py", line 152, in do_GET
     output = encoder(registry)
   File "/usr/lib/python3.7/site-packages/prometheus_client/openmetrics/exposition.py", line 14, in generate_latest
     for metric in registry.collect():
   File "/usr/lib/python3.7/site-packages/prometheus_client/registry.py", line 75, in collect
     for metric in collector.collect():
   File "/usr/local/bin/nv_exporter.py", line 42, in collect
     value=sjson["summary"]["services"],
 KeyError: 'summary'
 ----------------------------------------

This happens after roughly 12 hours of operation. Happy to provide any extra context that could be useful for debugging.

becitsthere commented 5 years ago

Thanks. We will look into both of the issues.

becitsthere commented 5 years ago

This is caused by the login token hitting its hard timeout overnight. The latest PR should fix the issue.
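
For readers hitting the same KeyError: 'summary', here is a minimal sketch of one way an exporter could recover from an expired token. It is not the code from the PR. The class name SummaryClient and the injected login and fetch callables are hypothetical stand-ins for whatever HTTP calls the exporter makes, and the sketch assumes that a response without a "summary" key means the token was rejected.

```python
from typing import Callable, Optional


class SummaryClient:
    """Fetch the system summary, logging in again if the token has expired."""

    def __init__(self, login: Callable[[], str], fetch: Callable[[str], dict]):
        self._login = login   # hypothetical: performs the login, returns a token
        self._fetch = fetch   # hypothetical: returns the decoded JSON response
        self._token: Optional[str] = None

    def get_summary(self) -> Optional[dict]:
        # Two attempts at most, so a scrape triggers no more than one re-login.
        for _ in range(2):
            if self._token is None:
                self._token = self._login()
            body = self._fetch(self._token)
            if "summary" in body:
                return body["summary"]
            # Assumption: a missing "summary" key means the token was rejected,
            # so discard it and log in again on the next attempt.
            self._token = None
        return None  # still failing after a fresh login; caller skips the metric
```

A collect() method built on this could emit the summary metrics only when get_summary() returns a dict, so a failed login produces an empty scrape instead of a traceback.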

dadicool commented 5 years ago

We can confirm that the changes resolved the issue. Thanks for the quick turnaround!