project-hatohol / hatohol

A unified manager of monitoring software
http://www.hatohol.org/

[server][plugin] After an error in the server-plugin message sequence, communication sometimes cannot be recovered unless the plugin is restarted. #2463

Open a24-yamaguchi opened 7 years ago

a24-yamaguchi commented 7 years ago

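When entries like the following are recorded in the plugin log, communication with the server does not recover unless the plugin is restarted.

This is especially noticeable in the situation from 2404 (slow screen rendering), when CPU usage is pinned at 100%.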

2016-12-08 08:39:37,052:haplib.py:625 hatohol.haplib:hap2_zabbix_api.py:Poller: [7153]:[ERROR] Request failed.
2016-12-08 08:39:37,059:hap.py:104 hatohol.hap:hap2_zabbix_api.py:Poller: [7153]:[ERROR] Unexpected error: <type 'exceptions.Exception'>, 
2016-12-08 08:39:37,060:hap2_zabbix_api.py:169 hatohol.hap2_zabbix_api:hap2_zabbix_api.py:Poller: [7153]:[ERROR] Polling: aborted.
2016-12-08 08:39:37,066:rabbitmqconnector.py:147 hatohol.rabbitmqconnector:hap2_zabbix_api.py:Poller: [7153]:[DEBUG] {"params": {"numFailure": 1, "lastStatus": "NG", "lastFailureTime": "20161207233937.60564", "lastSuccessTime": "20161207233836.219945", "numSuccess": 1608, "failureReason": "<type 'exceptions.Exception'>, "}, "jsonrpc": "2.0", "method": "putArmInfo", "id": 548602495}
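
For reference, the JSON-RPC body in the last DEBUG line can be pulled out of the log and inspected. This is a minimal Python sketch, assuming the body always starts at the first `{` on the line; `extract_payload` is a hypothetical helper written for this thread, not part of Hatohol.

```python
import json

# The putArmInfo DEBUG line from the log above, copied verbatim.
LINE = '''2016-12-08 08:39:37,066:rabbitmqconnector.py:147 hatohol.rabbitmqconnector:hap2_zabbix_api.py:Poller: [7153]:[DEBUG] {"params": {"numFailure": 1, "lastStatus": "NG", "lastFailureTime": "20161207233937.60564", "lastSuccessTime": "20161207233836.219945", "numSuccess": 1608, "failureReason": "<type 'exceptions.Exception'>, "}, "jsonrpc": "2.0", "method": "putArmInfo", "id": 548602495}'''


def extract_payload(log_line):
    """Return the JSON body of a log line as a dict, or None if there is none.

    Assumption: the JSON body begins at the first '{' in the line.
    """
    start = log_line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(log_line[start:])
    except ValueError:
        # Not valid JSON (for example, a truncated line).
        return None


payload = extract_payload(LINE)
params = payload["params"]
print(payload["method"], params["lastStatus"], params["numFailure"])
print(repr(params["failureReason"]))
```

In this payload `failureReason` holds only the exception type followed by an empty message, so the log line alone does not say why the request failed.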


cosmo0920 commented 7 years ago

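Is there a stack trace displayed around these lines? If it is information you are able to share, it would tell us more than just where the error occurred, so I would appreciate it if you could paste it into this issue.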

a24-yamaguchi commented 7 years ago

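Is there a stack trace displayed around these lines?

Yes, there was one. It seems I filtered it out when I grepped the log by date, so it was missing from my first post; my apologies. @masa0612 is looking into a fix for the bug.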
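
Traceback lines carry no timestamp prefix, so a plain grep on the date drops them. A possible alternative is to treat every line without a timestamp as a continuation of the preceding entry. The sketch below assumes the `YYYY-MM-DD HH:MM:SS,mmm:` prefix seen in these logs; `filter_entries` and the sample lines are illustrative (the first and last sample entries are made up), not taken from Hatohol.

```python
import re

# Matches the timestamp prefix used by the plugin log lines in this issue.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}:")


def filter_entries(lines, date_prefix):
    """Yield entries whose timestamp starts with date_prefix.

    Lines without a timestamp (traceback frames, blank lines) are kept
    or dropped together with the timestamped line that precedes them.
    """
    keep = False
    for line in lines:
        if TIMESTAMP.match(line):
            keep = line.startswith(date_prefix)
        if keep:
            yield line


# Abbreviated, partly hypothetical sample for illustration only.
SAMPLE = [
    "2016-12-07 23:59:59,000:example.py:1 [INFO] hypothetical earlier entry",
    "2016-12-08 08:39:37,059:hap.py:104 [ERROR] Unexpected error",
    "Traceback (most recent call last):",
    "    raise Exception",
    "Exception",
    "2016-12-09 00:00:00,000:example.py:1 [INFO] hypothetical later entry",
]

for line in filter_entries(SAMPLE, "2016-12-08"):
    print(line)
```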

2016-12-08 08:39:07,024:rabbitmqconnector.py:147 hatohol.rabbitmqconnector:hap2_zabbix_api.py:Poller: [7153]: [DEBUG] {"params": {"lastInfo": "261719", "events": [{"eventId": "261718", "status": "NG", "triggerId": "13594", "hostId": "10106", "severity": "WARNING", "time": "20161207233847.467929068", "hostName": "Linux0000", "type": "BAD", "brief": "Too many processes running on Linux0000", "extendedInfo": ""}, {"eventId": "261719", "status": "NG", "triggerId": "15202", "hostId": "10168", "severity": "WARNING", "time": "20161207233852.508420820", "hostName": "Linux0004-copy1", "type": "BAD", "brief": "Too many processes running on Linux0004-copy1", "extendedInfo": ""}]}, "jsonrpc": "2.0", "method": "putEvents", "id": 543272569}
2016-12-08 08:39:37,052:haplib.py:625 hatohol.haplib:hap2_zabbix_api.py:Poller: [7153]: [ERROR] Request failed.
2016-12-08 08:39:37,059:hap.py:104 hatohol.hap:hap2_zabbix_api.py:Poller: [7153]: [ERROR] Unexpected error: <type 'exceptions.Exception'>,
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/hatohol/haplib.py", line 1126, in __poll_in_try_block
    self.poll()
  File "/usr/libexec/hatohol/hap2/hatohol/hap2_zabbix_api.py", line 165, in poll
    self.update_events_poll()
  File "/usr/libexec/hatohol/hap2/hatohol/hap2_zabbix_api.py", line 135, in update_events_poll
    self.divide_and_put_data(self.put_events, events)
  File "/usr/lib/python2.7/site-packages/hatohol/haplib.py", line 591, in divide_and_put_data
    put_func(chunk_contents, params, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/hatohol/haplib.py", line 538, in put_events
    self.__wait_response(request_id)
  File "/usr/lib/python2.7/site-packages/hatohol/haplib.py", line 626, in __wait_response
    raise Exception
Exception

2016-12-08 08:39:37,060:hap2_zabbix_api.py:169 hatohol.hap2_zabbix_api:hap2_zabbix_api.py:Poller: [7153]: [ERROR] Polling: aborted.
2016-12-08 08:39:37,066:rabbitmqconnector.py:147 hatohol.rabbitmqconnector:hap2_zabbix_api.py:Poller: [7153]: [DEBUG] {"params": {"numFailure": 1, "lastStatus": "NG", "lastFailureTime": "20161207233937.60564", "lastSuccessTime": "20161207233836.219945", "numSuccess": 1608, "failureReason": "<type 'exceptions.Exception'>, "}, "jsonrpc": "2.0", "method": "putArmInfo", "id": 548602495}
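
The traceback ends in a bare `raise Exception`, which would explain why `failureReason` shows only the exception type and an empty message. The 30 seconds between the `putEvents` request (08:39:07) and `Request failed` (08:39:37) are consistent with a wait for the response timing out, although nothing in this thread confirms that. The following Python 3 sketch shows how a dedicated exception with a message would make the reason visible. All names (`ResponseTimeout`, `wait_response`, `failure_reason`) and the response shape are hypothetical and do not reflect the actual `haplib.py` code (which runs on Python 2.7); the sketch only concerns diagnosability, not the recovery problem itself.

```python
import queue
import sys


class ResponseTimeout(Exception):
    """Hypothetical exception that records which request got no reply."""


def wait_response(responses, request_id, timeout_sec):
    """Block until a response arrives on the queue or the timeout expires.

    Assumption: `responses` is a queue.Queue fed by the receiving side.
    """
    try:
        return responses.get(timeout=timeout_sec)
    except queue.Empty:
        raise ResponseTimeout(
            "no response to request %d within %.1f s" % (request_id, timeout_sec)
        )


def failure_reason():
    """Format the exception being handled as '<type>, <message>'.

    With a bare `raise Exception` the message part is empty, which
    resembles the failureReason value in the log above.
    """
    exc_type, exc_value = sys.exc_info()[:2]
    return "%s, %s" % (exc_type, exc_value)


try:
    # Empty queue: nothing ever answers, so this times out quickly.
    wait_response(queue.Queue(), 543272569, 0.05)
except ResponseTimeout:
    print(failure_reason())
```

With a message attached, the string reported through `putArmInfo` would name the request that went unanswered instead of ending right after the type.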