trevorndodds / elasticsearch-metrics

elasticsearch2elastic.py crashes because it can't get the cluster name #11

Closed by keyboardfann 7 years ago

keyboardfann commented 7 years ago

elasticsearch2elastic.py exits when the ES cluster is under heavy load or there is a network problem. Once the program stops, we have to restart it manually. Could we have a retry mechanism?

May 01 00:23:51 xxx sh[46351]: Total Elapsed Time: 0.115025043488
May 01 00:23:51 xxx sh[46351]: Total Elapsed Time: 0.173295974731
May 01 00:23:51 xxx sh[46351]: Traceback (most recent call last):
May 01 00:23:51 xxx sh[46351]: File "/admin/scripts/eshealthcollector-stage/elasticsearch2elastic-stage.py", line 118, in <module>
May 01 00:23:51 xxx sh[46351]: main()
May 01 00:23:51 xxx sh[46351]: File "/admin/scripts/eshealthcollector-stage/elasticsearch2elastic-stage.py", line 99, in main
May 01 00:23:51 xxx sh[46351]: fetch_nodestats(clusterName)
May 01 00:23:51 xxx sh[46351]: File "/admin/scripts/eshealthcollector-stage/elasticsearch2elastic-stage.py", line 55, in fetch_nodestats
May 01 00:23:51 xxx sh[46351]: response = urllib.urlopen(urlData)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
May 01 00:23:51 xxx sh[46351]: return opener.open(url)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/urllib.py", line 208, in open
May 01 00:23:51 xxx sh[46351]: return getattr(self, name)(url)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/urllib.py", line 345, in open_http
May 01 00:23:51 xxx sh[46351]: h.endheaders(data)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 975, in endheaders
May 01 00:23:51 xxx sh[46351]: self._send_output(message_body)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 835, in _send_output
May 01 00:23:51 xxx sh[46351]: self.send(msg)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 797, in send
May 01 00:23:51 xxx sh[46351]: self.connect()
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 778, in connect
May 01 00:23:51 xxx sh[46351]: self.timeout, self.source_address)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
May 01 00:23:51 xxx sh[46351]: for res in getaddrinfo(host, port, 0, SOCK_STREAM):
May 01 00:23:51 xxx sh[46351]: IOError: [Errno socket error] [Errno -2] Name or service not known
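
One way to get the retry behaviour asked for above is a small wrapper around the failing call. This is only a sketch under stated assumptions: `call_with_retry` and its parameters (`fetch`, `attempts`, `delay`, `sleep`) are hypothetical names, not part of elasticsearch2elastic.py, and `fetch` stands in for whatever function performs the `urllib.urlopen` request seen in the traceback.

```python
import time


def call_with_retry(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, retrying when it raises IOError.

    Hypothetical helper: `fetch` is any zero-argument callable, for example
    a function that wraps the HTTP request to Elasticsearch.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except IOError as err:
            # Python 2's urllib.urlopen raises IOError on socket errors, as in
            # the traceback above. In Python 3, IOError is an alias of OSError.
            last_error = err
            print("attempt %d/%d failed: %s" % (attempt, attempts, err))
            if attempt < attempts:
                sleep(delay)  # wait before the next try
    # Every attempt failed: re-raise the last error so the caller can decide.
    raise last_error
```

With a wrapper like this, a transient DNS or connection failure costs a few delayed attempts instead of terminating the process, while a persistent failure still surfaces as an exception.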

keyboardfann commented 7 years ago

Dear @trevorndodds, I added some try/except blocks to fix the crash and let the Python script try again at the next interval. Please help review the code diff. If you have any questions, we can discuss them. Thank you very much.

https://github.com/trevorndodds/elasticsearch-metrics/pull/12

[root@xxx eshealthcollector-stage]# python /admin/scripts/eshealthcollector-stage/elasticsearch2elastic-stage.py --interval 10
Interval: 10.0 s
IOError: Maybe can't connect to elasticsearch.
IOError: Maybe can't connect to elasticsearch.
IOError: Maybe can't connect to elasticsearch.
IOError: Maybe can't connect to elasticsearch.
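
The "log the error and try again at the next interval" behaviour shown in the output above could look roughly like the loop below. It is a minimal sketch, not the code from the pull request: `run_collector`, `fetch`, `iterations`, and `sleep` are hypothetical names, and the printed message merely mirrors the console output above.

```python
import time


def run_collector(fetch, interval, iterations=None, sleep=time.sleep):
    """Call fetch() once per interval; on IOError, log it and keep going.

    Hypothetical sketch: `fetch` stands in for the stats-collection step,
    and `iterations=None` means loop forever, like a long-running service.
    Returns the number of failed intervals when `iterations` is bounded.
    """
    failures = 0
    count = 0
    while iterations is None or count < iterations:
        try:
            fetch()
        except IOError as err:
            # Swallow the connection error so the process stays alive and
            # the next interval gets another chance.
            failures += 1
            print("IOError: Maybe can't connect to elasticsearch. (%s)" % err)
        count += 1
        sleep(interval)
    return failures
```

Catching only `IOError` keeps programming errors (for example a `KeyError` while parsing the response) visible instead of hiding them behind the same message.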

trevorndodds commented 7 years ago

sure