trevorndodds / elasticsearch-metrics

elasticsearch2elastic.py stops when an ES query takes too long #7

Closed keyboardfann closed 7 years ago

keyboardfann commented 7 years ago

After elasticsearch2elastic.py has been running for some time, it crashes because timediff < 0. I think this happens when querying a busy ES cluster takes longer than the interval, so the next scheduled run is already in the past and timediff goes negative.

[root@xxx init.d]# systemctl status eshealthcollector-prod 
● eshealthcollector-prod.service - Elasticsearch Health Collector - xxx Production Cluster
   Loaded: loaded (/usr/lib/systemd/system/eshealthcollector-prod.service; enabled; vendor preset: disabled)
   Active: active (exited) since Mon 2017-04-24 15:53:13 CST; 12min ago
  Process: 29808 ExecStop=/bin/sh /etc/init.d/eshealthcollector-prod stop (code=exited, status=0/SUCCESS)
  Process: 29818 ExecStart=/bin/sh /etc/init.d/eshealthcollector-prod start (code=exited, status=0/SUCCESS)
 Main PID: 29818 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/eshealthcollector-prod.service

Apr 24 16:02:54 xxx sh[29818]: time:1493020954.75
Apr 24 16:02:54 xxx sh[29818]: timediff8.77753591537
Apr 24 16:02:54 xxx sh[29818]: Total Elapsed Time: 10.9948761463
Apr 24 16:02:54 xxx sh[29818]: nextRun:1493020973.54
Apr 24 16:02:54 xxx sh[29818]: time:1493020974.53
Apr 24 16:02:54 xxx sh[29818]: timediff:-0.994902133942
Apr 24 16:02:54 xxx sh[29818]: Traceback (most recent call last):
Apr 24 16:02:54 xxx sh[29818]: File "/admin/scripts/eshealthcollector-prod/elasticsearch2elastic.py", line 112, in <module>
Apr 24 16:02:54 xxx sh[29818]: time.sleep(timeDiff)
Apr 24 16:02:54 xxx sh[29818]: IOError: [Errno 22] Invalid argument
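
The traceback above shows time.sleep() being called with a negative timediff (nextRun was already in the past). A minimal sketch of one way to guard against this is to clamp the delay at zero. The helper name `sleep_until` and its parameters are hypothetical and are not taken from the script or the pull request; the sketch is written for Python 3, where a negative argument raises ValueError rather than the IOError seen in the log.

```python
import time


def sleep_until(next_run, now=None, sleep=time.sleep):
    """Sleep until next_run (epoch seconds) without ever passing a
    negative duration to sleep(). Hypothetical helper for illustration."""
    if now is None:
        now = time.time()
    time_diff = next_run - now
    # If the previous collection overran the interval, time_diff is
    # negative; clamp it so the next run starts immediately instead.
    delay = max(0.0, time_diff)
    sleep(delay)
    return delay
```

With this guard, a collection that overruns its interval simply starts the next run right away instead of crashing.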

keyboardfann commented 7 years ago

I've submitted a pull request. @trevorndodds, could you help review it? If you have any concerns, I'm happy to discuss them.

trevorndodds commented 7 years ago

Thanks. Weird, I actually never ran into that issue. I may replace your --interval with an ENV variable at some point, as that works better with Docker. For now it's fine.

keyboardfann commented 7 years ago

Good news, and thank you for the quick review.

joealex commented 7 years ago

I noticed that if the ES node the script points to is dead or stopped, the script dies after some time. Shouldn't it keep trying at each interval, so that the next call works once ES is back up?

It probably needs exception handling around all the ES calls.
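
As a rough sketch of that idea, and not the script's actual code, the loop below wraps each collection in a try/except so that a failed ES call is logged and retried at the next interval instead of ending the process. The names `fetch_json` and `run_collector`, and the `max_runs`, `sleep`, and `clock` parameters, are assumptions introduced for illustration. It is written for Python 3.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_collector(fetch, handle, interval, max_runs=None,
                  sleep=time.sleep, clock=time.time):
    """Call fetch() every `interval` seconds and pass the result to handle().

    A failed call is counted and reported, then retried at the next
    interval. Returns the number of failed runs.
    """
    failures = 0
    runs = 0
    next_run = clock()
    while max_runs is None or runs < max_runs:
        next_run += interval
        try:
            handle(fetch())
        except (OSError, ValueError) as exc:
            # URLError and socket timeouts are OSError subclasses;
            # malformed JSON raises a ValueError subclass.
            failures += 1
            print("collection failed, retrying next interval: %s" % exc)
        runs += 1
        # Never sleep for a negative duration if the run overran.
        sleep(max(0.0, next_run - clock()))
    return failures
```

For example, `run_collector(lambda: fetch_json("http://localhost:9200/_cluster/health"), print, interval=60)` would poll a local node once a minute and keep going while the node is down; the URL here is only an illustrative default.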

keyboardfann commented 6 years ago

Hi @joealex, I think this commit adds retry support: https://github.com/trevorndodds/elasticsearch-metrics/commit/c890c85201cea6ec9448340c63068a49ecd841e3