terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Monitor Gantry Cache Server - Design #517

Closed sbrad77 closed 5 years ago

sbrad77 commented 6 years ago

Discussion Needed

The cache server does not reside at NCSA.

- Who should be monitoring the cache server: local at Maricopa, or NCSA?
- What should the monitoring mechanism be?
- What should be monitored (which subdirectories, space, load, etc.)?
- What should the appropriate thresholds be?

CheckMK monitoring for the automated cleaning service: if it crashes, send an alert.

max-zilla commented 6 years ago

@robkooper, @tcnichol and I will sit down for a meeting to design what this monitor looks like. Initial suggestion: an alert triggers if the gantry is >= 85% full.
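For reference, the suggested threshold could be checked along these lines. This is only a minimal sketch, not the monitor that was actually deployed: the function names and the `ALERT_THRESHOLD_PERCENT` constant are made up for illustration, and the path to check would be whatever mount point holds the cache.

```python
import shutil

# Initial suggestion from this thread: alert at >= 85% full.
ALERT_THRESHOLD_PERCENT = 85.0


def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on pseudo-filesystems
    return 100.0 * usage.used / usage.total


def should_alert(percent, threshold=ALERT_THRESHOLD_PERCENT):
    """True when usage is at or above the threshold (>=, as suggested)."""
    return percent >= threshold
```

Calling `should_alert(percent_used(path))` from a periodic job would give a yes/no answer that an alerting tool could act on.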

jdmaloney commented 5 years ago

Alerts have been integrated into the #alerts slack channel. Currently monitoring:

@robkooper or @max-zilla, if you have something else you want me to add, let me know; otherwise I will close this here at the end of today.

jdmaloney commented 5 years ago

I've also added monitoring of the vsftp server (the vsftpd processes) and the globus-gridftp-server service. Alerts will hit Slack if either of those services is unhealthy, i.e. not in the "active" state according to systemctl.
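For anyone wanting to reproduce a check like this, here is a rough sketch of the idea. It is not the actual monitoring code: the unit names in `UNITS` are assumptions (the exact systemd unit names are not given here), and the helper names are hypothetical. It relies on `systemctl is-active <unit>` printing the unit state, and on Slack incoming webhooks accepting a JSON body with a `text` field.

```python
import subprocess

# Assumed unit names; the exact systemd units are not stated in this thread.
UNITS = ["vsftpd", "globus-gridftp-server"]


def unit_state(unit):
    """Return the state string printed by `systemctl is-active <unit>`."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return "unknown"  # systemctl is not installed on this host
    return result.stdout.strip() or "unknown"


def unhealthy(units, get_state=unit_state):
    """Map each unit that is not "active" to its reported state."""
    states = {unit: get_state(unit) for unit in units}
    return {unit: state for unit, state in states.items() if state != "active"}


def slack_payload(problems):
    """Build a JSON-serialisable body for a Slack incoming webhook."""
    lines = [f"{unit} is {state}" for unit, state in sorted(problems.items())]
    return {"text": "Service alert: " + "; ".join(lines)}
```

A scheduled job could call `unhealthy(UNITS)` and, when the result is non-empty, POST `slack_payload(...)` as JSON to the webhook URL configured for the alerts channel. Passing `get_state` as a parameter keeps the logic testable on machines without systemd.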