ikseth / cyclops

Hyper Operative Management System

mon timeout #13

Open AngBriz opened 4 years ago

AngBriz commented 4 years ago

Dear Cyclops Developer(s):

This is simply to report some rather strange monitoring behaviour in Cyclops when a computing node loses its connection to the local hard drive.

i) The computing node's syslog reports the following errors:

nimbus7107 kern err kernel ata1.00: status: { DRDY }
nimbus7107 kern err kernel end_request: I/O error, dev sda, sector 130349857

ii) The computing node's state in Cyclops is "mon timeout".

iii) No alert e-mails are sent about the issue.

iv) After rebooting the node and restoring the link to the local hard drive, everything returns to normal.

In this case it would be helpful to understand the conditions under which Cyclops generates and sends alert e-mails, and why, in this particular case, it is unable to generate the alert information and send it by e-mail.

Thanks in advance for any help you are able to provide.

Cheers,

ikseth commented 4 years ago

Dear User:

This problem has two possible solutions:

  1. Change or disable the sensors linked to hard disk status monitoring, such as smartdisk.
  2. Update Cyclops: the latest version includes improved timeout controls.
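The reply does not show how Cyclops implements its timeout controls. As a rough illustration of the general idea only, here is a minimal Python sketch assuming that a sensor is an external command run by the monitor. `run_sensor`, `build_alert` and the OK/FAIL/TIMEOUT states are hypothetical names, not Cyclops APIs. The point of the sketch is that a sensor which hangs on a lost disk is turned into an explicit TIMEOUT result that can still produce an alert, rather than blocking the monitor.

```python
import subprocess


def run_sensor(cmd, timeout_s):
    """Run one (hypothetical) sensor command and classify the result.

    Returns a (status, detail) tuple where status is "OK", "FAIL" or "TIMEOUT".
    """
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
    except OSError as exc:
        # The sensor could not even be started (e.g. missing binary).
        return "FAIL", str(exc)

    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        # A process blocked on disk I/O may not die until the I/O returns,
        # so wait only briefly instead of blocking the whole monitor.
        try:
            proc.wait(timeout=1)
        except subprocess.TimeoutExpired:
            pass
        proc.stdout.close()
        return "TIMEOUT", f"{cmd[0]} gave no answer within {timeout_s}s"

    status = "OK" if proc.returncode == 0 else "FAIL"
    return status, out.strip()


def build_alert(node, sensor, status, detail):
    """Return an alert message for any non-OK result, or None if all is well.

    TIMEOUT is deliberately treated like FAIL so that a hung sensor still
    produces a notification instead of being silently skipped.
    """
    if status == "OK":
        return None
    return f"[{status}] {node}/{sensor}: {detail}"


if __name__ == "__main__":
    # "sleep 5" stands in for a disk check that hangs on a lost drive.
    status, detail = run_sensor(["sleep", "5"], timeout_s=1)
    print(build_alert("nimbus7107", "smartdisk", status, detail))
```

The brief `wait` after `kill` is a design choice for this sketch: a check stuck in uninterruptible disk I/O may ignore the kill signal, and waiting on it indefinitely would reproduce the very hang the timeout is meant to avoid.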

Please try one of these options and let us know the results.

Thank you for using Cyclops.