hints = The collectd-mlab service runs in the mlab_utility slice. To configure collectd-mlab, try running the ansible/disco/update-mlab-utility.yaml Ansible playbook from the mlabops repository. To see the specific error, log in to the node and run the check script manually: /usr/lib/nagios/plugins/check_collectd_mlab.py
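Running the check script by hand, as the hint suggests, can be wrapped in a small helper that also decodes the exit status. This is a minimal sketch, assuming the script follows the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and needs no arguments; neither is confirmed here. The helper name `run_check` is hypothetical.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes (assumed to apply to this check).
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Path taken from the hint text.
CHECK_SCRIPT = "/usr/lib/nagios/plugins/check_collectd_mlab.py"


def run_check(command):
    """Run a Nagios-style check and return (state, output)."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=60
        )
    except OSError as err:
        # Script missing or not executable on this host.
        return "UNKNOWN", "could not run %s: %s" % (command[0], err)
    except subprocess.TimeoutExpired:
        return "UNKNOWN", "check timed out"
    # Any exit code outside 0-3 is treated as UNKNOWN.
    state = NAGIOS_STATES.get(result.returncode, "UNKNOWN")
    output = (result.stdout or result.stderr).strip()
    return state, output


if __name__ == "__main__":
    state, output = run_check([CHECK_SCRIPT])
    print("%s: %s" % (state, output))
```

Run it on the node itself, inside the mlab_utility slice; on any other host it should simply report UNKNOWN because the script is absent.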
The switch at dfw03 was down for a week. It's back now. While the machines could not access the internet, some services died. I have requested that OTI ops reboot the machine.
Alertmanager URL: https://mlab:YOztKFSKnRMz2GN1qFPueAku9WhmDYV2@alertmanager.mlab-oti.measurementlab.net
firing https://prometheus.mlab-oti.measurementlab.net/graph?g0.expr=collectd_mlab_success+%3D%3D+0&g0.tab=1
Labels:
Annotations:
firing https://prometheus.mlab-oti.measurementlab.net/graph?g0.expr=collectd_mlab_success+%3D%3D+0&g0.tab=1
Labels:
Annotations:
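The graph links above encode the alert expression `collectd_mlab_success == 0` as a query string. The sketch below shows how such a link, and the matching Prometheus instant-query API URL, could be built from the expression, and how the failing series could be read from the JSON response. The function names are hypothetical, and the endpoint may require authentication that is not handled here.

```python
from urllib.parse import urlencode

PROMETHEUS = "https://prometheus.mlab-oti.measurementlab.net"
ALERT_EXPR = "collectd_mlab_success == 0"


def graph_url(base, expr):
    """Build the Prometheus graph link for an expression (console tab)."""
    return "%s/graph?%s" % (base, urlencode({"g0.expr": expr, "g0.tab": "1"}))


def instant_query_url(base, expr):
    """Build the instant-query URL of the Prometheus HTTP API."""
    return "%s/api/v1/query?%s" % (base, urlencode({"query": expr}))


def failing_series(response):
    """Return the label set of each series in an instant-query response."""
    if response.get("status") != "success":
        return []
    return [item["metric"] for item in response["data"]["result"]]


if __name__ == "__main__":
    print(graph_url(PROMETHEUS, ALERT_EXPR))
    print(instant_query_url(PROMETHEUS, ALERT_EXPR))
```

Each entry returned by `failing_series` is the label dictionary of one series for which the check reports failure; which labels identify the node depends on the scrape configuration.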
TODO: add graph url from annotations.