This is the first time this alert has fired. The PersistentVolumeClaim has entered the "Lost" phase, which I didn't know could happen.
Because the pod can't mount its persistent storage, it fails to start.
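One way to confirm this diagnosis is to list the claims and look for any whose phase is `Lost`. The sketch below is a minimal example, not what was actually run here: it assumes the JSON comes from something like `kubectl get pvc --all-namespaces -o json`, and the claim names in the sample are hypothetical.

```python
import json


def lost_claims(pvc_list_json):
    """Return (namespace, name) pairs for every claim whose phase is Lost."""
    doc = json.loads(pvc_list_json)
    return [
        (item["metadata"].get("namespace", "default"), item["metadata"]["name"])
        for item in doc.get("items", [])
        if item.get("status", {}).get("phase") == "Lost"
    ]


# Hypothetical sample in the shape that `kubectl get pvc -o json` returns.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "claim-a", "namespace": "default"},
         "status": {"phase": "Bound"}},
        {"metadata": {"name": "claim-b", "namespace": "default"},
         "status": {"phase": "Lost"}},
    ]
})

print(lost_claims(SAMPLE))
```

Any claim this reports is one whose pod will fail to start until the claim is repaired.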
To fix this, I need to fix the status of the persistent volume claim. The only way I know how to do this is by deleting it and making a new one, but I am currently investigating.
This alert is a leading indicator that data is not getting downloaded. If it goes unfixed for a few days, you will start getting ScraperMostRecentArchivedFileTimeIsTooOld alerts for the same pod.
I deleted the PersistentVolumeClaim and then recreated it. This is safe because no data is ever deleted from the MLab platform server until it has been uploaded off of the scraper instance's storage, which means that the PersistentVolumeClaim is just a cache, rather than the last authoritative copy of any data.
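For the next time this comes up, the delete-and-recreate step can be sketched as two `kubectl` calls. This is an illustration under assumptions, not a record of the exact commands used: the claim name, manifest path, and namespace are hypothetical placeholders, and the function defaults to a dry run that only prints the commands.

```python
import subprocess


def recreate_pvc_commands(claim, manifest, namespace="default"):
    """Return the kubectl invocations, in order, for a delete-and-recreate."""
    return [
        ["kubectl", "delete", "pvc", claim, "--namespace", namespace],
        ["kubectl", "apply", "-f", manifest, "--namespace", namespace],
    ]


def recreate_pvc(claim, manifest, namespace="default", dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for argv in recreate_pvc_commands(claim, manifest, namespace):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# Hypothetical names; substitute the real claim and its manifest.
recreate_pvc("example-scraper-claim", "example-claim.yml")
```

This is only appropriate because the claim is a cache. For a volume holding the only copy of some data, deleting the claim would not be a safe fix.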
Alertmanager URL: https://alertmanager.mlab-oti.measurementlab.net
firing https://prometheus.mlab-oti.measurementlab.net/graph?g0.expr=%28scraper_lastcollectionattempt%7Bcontainer%3D%22scraper-sync%22%7D+unless+ON%28machine%2C+experiment%2C+rsync_module%29+up%7Bcontainer%3D%22scraper%22%7D%29&g0.tab=1
Labels:
Annotations:
TODO: add graph url from annotations.
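The query in the firing link above is URL-encoded and hard to read. A small sketch using only the Python standard library recovers the expression; the helper name `prometheus_expr` is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit

FIRING_URL = (
    "https://prometheus.mlab-oti.measurementlab.net/graph?"
    "g0.expr=%28scraper_lastcollectionattempt%7Bcontainer%3D%22scraper-sync%22%7D"
    "+unless+ON%28machine%2C+experiment%2C+rsync_module%29"
    "+up%7Bcontainer%3D%22scraper%22%7D%29&g0.tab=1"
)


def prometheus_expr(url, panel="g0"):
    """Extract the decoded PromQL expression for one graph panel from a Prometheus URL."""
    query = parse_qs(urlsplit(url).query)
    return query[f"{panel}.expr"][0]


print(prometheus_expr(FIRING_URL))
# (scraper_lastcollectionattempt{container="scraper-sync"} unless ON(machine, experiment, rsync_module) up{container="scraper"})
```

Read this way, the expression selects every `scraper-sync` series that has no `up` series from a `scraper` container with the same `machine`, `experiment`, and `rsync_module`. That is consistent with a scraper pod that never started.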