Open ghost opened 6 years ago
Symptom is similar to https://github.com/healthlocker/healthlocker/issues/1101 and could be why there were issues logging in to the server in https://github.com/healthlocker/healthlocker/issues/1152 (as previously it was not possible to SSH on to the server when it was at very high CPU load).
@RobStallion when you speak to Matt, can you mention high CPU load in case this is what is causing the continued problems?
@RobStallion could this high CPU load have been anything to do with the load balancer issues?
@katyedwards I believe that it is to do with the load balancer. I would need Matt to confirm, but I think that is a fair assumption, as the high CPU load started on the day that the servers went down.
@katyedwards Is this not a Healthlocker issue rather than an Oxleas one?
It would be great to see if Matt had any further information on this particular high CPU incident. I'm not sure that we know for certain that it was as a result of the load balancer issues since the server profile graphs that Matt sent over appear to be from the application server, not the load balancer. We've also seen previous issues with high CPU load which we were not able to explore further at the time.
The image at https://user-images.githubusercontent.com/24604903/35397197-bdbfa18c-01e6-11e8-8afe-15add2291726.png suggests this is the Healthlocker production server, not Oxleas. In that case, it seems even less likely that this is a load balancer issue, as the previous high CPU issues were with the same server. See the server name at https://user-images.githubusercontent.com/151362/30856800-6b0e91e4-a2b1-11e7-8c79-c181c44e2b62.png in issue https://github.com/healthlocker/healthlocker/issues/1101
@reddog it was noticed first on Oxleas when we asked them to restart the server to test the script, hence the issue was opened here. After this, they came back with screenshots of the other servers.
I have asked the question of SLaM, but no response yet.
Matt checked the CPU load on Oxleas Prod in case it had something to do with the VM becoming unresponsive. He noted from the above that the load shoots up at 6am daily.
Any ideas what is running daily and might be causing this?
Additional note:
Tried to get more info, but SLaM has not raised any concern. Will move this to the backlog so we can review the CPU usage next time we are working on the project and see if all has returned to normal.
Highlighted by SLaM
Oxleas-HL-Test:
HL-Prod
Not sure why!