Checked for duplicates
Yes - I've already checked
Alternatives considered
Yes - and alternatives don't suffice
Related problems
We experienced downtime a few weeks ago. Although the AWS EC2 instances restarted automatically, the full stack of PCM services on machines such as GRQ did not. This resulted in more than 10 hours of downtime before personnel detected the issue.
Describe the feature request
We should ensure that all PCM services essential for daily operations restart automatically after a VM reboot or a process exit (up to a maximum number of restart attempts).
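A minimal sketch of the "restart on process exit, up to a maximum number of times" behavior, assuming each PCM service can be launched as a standalone command. The names `supervise`, `max_restarts`, and `backoff_seconds` are hypothetical and are not part of PCM. This sketch counts restarts over the whole lifetime of the supervisor; on the EC2 hosts themselves, systemd units with `Restart=on-failure`, `StartLimitBurst=`/`StartLimitIntervalSec=`, and `WantedBy=multi-user.target` (enabled via `systemctl enable`) are one common way to get both restart-on-exit and start-on-boot, with the limit applied per time window instead.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=5, backoff_seconds=1.0):
    """Run cmd and restart it after a non-zero exit, up to max_restarts times.

    Hypothetical helper for illustration. Returns the number of restarts
    performed once cmd exits cleanly; raises RuntimeError when the restart
    budget is exhausted.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            # Clean exit: treat it as an intentional shutdown and stop.
            return restarts
        if restarts >= max_restarts:
            # Give up so a persistently crashing service is surfaced, not looped.
            raise RuntimeError(f"{cmd!r} failed {restarts + 1} times; giving up")
        restarts += 1
        # Linear backoff: wait a little longer before each successive restart.
        time.sleep(backoff_seconds * restarts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python supervise.py /path/to/pcm-service --flag
    supervise(sys.argv[1:])
```

Stopping after a fixed number of failures keeps a broken service from restarting forever; in practice that final failure should also trigger an alert so the outage is noticed well before the 10-hour mark described above.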