operate-first / hitchhikers-guide

Hitchhiker's guide to Operate-first
GNU General Public License v3.0

Decide on a demo application that will be used in the SRE demo #86

Closed. 4n4nd closed this issue 3 years ago

tumido commented 3 years ago

Demo:

  1. Web service with a button
  2. On click, the web service inflates its memory usage and gets OOMKilled
  3. Have a PodMonitor tracking this
  4. Alert on crashloop and/or pod crashing
  5. Up the memory limit
  6. Observe the service recover

4n4nd commented 3 years ago

On click the webservice inflates memory usage and gets OOMKilled

@tumido for the pod crash, I think we should change that to PVCs being full and the app running out of storage. If an application gets OOMKilled, it just restarts with its memory cleared, so we might not be able to send alerts for it.
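
A rough sketch of the "PVC is full" condition, assuming the used and capacity byte counts are already available (for example from the kubelet's `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes` metrics, or from `shutil.disk_usage` inside the pod). The 90% threshold and the function names are hypothetical.

```python
import shutil


def fraction_used(used_bytes, capacity_bytes):
    """Return the fraction of the volume that is in use (0.0 to 1.0)."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return used_bytes / capacity_bytes


def should_alert(used_bytes, capacity_bytes, threshold=0.9):
    """True when usage is at or above the (hypothetical) alert threshold."""
    return fraction_used(used_bytes, capacity_bytes) >= threshold


def local_volume_should_alert(mount_path, threshold=0.9):
    """Same check for a locally mounted path, using the standard library."""
    usage = shutil.disk_usage(mount_path)
    return should_alert(usage.used, usage.total, threshold)
```

Unlike memory, a full volume stays full across restarts, so a condition like this keeps holding long enough for an alert to fire.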

4n4nd commented 3 years ago

@Gregory-Pereira made some progress here: https://github.com/Gregory-Pereira/SRE-Demo-App

HumairAK commented 3 years ago

The alert for the app needs to either link to a runbook, or we need to at least have a runbook for resolving the alert. So if it's a full PVC, we should have a runbook that explains how to resize the PVC to get more storage.
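
As an illustration of what the resize step in such a runbook could look like, here is a small sketch that builds a `kubectl patch` command raising `spec.resources.requests.storage` on a PVC. The PVC name, namespace, and size are placeholders, and the sketch assumes the PVC's StorageClass has `allowVolumeExpansion: true`; otherwise the resize request is rejected.

```python
import json


def pvc_resize_patch(new_size):
    """Build a merge-patch body that requests a larger size for a PVC."""
    return {"spec": {"resources": {"requests": {"storage": new_size}}}}


def kubectl_resize_command(name, namespace, new_size):
    """Return the kubectl argument list that applies the resize patch."""
    patch = json.dumps(pvc_resize_patch(new_size))
    return [
        "kubectl", "patch", "pvc", name,
        "-n", namespace,
        "--type", "merge",
        "-p", patch,
    ]


if __name__ == "__main__":
    # Placeholder values; substitute the real PVC, namespace, and size.
    print(" ".join(kubectl_resize_command("demo-pvc", "demo-namespace", "2Gi")))
```

The command is only built and printed here, not executed, so the runbook author can review it before running it against a cluster.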

4n4nd commented 3 years ago

We have finally decided to use a full JupyterHub (JH) PVC as the demo.

tumido commented 3 years ago

Solved.