prometheus-operator / runbooks

https://runbooks.prometheus-operator.dev
Apache License 2.0

Ideas #29

Open nvtkaszpir opened 2 years ago

Loose stream of thoughts.

1. Some runbooks share common debugging patterns, so there is no point in repeating them in each one. Extract them into separate sections, maybe in a directory named /guides/? This would be handy, for example, for a checklist for finding out why a pod is dead, and so on.

2. If possible, link to the official Kubernetes docs. The official k8s docs keep getting better, so it is better to direct people there, especially to the Tasks section?

3. Some runbooks could link to more in-depth material. I'm looking especially at the issues with CpuThrottlingHigh. Not sure whether this should go in an auto-hidden section under Meaning, or in a new section at the bottom, such as References or Further Reading.

4. The Mitigation section should have a Short Term part, such as "fix the issue now", and a Long Term part, which is essentially a post-mortem. Some alerts will require this because they are extremely problematic, especially those involving data loss.

5. add-runbook: link to some well-known articles:
   - https://gitlab.com/gitlab-com/runbooks
   - https://github.com/danluu/post-mortems
   - https://github.com/upgundecha/howtheysre