osmosis-labs / osmosis

The AMM Laboratory
https://app.osmosis.zone
Apache License 2.0

Incident Response Playbook / Toolkit #2951

Open czarcas7ic opened 1 year ago

czarcas7ic commented 1 year ago

Background

In the event of an incident, we should have a playbook of general actions to take to mitigate problems and reach an eventual solution.

In addition, we should create a "toolkit" that makes diagnosing issues across systems simpler. Examples include a tool that can be run on any system to extract basic information such as module hashes, an s3cmd config preconfigured to upload snapshots to our DO S3 buckets, etc.
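As a rough illustration of the "run on any system" idea, here is a minimal sketch of a report collector. It is not an existing Osmosis tool. It does not compute per-module store hashes from a node; it hashes arbitrary files as a stand-in to show the shape of the output. The names `sha256_of_file` and `collect_report` and the report fields are hypothetical.

```python
import hashlib
import json
import platform
import socket
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_report(paths):
    """Gather basic host information plus a digest for each given file.

    Hypothetical report layout; paths that are not regular files map to None.
    """
    files = {}
    for raw in paths:
        path = Path(raw)
        files[str(path)] = sha256_of_file(path) if path.is_file() else None
    return {
        "hostname": socket.gethostname(),
        "platform": platform.platform(),
        "python": platform.python_version(),
        "files": files,
    }


if __name__ == "__main__":
    import sys

    # Usage: python collect_report.py <file> [<file> ...]
    print(json.dumps(collect_report(sys.argv[1:]), indent=2, sort_keys=True))
```

Emitting JSON keeps the output easy to diff between two machines during an incident. A real version would presumably query the node for the values that matter (binary version, module hashes) rather than hashing files.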

Suggested Design

Acceptance Criteria

czarcas7ic commented 1 year ago

Another part of this toolkit could be a Google Sheet that is automatically generated with validator addresses and voting power. Currently this is generated manually during every incident.
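One possible shape for this, sketched under assumptions: take a validator list that has already been fetched and parsed, compute each validator's share of the total, and write a CSV that a Google Sheet can import. The field names (`validators`, `operator_address`, `tokens`, `description.moniker`) are assumed to follow the Cosmos SDK staking query response. The sample addresses are made up, and the share is computed from `tokens` as an approximation of voting power.

```python
import csv
import io


def validator_rows(response: dict):
    """Return (moniker, operator_address, tokens, percent) tuples, largest first.

    Assumes a dict shaped like a parsed staking validators response.
    """
    validators = response.get("validators", [])
    total = sum(int(v["tokens"]) for v in validators)
    rows = []
    for v in sorted(validators, key=lambda v: int(v["tokens"]), reverse=True):
        tokens = int(v["tokens"])
        share = 100 * tokens / total if total else 0.0
        rows.append(
            (v["description"]["moniker"], v["operator_address"], tokens, f"{share:.2f}")
        )
    return rows


def to_csv(rows) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["moniker", "operator_address", "tokens", "power_percent"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data; these are not real validator addresses.
SAMPLE = {
    "validators": [
        {
            "operator_address": "osmovaloper1exampleaaa",
            "tokens": "250",
            "description": {"moniker": "val-a"},
        },
        {
            "operator_address": "osmovaloper1examplebbb",
            "tokens": "750",
            "description": {"moniker": "val-b"},
        },
    ]
}

if __name__ == "__main__":
    print(to_csv(validator_rows(SAMPLE)), end="")
```

Fetching the validator set and uploading the CSV to a sheet are left out on purpose, since both depend on endpoints and credentials not specified here.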

ValarDragon commented 1 year ago

I think we should: