netlify / gotrue

An SWT based API for managing users and issuing SWT tokens.
https://www.gotrueapi.org
MIT License
3.78k stars, 279 forks

Observability & Alerting #354

Closed lexicondevil closed 1 year ago

lexicondevil commented 1 year ago

Note: This issue is part of the Service Transfer Project. The goal is to ensure project documentation is up to date and to help the receiving team understand what the service does and how to maintain and operate it. The previous team is primarily responsible for doing this work, and the receiving team is the stakeholder on this issue and has final approval.

These are a set of guidelines, not a rigid set of requirements. If the receiving team already has expertise on this service and is comfortable operating it, they may complete whatever subset of the tasks they find appropriate and close this issue.

The assignees on this issue are intended to be "manager of previous team" and "manager of new team" based on what's in the Service Ownership Spreadsheet. If these are incorrect, please update the assignees on this issue and update the spreadsheet to match.

Observability & Alerting

Share your approach to observability and alerting with the receiving team. Link to the configured monitors and describe what they do and what it means if they fire. Perhaps schedule a meeting with the receiving team to present and discuss this information.

The standards we suggest during the Production Readiness process are here for reference:

Ensure that the service is set up with appropriate monitoring and alerting mechanisms to detect and respond to issues in a timely manner.

For monitors that go to a pager, you want to optimize for only waking someone up when human intervention is required. Try to avoid paging for noisy signals that may be false positives.

Further reading: https://sre.google/sre-book/monitoring-distributed-systems/

rybit commented 1 year ago

This is a public repo. I'm going to close this for now.