Closed by tumido 1 year ago
Example of SLOs for Google Dataplex.
There are many more examples of Google SLAs (SLOs) here.
I took a look at AppSRE for their SLOs:
Internal resources:
https://source.redhat.com/groups/public/sre-services/sre_services_wiki/appsre_slos
https://service.pages.redhat.com/dev-guidelines/docs/appsre/onboarding/creating-slos/
Schema: https://github.com/app-sre/qontract-schemas/blob/main/schemas/app-sre/slo-document-1.yml
Availability/Usage metrics:
Red Hat OpenShift Streams for Apache Kafka
Service Def: https://access.redhat.com/articles/6473891
General Service Terms and Conditions: https://www.redhat.com/licenses/Appendix_4_Red_Hat_Online_Services_20211021.pdf
Availability
Support
See this table for details: https://access.redhat.com/support/offerings/production/sla
Performance
Kafka-specific limits
See limits here: https://access.redhat.com/articles/5979061
There are more limits, but they are specific to a Kafka cluster rather than general to a service.
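When comparing availability commitments across the services collected in this thread, it can help to translate a percentage into the downtime it allows. The following is only a rough sketch: the function name `allowed_downtime_minutes`, the 30-day window, and the example targets are assumptions for illustration and are not taken from any of the linked service definitions.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Return the downtime (in minutes) permitted by an availability target.

    availability_pct: target as a percentage, e.g. 99.9 (hypothetical value)
    period_days: length of the measurement window (assumed 30 days here)
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Illustrative targets only; check each service's own terms for real numbers.
for target in (99.0, 99.9, 99.95):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.1f} min per 30 days")
```

Expressing every target as minutes per window makes services with different stated percentages or measurement periods easier to put side by side.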
The Thoth team has WIP/draft documentation on defining SLOs for the service.
Nothing is final yet, but here is the list of SLIs that the draft names as the focus:
In short, the focus is on two aspects: response time (latency) and service coverage (quality).
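To make the two aspects concrete, here is a minimal sketch of how a latency SLI and a coverage SLI could be computed from request records. It is not the Thoth team's definition: the `Request` record, the `has_result` flag as a stand-in for "coverage", and the 0.5 s threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_s: float   # observed response time in seconds
    has_result: bool   # True if the service returned a usable answer (assumed meaning of "coverage")


def latency_sli(requests, threshold_s):
    """Fraction of requests answered within threshold_s, or None if there is no data."""
    if not requests:
        return None
    fast = sum(1 for r in requests if r.latency_s <= threshold_s)
    return fast / len(requests)


def coverage_sli(requests):
    """Fraction of requests that produced a usable result, or None if there is no data."""
    if not requests:
        return None
    covered = sum(1 for r in requests if r.has_result)
    return covered / len(requests)


# Hypothetical sample data.
sample = [
    Request(0.2, True),
    Request(0.4, True),
    Request(1.5, False),
    Request(0.3, True),
]
print(latency_sli(sample, threshold_s=0.5))
print(coverage_sli(sample))
```

An SLO would then be a target on each ratio over a time window, for example "latency SLI at or above some agreed value over 28 days"; the actual targets are still open in the draft.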
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Before the next SIG Services meeting, pick an RH managed product (or any other managed service) and collect a list of its SLOs, so we can compare them and derive a consolidated list that service owners can use as inspiration for their own SLOs.
If you pick a service, post a comment here stating the service name (so we don't end up with multiple people working on the same service).