Open kyphutruong opened 1 month ago
I think we can use a query for the string InsufficientCidrBlocks
as the criterion for the alert.
https://moj.enterprise.slack.com/archives/C05RE26R8TW/p1695299579240409
https://moj.enterprise.slack.com/archives/C05RE26R8TW/p1695289492516869
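A rough sketch of what that query could look like, written as a Python helper that builds an OpenSearch query DSL body. The field names `log` and `@timestamp`, the 15-minute window, and the threshold of one hit are assumptions for illustration, not values taken from our index; `build_cidr_alert_query` and `should_alert` are hypothetical helper names.

```python
import json


def build_cidr_alert_query(window="15m", message_field="log", time_field="@timestamp"):
    """Build a query body that counts recent InsufficientCidrBlocks log lines.

    message_field and time_field are assumed names; adjust them to match
    the actual mapping of the vpc-cni/ipamd index.
    """
    return {
        "size": 0,  # only the hit count is needed, not the documents
        "track_total_hits": True,
        "query": {
            "bool": {
                "filter": [
                    {"match_phrase": {message_field: "InsufficientCidrBlocks"}},
                    {"range": {time_field: {"gte": f"now-{window}", "lte": "now"}}},
                ]
            }
        },
    }


def should_alert(hit_count, threshold=1):
    """Return True when the number of matching log lines reaches the threshold."""
    return hit_count >= threshold


if __name__ == "__main__":
    print(json.dumps(build_cidr_alert_query(), indent=2))
```

The same body could be pasted into a monitor's extraction query, with the trigger condition comparing the total hit count against whatever threshold we agree on.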
Background
We now have vpc-cni/ipamd logs going to [OpenSearch](https://app-logs.cloud-platform.service.justice.gov.uk/_dashboards/app/data-explorer/discover#?_a=(discover:(columns:!(_source),isDirty:!f,sort:!()),metadata:(indexPattern:'6d5c66d0-8d35-11ef-a6ba-a7191f5fb1c2',view:discover))&_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-15m,to:now))&_q=(filters:!(),query:(language:kuery,query:''))).
Set up alerts for when a node reports CNI prefix-allocation errors and/or the live cluster has few or no IP prefixes available.
Proposed user journey
Alerts go to the low-priority channel when few or no IP prefixes are available to the live cluster.
Approach
Which part of the user docs does this impact
Communicate changes
Questions / Assumptions
Definition of done
Reference
How to write good user stories