mitchellmaler opened this issue 2 years ago
It would be great to have Prometheus metrics available for RKE2, so that we can set up monitoring and alerting for when etcd backups stop working.
We could consider a single metric for the last snapshot time, with labels `status` (`successful`/`failed`), `location` (`local`/`s3`), and `name` (`<basename>`). The `name` label can be used to distinguish between scheduled and on-demand snapshots, since they have different basenames.
Note that this would actually need to be implemented on the k3s side.
Is your feature request related to a problem? Please describe.
Right now, if an etcd backup fails for some reason, the only output is a log entry.
Describe the solution you'd like
I would like RKE2 to expose metrics on the status of etcd backups.
Describe alternatives you've considered
Additional context