rancher / rke2

https://docs.rke2.io/
Apache License 2.0

Expose prometheus metrics for etcd snapshots #3110

Open mitchellmaler opened 2 years ago

mitchellmaler commented 2 years ago

Is your feature request related to a problem? Please describe.

Right now, if an etcd backup fails for any reason, the only indication is a log entry.

Describe the solution you'd like

I would like rke2 to expose Prometheus metrics reporting the status of etcd backups.

Describe alternatives you've considered

Additional context

nickvth commented 1 year ago

+1

avthart commented 10 months ago

It would be great to have Prometheus metrics available in RKE2 so that we can monitor etcd backups and alert when they stop working.

brandond commented 1 month ago

Could consider a single metric for last snapshot time, with labels for status=successful/failed, location=local/s3, name=<basename>. The name can be used to distinguish between scheduled and on-demand snapshots since they have different basenames.

Note that this will need to actually be implemented on the k3s side.