rancher / rke2

https://docs.rke2.io/
Apache License 2.0

Expose prometheus metrics for etcd snapshots #3110

Open mitchellmaler opened 2 years ago

mitchellmaler commented 2 years ago

Is your feature request related to a problem? Please describe.

Right now, if an etcd backup fails for some reason, rke2 only writes a log entry.

Describe the solution you'd like

I would like rke2 to expose Prometheus metrics for the status of etcd backups.

Describe alternatives you've considered

Additional context

nickvth commented 1 year ago

+1

avthart commented 8 months ago

It would be great to have Prometheus metrics available in RKE2 so that we can set up monitoring and alerting for when etcd backups stop working.

brandond commented 2 days ago

We could consider a single metric for the last snapshot time, with labels for status=successful/failed, location=local/s3, and name=\<basename>. The name label can be used to distinguish between scheduled and on-demand snapshots, since they have different basenames.

Note that this will need to actually be implemented on the k3s side.
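
To make the proposed metric shape concrete, here is a rough sketch of a single "last snapshot time" gauge keyed by the three labels, rendered in the Prometheus text exposition format. It is not how k3s or rke2 implement anything: the metric name `etcd_snapshot_last_timestamp_seconds`, the `LastSnapshotGauge` class, and the sample basenames are all assumptions made for illustration, and a real implementation would live in Go on the k3s side.

```python
# Hypothetical metric name; nothing in rke2/k3s defines it.
METRIC = "etcd_snapshot_last_timestamp_seconds"


def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class LastSnapshotGauge:
    """Keeps the most recent snapshot time per (status, location, name)."""

    def __init__(self):
        self._samples = {}

    def record(self, status, location, name, timestamp):
        # A newer attempt with the same labels overwrites the older value.
        self._samples[(status, location, name)] = float(timestamp)

    def render(self):
        lines = [
            f"# HELP {METRIC} Unix time of the last etcd snapshot attempt.",
            f"# TYPE {METRIC} gauge",
        ]
        for (status, location, name), ts in sorted(self._samples.items()):
            labels = (
                f'status="{escape_label(status)}",'
                f'location="{escape_label(location)}",'
                f'name="{escape_label(name)}"'
            )
            lines.append(f"{METRIC}{{{labels}}} {ts}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    gauge = LastSnapshotGauge()
    # Illustrative basenames for a scheduled and an on-demand snapshot.
    gauge.record("successful", "local", "etcd-snapshot", 1700000000)
    gauge.record("failed", "s3", "on-demand", 1700000500)
    print(gauge.render(), end="")
```

With a metric of this shape, an alert could in principle compare the current time against the successful series, for example `time() - etcd_snapshot_last_timestamp_seconds{status="successful"} > 86400`, again assuming the hypothetical metric name used here.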