## Overview

We're hitting a scaling point within Grafana Labs where the time to collect metrics for GKE instances is >40s. The current configuration is set up to scrape metrics every 60 seconds, which aligns with our TCO recording rules that calculate our TCO every minute. We'd like to avoid bumping the scrape interval above 60 seconds so that our TCO recording rules stay as fresh as possible.
Here's a 90-day view showing how our scrape duration has increased over that time:
## Goals
- [ ] Collect disk metadata in a background process and store the results in a cache
- [ ] Collect pricing metadata in a background process and store the results in a cache
- [ ] Scraping `/metrics` should be as fast as reading from the cache
@logyball already provided a nice framework to consider this work in his implementation of AKS metrics.
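As a rough illustration of the goals above, here is a minimal sketch of the background-collection-plus-cache pattern: a goroutine refreshes the cache on its own schedule, while the scrape path only ever does a cheap read. The `Cache` type, `startCollector` helper, and the `map[string]string` metadata shape are all hypothetical names for this sketch, not the actual implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Cache holds the most recently collected metadata behind an RWMutex,
// so many concurrent scrape reads never block each other.
type Cache struct {
	mu       sync.RWMutex
	diskMeta map[string]string
}

// Set replaces the cached metadata (called by the background collector).
func (c *Cache) Set(meta map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.diskMeta = meta
}

// Get returns the cached metadata (called on the /metrics scrape path).
func (c *Cache) Get() map[string]string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.diskMeta
}

// startCollector runs the (slow) collect function on an interval in a
// background goroutine, independent of the scrape schedule.
func startCollector(c *Cache, interval time.Duration, collect func() map[string]string) {
	go func() {
		for {
			c.Set(collect())
			time.Sleep(interval)
		}
	}()
}

func main() {
	cache := &Cache{}
	// Prime the cache so the first scrape never waits on a slow collection.
	cache.Set(map[string]string{"disk-1": "pd-ssd"})
	startCollector(cache, time.Minute, func() map[string]string {
		// The slow GCP disk/pricing API calls would live here.
		return map[string]string{"disk-1": "pd-ssd"}
	})
	// A /metrics handler would only read from the cache:
	fmt.Println(cache.Get()["disk-1"]) // prints "pd-ssd"
}
```

With this shape, scrape latency is bounded by a mutex-protected map read rather than by the upstream API calls, which is the property the last goal asks for.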