grafana / cloudcost-exporter

Prometheus Exporter for Cloud Provider agnostic cost metrics
Apache License 2.0

GKE/Compute performance issues when instance list >3000 #99

Closed Pokom closed 9 months ago

Pokom commented 9 months ago

In projects with >1000 compute instances, listing them all can take more than a minute. For perspective, running time gcloud compute instances list --format="table[no-heading](name)" against the same project takes upwards of 1.1 minutes.

This is problematic when we're scraping every 1m: the next scrape starts before the current one finishes, which leaves us with uneven data. We need a strategy for splitting the listing into multiple ListInstance requests that can run in parallel. One option is to list the instances by region/zone and run those per-zone requests in parallel.
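
A rough sketch of the per-zone idea, to make the shape of it concrete. This is not the exporter's actual code: `ListFunc`, `listAllZones`, `stubList`, and the zone and VM names are all hypothetical, and the stub stands in for whatever per-zone Compute API call we end up using. It fans out one call per zone, caps the number in flight with a semaphore so we don't hammer API quotas, and merges the results.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ListFunc is a hypothetical stand-in for a per-zone instance listing call.
// It returns the instance names found in a single zone.
type ListFunc func(zone string) ([]string, error)

// listAllZones calls list once per zone, running at most maxParallel calls
// at a time, and merges the results. It returns the first error seen, if any.
// A production version would likely also accept a context.Context so a slow
// scrape can be cancelled.
func listAllZones(zones []string, maxParallel int, list ListFunc) ([]string, error) {
	if maxParallel < 1 {
		maxParallel = 1
	}
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		all      []string
		firstErr error
	)
	sem := make(chan struct{}, maxParallel) // bounds in-flight requests
	for _, zone := range zones {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxParallel calls are running
		go func(zone string) {
			defer wg.Done()
			defer func() { <-sem }()
			names, err := list(zone)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = fmt.Errorf("listing zone %s: %w", zone, err)
				}
				return
			}
			all = append(all, names...)
		}(zone)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	sort.Strings(all) // goroutines finish in any order; sort for stable output
	return all, nil
}

// stubList returns fake data for the demo; a real ListFunc would call the
// Compute API for the given zone.
func stubList(zone string) ([]string, error) {
	data := map[string][]string{
		"us-central1-a": {"vm-1", "vm-2"},
		"us-east1-b":    {"vm-3"},
	}
	names, ok := data[zone]
	if !ok {
		return nil, fmt.Errorf("unknown zone %q", zone)
	}
	return names, nil
}

func main() {
	names, err := listAllZones([]string{"us-central1-a", "us-east1-b"}, 4, stubList)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

With this shape the wall-clock time of a scrape is roughly that of the slowest zone (or of the slowest batch, when there are more zones than `maxParallel`) rather than the sum over all zones. Whether that gets us under the 1m scrape interval would need to be measured against a real project.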