grafana / cloudcost-exporter

Prometheus Exporter for Cloud Provider agnostic cost metrics
Apache License 2.0
30 stars 1 forks source link

refactor(collectors): Refactor Collect interface to return a float #63

Closed Pokom closed 9 months ago

Pokom commented 9 months ago

As part of #20, this refactors the Collector.Collect interface to return a float64 instead of an error. This enables the following refactors:

A majority of the changes are initially for tests.

Pokom commented 9 months ago

@skl

So the code change looks fine but I don't understand the benefit of changing the return type from error to float64, I can guess it may be one or more of these things

It's more so: to accumulate some kind of counter. I've seen this pattern in a few other exporters where you return a float64 from your collect|scrape method and then set an up gauge to the returned value. There's nothing stopping us from also returning an error, but I don't know if I see the value since we won't act upon it in the calling method. I think we'd either handle it in the Collector.Collect method, or log it out there.

Here's a rough scratch of what I had in mind:

type Collector interface {
    Register(r Registry) error
    Collect(ch chan<- prometheus.Metric) float64
    Name() string
}

type Provider interface {
    RegisterCollectors(r Registry) error
    Collect(ch chan<- prometheus.Metric) error
}

Then from there the Collect method would look a bit closer to this:

var (
    collectorUpDesc         = prometheus.NewDesc(prometheus.BuildFQName("cloudcost", "aws", "collector_up"), "Whether a collector was successful.", []string{"collector"}, nil)
    collectorScrapeDuration = prometheus.NewDesc(prometheus.BuildFQName("cloudcost", "aws", "collector_scrape_duration_seconds"), "Time it took for a collector to scrape metrics.", []string{"collector"}, nil)
)

func (a *AWS) Collect(ch chan<- prometheus.Metric) {
    log.Printf("Collecting metrics for %d collectors for AWS", len(a.collectors))
    wg := sync.WaitGroup{}
    for _, c := range a.collectors {
        now := time.Now()
        wg.Add(1)
        func(c provider.Collector) {
            defer wg.Done()
            up := c.Collect(ch)
            if up == 0 {
                log.Printf("Error collecting metrics for %s", c.Name())
            }
            ch <- prometheus.MustNewConstMetric(collectorUpDesc, prometheus.GaugeValue, up, c.Name())
            ch <- prometheus.MustNewConstMetric(collectorScrapeDuration, prometheus.GaugeValue, time.Since(now).Seconds(), c.Name())
        }(c)
    }
    wg.Wait()
}

This will make adding collectors somewhat easier as they only need to focus on implementing collect and exporting cost metrics. The primary provider becomes responsible for tracking and emitting operating metrics 😎