mvisonneau / gitlab-ci-pipelines-exporter

Prometheus / OpenMetrics exporter for GitLab CI pipelines insights
Apache License 2.0

Cumulative metrics for job/pipeline durations #778

Open slsyy opened 8 months ago

slsyy commented 8 months ago

Right now we have only gitlab_ci_pipeline_job_duration_seconds or similar metrics, which cannot be used to answer questions like this:

, because most metrics return data only for the latest job/pipeline.

Do you think adding metrics such as a histogram of durations would be consistent with the idea of the project and bring some added value?

wnobres-sr commented 8 months ago

I'm looking for the same possibility, actually, so we can properly track the impact of improvements or identify decreases in job performance over time. Is there a way to achieve that using the existing implementation, or would it require development effort?

slsyy commented 8 months ago

> Is there a way to achieve that using the existing implementation, or would it require development effort?

I don't think so. The current metrics keep only the status of the last run. I will try to implement this kind of metric to see if it makes sense. Please let me know if you are working on something similar or have an alternative solution.
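To illustrate the difference being discussed, here is a minimal standard-library sketch (not code from the exporter or from any Prometheus client) contrasting a last-value gauge with a cumulative histogram. The type names and the bucket bounds are hypothetical and chosen only for the example.

```go
package main

import "fmt"

// lastValueGauge mimics a gauge-style metric: each new run overwrites the
// previous value, so only the latest run is visible.
type lastValueGauge struct{ value float64 }

func (g *lastValueGauge) Set(v float64) { g.value = v }

// cumulativeHistogram mimics a Prometheus-style histogram: per-bucket counts,
// a running sum, and a total count, none of which are reset between runs.
type cumulativeHistogram struct {
	upperBounds []float64 // sorted bucket upper bounds in seconds (hypothetical values)
	counts      []uint64  // counts[i] = number of observations <= upperBounds[i]
	sum         float64
	count       uint64
}

func newCumulativeHistogram(upperBounds []float64) *cumulativeHistogram {
	return &cumulativeHistogram{
		upperBounds: upperBounds,
		counts:      make([]uint64, len(upperBounds)),
	}
}

// Observe records one duration in every bucket whose upper bound covers it.
func (h *cumulativeHistogram) Observe(v float64) {
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

func main() {
	gauge := &lastValueGauge{}
	hist := newCumulativeHistogram([]float64{30, 60, 120})

	// Four hypothetical job durations in seconds.
	for _, d := range []float64{25, 45, 90, 200} {
		gauge.Set(d)
		hist.Observe(d)
	}

	fmt.Println("gauge:", gauge.value) // only the last run survives
	fmt.Println("histogram:", hist.counts, hist.count, hist.sum)
}
```

Because the histogram only ever grows, a monitoring system can compare two scrapes and recover how many runs fell into each bucket in between, which a last-value metric cannot provide.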

cgill27 commented 4 months ago

Agreed, this would be good data to either collect or pull together in some way in Grafana. Would love to be able to set thresholds and alerts in Grafana to notify when there's an obvious problem across all pipelines/jobs etc.
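As a rough sketch of how a threshold could be evaluated once cumulative bucket counts exist, the snippet below estimates a quantile by linear interpolation inside the matching bucket. It is a simplified approximation in the spirit of what histogram-based quantile queries do, not the exact algorithm of any tool; the `bucket` type, the counts, and the 180-second threshold are all hypothetical, and observations above the last bucket are ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// bucket is one cumulative histogram bucket: count is the number of
// observations less than or equal to upperBound (seconds).
type bucket struct {
	upperBound float64
	count      float64
}

// estimateQuantile returns an approximate q-quantile (0 < q < 1) from buckets
// sorted by upperBound with non-decreasing cumulative counts. It interpolates
// linearly inside the bucket that contains the target rank and assumes the
// first bucket starts at 0.
func estimateQuantile(q float64, buckets []bucket) (float64, bool) {
	if len(buckets) == 0 || q <= 0 || q >= 1 {
		return 0, false
	}
	total := buckets[len(buckets)-1].count
	if total == 0 {
		return 0, false
	}
	rank := q * total

	// Find the first bucket whose cumulative count reaches the rank.
	i := sort.Search(len(buckets), func(i int) bool { return buckets[i].count >= rank })

	lowerBound, lowerCount := 0.0, 0.0
	if i > 0 {
		lowerBound, lowerCount = buckets[i-1].upperBound, buckets[i-1].count
	}
	inBucket := buckets[i].count - lowerCount
	return lowerBound + (buckets[i].upperBound-lowerBound)*(rank-lowerCount)/inBucket, true
}

func main() {
	// Hypothetical cumulative counts for one job.
	buckets := []bucket{{60, 50}, {120, 90}, {300, 100}}

	const thresholdSeconds = 180.0 // hypothetical alert threshold
	if p95, ok := estimateQuantile(0.95, buckets); ok && p95 > thresholdSeconds {
		fmt.Printf("p95 of %.0fs exceeds the %.0fs threshold\n", p95, thresholdSeconds)
	}
}
```

The estimate is only as precise as the bucket layout, so thresholds that matter for alerting should sit close to a bucket boundary.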

slsyy commented 2 days ago

FYI: I developed a working solution for my company, which has been running for almost half a year. It was written from scratch, because it didn't make sense to improve/rework gitlab-ci-pipelines-exporter due to the size of the codebase and architectural restrictions.

We emit these metrics:

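// The package name is assumed; the original snippet starts at the const block.
package metrics

import "github.com/prometheus/client_golang/prometheus"

const (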
    namespace = "gitlab_ci"

    jobSubsystem      = "job"
    pipelineSubsystem = "pipeline"

    labelJobName       = "job_name"
    labelJobRunner     = "job_runner"
    labelPipelineStage = "pipeline_stage"
    labelProject       = "project"
    labelStatus        = "status"
)

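// latencyHistogramBuckets lists the bucket upper bounds in seconds,
// from 100 ms up to one hour.
var latencyHistogramBuckets = []float64{
    .1, .25, .5, 1, 2.5, 5, 10, 15, 20, 30, 40, 50, 60, 90, 150, 210, 270,
    330, 390, 450, 500, 600, 1200, 1800, 2700, 3600,
}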

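// Metrics groups the histograms built in New. The struct definition is not
// part of the original snippet; its fields are inferred from the constructor.
type Metrics struct {
    jobDurationHistogram      *prometheus.HistogramVec
    jobQueuedHistogram        *prometheus.HistogramVec
    pipelineDurationHistogram *prometheus.HistogramVec
    pipelineQueuedHistogram   *prometheus.HistogramVec
}

// New creates the job and pipeline histograms. They still have to be
// registered with a Prometheus registry before they are exposed.
func New() *Metrics {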
    return &Metrics{
        jobDurationHistogram: prometheus.NewHistogramVec(
            prometheus.HistogramOpts{
                Namespace: namespace,
                Subsystem: jobSubsystem,
                Name:      "duration_seconds",
                Help:      "Histogram of duration (seconds) of finished gitlab jobs",
                Buckets:   latencyHistogramBuckets,
            },
            []string{labelJobName, labelJobRunner, labelStatus, labelProject, labelPipelineStage},
        ),
        jobQueuedHistogram: prometheus.NewHistogramVec(
            prometheus.HistogramOpts{
                Namespace: namespace,
                Subsystem: jobSubsystem,
                Name:      "queued_seconds",
                Help:      "Histogram of duration (seconds) of waiting time in a queue",
                Buckets:   latencyHistogramBuckets,
            },
            []string{labelProject},
        ),

        pipelineDurationHistogram: prometheus.NewHistogramVec(
            prometheus.HistogramOpts{
                Namespace: namespace,
                Subsystem: pipelineSubsystem,
                Name:      "duration_seconds",
                Help:      "Histogram of duration (seconds) of finished gitlab pipelines",
                Buckets:   latencyHistogramBuckets,
            },
            []string{labelProject, labelStatus},
        ),
        pipelineQueuedHistogram: prometheus.NewHistogramVec(
            prometheus.HistogramOpts{
                Namespace: namespace,
                Subsystem: pipelineSubsystem,
                Name:      "queued_seconds",
                Help:      "Histogram of duration (seconds) of waiting time in a queue",
                Buckets:   latencyHistogramBuckets,
            },
            []string{labelProject},
        ),
    }
}

, which are consumed by this Grafana dashboard:

[image: Grafana dashboard screenshot]

Would anyone be interested in using it? I'm wondering whether it makes sense to publish this to the open source community.