mvisonneau / gitlab-ci-pipelines-exporter

Prometheus / OpenMetrics exporter for GitLab CI pipelines insights
Apache License 2.0

Multiple job statuses are exported at the same time for one job #814

Open cristina-defran opened 3 months ago

cristina-defran commented 3 months ago

If I query the gitlab_ci_pipeline_job_status metric with a specific project, ref, source and job_name, I get multiple metrics with different status and/or failure_reason labels (the latter only in case of failure). This makes it impossible to tell which one reflects the latest status of the job.

For example, using this query:

gitlab_ci_pipeline_job_status{job=~"gitlab", project=~"project1", ref=~"branch1", source=~"web", job_name=~"job1"}

I get these metrics, all with value 1:

gitlab_ci_pipeline_job_status{instance="instance1", job="gitlab", job_name="job1", kind="branch", project="project1", ref="branch1", source="web", stage="stage1", status="pending", tag_list="tag1"}

gitlab_ci_pipeline_job_status{failure_reason="stuck_or_timeout_failure", instance="instance1", job="gitlab", job_name="job1", kind="branch", project="project1", ref="branch1", source="web", stage="stage1", status="failed", tag_list="tag1"}

gitlab_ci_pipeline_job_status{failure_reason="script_failure", instance="instance1", job="gitlab", job_name="job1", kind="branch", project="project1", ref="branch1", runner_description="knt-builder", source="web", stage="stage1", status="failed", tag_list="tag1"}

Only the last status should be exported.

lang-m commented 1 month ago

I ran into the same problem. I think the first two metrics are actually for older jobs (the newest one without failure reason and the newest one with failure_reason="stuck_or_timeout_failure", respectively). Generally, we cannot know which one is the newest one unless we compare job IDs.

@cristina-defran Could you confirm this by checking the IDs returned by the following query?

gitlab_ci_pipeline_job_id{job=~"gitlab", project=~"project1", ref=~"branch1", source=~"web", job_name=~"job1"}

Edit: It is actually not clear which ID the first metric belongs to. In general there are three options:

  1. all three metrics belong to different IDs
  2. job_status metrics 1 and 2 (no failure_reason and failure_reason="stuck_or_timeout_failure") have the same ID
  3. job_status metrics 1 and 3 (no failure_reason and failure_reason="script_failure") have the same ID

Based on @cristina-defran's issue, we can guess that option 2 is not relevant in this particular case, because we "know" that metric 2 (stuck_or_timeout_failure) belongs to an older job than metric 3 (script_failure) [and metric 1 (no failure_reason) always belongs to the newest job].

cristina-defran commented 1 month ago

I'm afraid I no longer have the data for those results, so I cannot check the job IDs for the 3 results I gave. With the data I currently have, I can only see 2 results, a failed one (failure_reason="script_failure") and a successful one, each with a different job ID, so comparing the job IDs to find the latest one could work. The exporter should do this itself and export only the latest one (the one with the highest ID value).

Because the exporter exports the status for both jobs, the Grafana visualization does not show the proper information, and I haven't found a way to filter the job status table by the highest job ID.
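
To make the "keep only the highest job ID" idea concrete, here is a minimal Python sketch of the selection logic. It is not the exporter's code (the exporter is written in Go). The `JobSample` record, the `latest_per_job` helper and the job IDs 101 to 103 are all made up for illustration, and it assumes, as discussed above, that a higher job ID means a newer job.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class JobSample:
    """Hypothetical record; field names mirror the labels quoted in this issue."""
    project: str
    ref: str
    job_name: str
    job_id: int
    status: str
    failure_reason: Optional[str] = None


def latest_per_job(samples: Iterable[JobSample]) -> Dict[Tuple[str, str, str], JobSample]:
    """Keep only the sample with the highest job_id per (project, ref, job_name)."""
    latest: Dict[Tuple[str, str, str], JobSample] = {}
    for sample in samples:
        key = (sample.project, sample.ref, sample.job_name)
        # Assumption: a higher job ID means a more recent job.
        if key not in latest or sample.job_id > latest[key].job_id:
            latest[key] = sample
    return latest


# Illustrative data only: the job IDs are invented, the labels come from the report.
samples = [
    JobSample("project1", "branch1", "job1", 101, "failed", "stuck_or_timeout_failure"),
    JobSample("project1", "branch1", "job1", 102, "failed", "script_failure"),
    JobSample("project1", "branch1", "job1", 103, "pending"),
]

result = latest_per_job(samples)
print(result[("project1", "branch1", "job1")].status)
```

With this kind of filtering applied before export, only one status series per job would remain, so the Grafana table would not need any ID-based filtering of its own.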