mvisonneau / gitlab-ci-pipelines-exporter

Prometheus / OpenMetrics exporter for GitLab CI pipelines insights
Apache License 2.0

Child Pipelines data missing #495

Open umdstu opened 1 year ago

umdstu commented 1 year ago

I recently discovered that child-pipeline records are missing from the exporter. This is because neither avenue for getting data, whether pulled via the GitLab API or pushed via a webhook, includes child-pipeline records. This makes direct access for the exporter challenging.

It appears that the existing include_pipeline_jobs functionality only applies to jobs, not to pipelines themselves. I think there is an option for both polling and webhooks, but each would require follow-up calls.

Would it be possible for the exporter to follow up passive webhook events received with active calls to the List pipeline bridges endpoint?

As for polling, you'd have to make even more noise and call the bridges endpoint for every single pipeline you polled in that time frame.
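To make the webhook follow-up idea concrete, here is a minimal sketch in Python (the exporter itself is written in Go, so this is illustrative only). It assumes the pipeline webhook payload carries `project.id` and `object_attributes.id`, and that each bridge returned by the List pipeline bridges endpoint has a `downstream_pipeline` object. The function names, the base URL and the token handling are hypothetical, not part of the exporter.

```python
import json
import urllib.request


def bridges_url(base_url: str, project_id: int, pipeline_id: int) -> str:
    """Build the URL of GitLab's "List pipeline bridges" endpoint."""
    base = base_url.rstrip("/")
    return f"{base}/api/v4/projects/{project_id}/pipelines/{pipeline_id}/bridges"


def fetch_json(url: str, token: str):
    """GET a URL with a private token and decode the JSON body."""
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def downstream_pipeline_ids(bridges: list) -> list:
    """Extract child pipeline IDs from a bridges response.

    A bridge that has not created its pipeline yet has no
    downstream_pipeline, so those entries are skipped.
    """
    return [b["downstream_pipeline"]["id"] for b in bridges if b.get("downstream_pipeline")]


def on_pipeline_webhook(event: dict, base_url: str, token: str, fetch=fetch_json) -> list:
    """Follow up a passive pipeline webhook with an active bridges lookup."""
    project_id = event["project"]["id"]
    pipeline_id = event["object_attributes"]["id"]
    url = bridges_url(base_url, project_id, pipeline_id)
    return downstream_pipeline_ids(fetch(url, token))
```

The `fetch` parameter is injectable so the lookup can be exercised without a GitLab instance. With polling, the same `bridges_url` call would have to be issued once per polled pipeline, which is the extra noise mentioned above.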

Thanks!

cebidhem commented 1 year ago

Hi,

I think there are 2 different things when it comes to child pipelines:

  1. Metrics of the child pipeline itself (duration, status, etc..)
  2. Link with the parent pipeline, leading to pipeline_duration = parent_pipeline_duration + child_pipeline_duration or pipeline_success == true when parent_pipeline_success == true && child_pipeline_success == true

Item 2 is eased by the directive strategy: depend in the .gitlab-ci.yml.

For 1, indeed we don't have child pipeline metrics at all. I have an example for which, in total, 6 webhooks were sent to the exporter (redacted, mostly for brevity):
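As a rough sketch of the aggregation described in item 2, the following Python snippet sums durations and requires every pipeline to succeed. The `PipelineResult` type and the `combine` function are hypothetical names used for illustration; they are not part of the exporter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PipelineResult:
    duration: int  # seconds, as reported in the pipeline's "duration" field
    status: str    # e.g. "success", "failed", "running"


def combine(parent: PipelineResult, children: List[PipelineResult]) -> Tuple[int, bool]:
    """Return (total duration, overall success) for a parent and its children.

    Total duration is the parent's duration plus each child's duration;
    overall success requires the parent and every child to have succeeded.
    """
    pipelines = [parent] + children
    total_duration = sum(p.duration for p in pipelines)
    all_succeeded = all(p.status == "success" for p in pipelines)
    return total_duration, all_succeeded
```

For instance, a parent with a duration of 13 seconds and a single child with a duration of 187 seconds, both successful, would combine to 200 seconds and an overall success.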

  1. parent pending
    "object_attributes": {
    "id": 602680503,
    "source": "web",
    "status": "pending",
    "detailed_status": "pending",
    "stages": [
      "build",
      "trigger"
    ],
  2. parent running
    "object_attributes": {
    "id": 602680503,
    "source": "web",
    "status": "running",
    "detailed_status": "running",
    "stages": [
      "build",
      "trigger"
    ],
  3. child pending
    "object_attributes": {
    "id": 602680818,
    "source": "parent_pipeline",
    "status": "pending",
    "detailed_status": "pending",
    "stages": [
      "plan",
      "apply"
    ],
  4. child running
    "object_attributes": {
    "id": 602680818,
    "source": "parent_pipeline",
    "status": "running",
    "detailed_status": "running",
    "stages": [
      "plan",
      "apply"
    ],
  5. child passed
    "object_attributes": {
    "id": 602680818,
    "source": "parent_pipeline",
    "status": "success",
    "detailed_status": "passed",
    "stages": [
      "plan",
      "apply"
    ],
    "created_at": "2022-08-25 10:20:57 UTC",
    "finished_at": "2022-08-25 10:24:18 UTC",
    "duration": 187,
    "queued_duration": 14,
    },
  6. parent passed
    "object_attributes": {
    "id": 602680503,
    "source": "web",
    "status": "success",
    "detailed_status": "passed",
    "stages": [
      "build",
      "trigger"
    ],
    "created_at": "2022-08-25 10:20:34 UTC",
    "finished_at": "2022-08-25 10:24:19 UTC",
    "duration": 13,
    "queued_duration": 7,
    },

In the exporter, I only see parent metrics for this run, with a duration equal to the sum of the parent's two job durations (13 seconds) instead of the elapsed time between created_at and finished_at in the final webhook (3m45s).
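The gap can be checked directly from the timestamps in the webhooks above. This is a small Python sketch, assuming the `created_at` and `finished_at` strings always use the `YYYY-MM-DD HH:MM:SS UTC` layout seen in these payloads; `wall_clock_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Layout of created_at / finished_at in the webhook payloads shown above.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S UTC"


def wall_clock_seconds(created_at: str, finished_at: str) -> int:
    """Elapsed seconds between a pipeline's creation and its completion."""
    start = datetime.strptime(created_at, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(finished_at, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    return int((end - start).total_seconds())


# Parent pipeline 602680503 (webhook 6): reported "duration" is 13.
parent_elapsed = wall_clock_seconds("2022-08-25 10:20:34 UTC", "2022-08-25 10:24:19 UTC")

# Child pipeline 602680818 (webhook 5): reported "duration" is 187, "queued_duration" is 14.
child_elapsed = wall_clock_seconds("2022-08-25 10:20:57 UTC", "2022-08-25 10:24:18 UTC")

print(parent_elapsed, child_elapsed)
```

The parent's elapsed time comes out at 225 seconds (3m45s), against a reported duration of 13. The child's elapsed time is 201 seconds, which matches its duration plus queued_duration (187 + 14).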

For this sole use-case, I see 2 issues:

  1. child pipeline metrics are not retrieved at all.
  2. parent pipeline metrics are incorrect.

I don't know if this helps with implementing a fix, but I wanted to describe the problem with a concrete example. Since most of our pipelines are generated or triggered, we use parent-child pipelines extensively.