webdevops / azure-devops-exporter

Prometheus exporter for Azure DevOps (VSTS) including agent pools, builds, releases, deployments, pull requests and repo stats
MIT License

Missing metrics (and incorrect agentPoolId) #53

Closed: worldspawn closed this issue 2 years ago

worldspawn commented 2 years ago

I don't seem to be getting the azure_devops_agentpool_queue_length metric.

I only see these metrics: [screenshot]

I can't see any errors in the exporter's logs. Is there any guidance on which permissions to enable for the PAT?

worldspawn commented 2 years ago

Also, azure_devops_build_latest_info seems to be reporting the wrong agentPoolID. Here's a build:

azure_devops_build_latest_info{agentPoolID="13", buildDefinitionID="112", buildID="30597", buildName="Integration Services - Apple App Store", buildNumber="20211208.4", container="azureexporter", endpoint="scrape", instance="10.244.2.4:8080", job="devops-exporter", namespace="pipelines-agents", pod="devops-exporter-69697f4fd5-pfsj9", projectID="redacted", reason="manual", requestedBy="Sam Critchley", result="failed", service="devops-exporter", sourceBranch="refs/heads/develop", sourceVersion="c900631fb8baf641f9a110f2b628142c1036318b", status="completed", url="redacted"}

It says agentPoolID is 13. A quick scan of the azure_devops_agentpool_info metric:

azure_devops_agentpool_info{agentPoolID="13", agentPoolName="Azure Pipelines", agentPoolType="automation", container="azureexporter", endpoint="scrape", instance="10.244.1.14:8080", isHosted="true", job="devops-exporter", namespace="pipelines-agents", pod="devops-exporter-5bbc64b555-j8g7w", service="devops-exporter"}
azure_devops_agentpool_info{agentPoolID="13", agentPoolName="Azure Pipelines", agentPoolType="automation", container="azureexporter", endpoint="scrape", instance="10.244.2.4:8080", isHosted="true", job="devops-exporter", namespace="pipelines-agents", pod="devops-exporter-69697f4fd5-pfsj9", service="devops-exporter"}

So 13 is "Azure Pipelines". The build number was 20211208.4.

That build ran in the Ubuntu Self Hosted pool (ID 15).


There are more occurrences of this, where the pool ID should have been 15 but shows different values (not just 13).

Update: I tried the API requests myself, and Azure DevOps is indeed just returning bogus results. Awesome.

worldspawn commented 2 years ago

Delving further into the incorrect queue ID problem, I found this bug report: https://developercommunity.visualstudio.com/t/build-list-rest-api-the-field-queuename-and-queuep/1125141

The “queue” field for the build is actually referring to the queue for the pipeline, which is different from the one used by the jobs. This is actually an old field that has no value anymore and we are considering removing it from future API versions. If you need to obtain the queues the jobs used, you can use the following endpoint:
https://dev.azure.com/{organization}/{project}/_apis/build/builds/{buildId}/timeline?api-version=6.0
You can find the jobs queues for those records of type job.
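A minimal Python sketch of that first step: build the timeline URL, fetch it with a PAT, and collect the queue IDs of the job records. It assumes the PAT is sent via HTTP Basic auth with an empty user name and that the timeline entries sit in a top-level `records` array; `timeline_url`, `auth_header`, `fetch_json` and `job_queue_ids` are hypothetical helpers, not part of the exporter.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import List

API_BASE = "https://dev.azure.com"


def timeline_url(organization: str, project: str, build_id: int,
                 api_version: str = "6.0") -> str:
    """Build the timeline URL from the bug report quoted above."""
    org = urllib.parse.quote(organization, safe="")
    proj = urllib.parse.quote(project, safe="")
    return (f"{API_BASE}/{org}/{proj}/_apis/build/builds/{build_id}/timeline"
            f"?api-version={api_version}")


def auth_header(pat: str) -> dict:
    """HTTP Basic auth with an empty user name and the PAT as the password (assumption)."""
    token = base64.b64encode(f":{pat}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def fetch_json(url: str, pat: str) -> dict:
    """GET a URL and decode its JSON body (this performs a real network call)."""
    request = urllib.request.Request(url, headers=auth_header(pat))
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def job_queue_ids(timeline: dict) -> List[int]:
    """Distinct queueId values of all records of type "Job", in first-seen order."""
    seen: List[int] = []
    for record in timeline.get("records", []):
        # Records of any other type (stages, phases, tasks) are skipped.
        if record.get("type") != "Job":
            continue
        queue_id = record.get("queueId")
        if queue_id is not None and queue_id not in seen:
            seen.append(queue_id)
    return seen
```

For build 30625 below, `job_queue_ids(fetch_json(timeline_url(org, project, 30625), pat))` should include 259, the queue of the "Socket Hub" job.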

So if I hit https://dev.azure.com/redacted/884a300b-e75a-4f0c-95f2-e750e4ecc5a2/_apis/build/builds/30625/timeline and examine items where type = "Job":

{
    "previousAttempts": [],
    "id": "44dcc42d-f50a-5805-535c-88bb857876bb",
    "parentId": "5217763f-1eaf-54e7-9d2e-df231c02ed73",
    "type": "Job",
    "name": "Socket Hub",
    "startTime": "2021-12-08T11:10:04.7166667Z",
    "finishTime": "2021-12-08T11:11:31.6633333Z",
    "currentOperation": null,
    "percentComplete": null,
    "state": "completed",
    "result": "succeeded",
    "resultCode": null,
    "changeId": 16,
    "lastModified": "0001-01-01T00:00:00",
    "workerName": "pipelines-agents-68997d4746-f5gc2",
    "queueId": 259,
    "order": 1,
    "details": null,
    "errorCount": 0,
    "warningCount": 0,
    "url": null,
    "log": {
        "id": 11,
        "type": "Container",
        "url": "https://dev.azure.com/redacted/884a300b-e75a-4f0c-95f2-e750e4ecc5a2/_apis/build/builds/30625/logs/11"
    },
    "task": null,
    "attempt": 1,
    "identifier": "BuildAndPush.SocketHub.__default"
}

and take the queueId values from those entries, I can then hit https://dev.azure.com/redacted/884a300b-e75a-4f0c-95f2-e750e4ecc5a2/_apis/distributedtask/queues?queueIds=259

Which contains the magic data 🧙 🍻

{
    "count": 1,
    "value": [
        {
            "id": 259,
            "projectId": "884a300b-e75a-4f0c-95f2-e750e4ecc5a2",
            "name": "Ubuntu Self Hosted",
            "pool": {
                "id": 15,
                "scope": "a0d7cd10-7602-49e0-87b9-37290c2ed7f6",
                "name": "Ubuntu Self Hosted",
                "isHosted": false,
                "poolType": "automation",
                "size": 36,
                "isLegacy": false,
                "options": "none"
            }
        }
    ]
}

And that's how (there's possibly a shorter route) you get the agent pool ID for the build. My builds only use one pool; I don't know if it's possible for different stages/jobs to use different pools.
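A sketch of that second lookup in Python, assuming `queueIds` accepts a comma-separated list so that all job queues of a build can be resolved in a single request (which would help keep the number of extra API calls down). `queues_url` and `pools_by_queue` are hypothetical helpers; like the URL above, no `api-version` is set.

```python
from typing import Dict, Iterable, Optional, Tuple


def queues_url(organization: str, project: str, queue_ids: Iterable[int]) -> str:
    """Build the distributedtask/queues URL for one or more queue IDs."""
    ids = ",".join(str(queue_id) for queue_id in queue_ids)
    return (f"https://dev.azure.com/{organization}/{project}"
            f"/_apis/distributedtask/queues?queueIds={ids}")


def pools_by_queue(queues_response: dict) -> Dict[int, Tuple[Optional[int], Optional[str]]]:
    """Map each queue ID to the (pool ID, pool name) of the agent pool behind it."""
    result: Dict[int, Tuple[Optional[int], Optional[str]]] = {}
    for queue in queues_response.get("value", []):
        pool = queue.get("pool") or {}
        result[queue["id"]] = (pool.get("id"), pool.get("name"))
    return result
```

Applied to the response above, `pools_by_queue` yields `{259: (15, "Ubuntu Self Hosted")}`, i.e. the pool ID 15 the build actually ran on.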

Sorry for reporting two things in one issue 🤦‍♂️

worldspawn commented 2 years ago

I see my first problem is caused by not having set AZURE_DEVOPS_AGENTPOOL.

mblaschke commented 2 years ago

Thanks for the analysis. I'm thinking about how to solve that while avoiding additional API calls 🤔

mblaschke commented 2 years ago

and that's how (theres possible a shorter route) you get the agent pool id for the build. My builds only use one pool, I don't know if its possible to have different stages/jobs use different pools.

Every job block can have its own pool (e.g. to separate Windows and Linux jobs).
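Since each job can run on a different pool, a build maps to a set of pools rather than a single one. A minimal Python sketch of a per-job mapping, combining a timeline response and a distributedtask/queues response (`pool_per_job` is a hypothetical helper; the pool names in the usage example are illustrative only):

```python
from typing import Dict, Optional


def pool_per_job(timeline: dict, queues_response: dict) -> Dict[str, Optional[str]]:
    """Return {job identifier: pool name} for every "Job" record of a build timeline."""
    pool_name_by_queue = {
        queue["id"]: (queue.get("pool") or {}).get("name")
        for queue in queues_response.get("value", [])
    }
    result: Dict[str, Optional[str]] = {}
    for record in timeline.get("records", []):
        if record.get("type") != "Job":
            continue
        # Prefer the unique identifier (e.g. "BuildAndPush.SocketHub.__default");
        # plain job names may repeat across stages.
        key = record.get("identifier") or record.get("name")
        result[key] = pool_name_by_queue.get(record.get("queueId"))
    return result
```

A build with a Linux job on queue 259 and a Windows job on a second queue would then report two pools, one per job.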

Can you check the result of https://dev.azure.com/{TENANT}/{PROJECT}/_apis/build/builds?api-version=5.1&maxBuildsPerDefinition=1&deletedFilter=excludeDeleted? What is inside the job block?

Here is what I get (for a public testing instance):

{
  "id": 8,
  "name": "Hosted Ubuntu 1604",
  "pool": {
    "id": 8,
    "name": "Hosted Ubuntu 1604",
    "isHosted": true
  }
}
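Assuming the block above is the `queue` object of each build in the builds list response's `value` array, reading the pool ID from it (presumably the source of the agentPoolID label) could look like this in Python. `build_queue_pools` is a hypothetical helper; per the bug report quoted earlier, this pipeline-level queue may differ from the pool the jobs actually ran on.

```python
from typing import Dict, Optional


def build_queue_pools(builds_response: dict) -> Dict[int, Optional[int]]:
    """Map build ID -> queue.pool.id from a builds list response.

    This is the pipeline-level queue; the jobs may have run on other pools.
    """
    result: Dict[int, Optional[int]] = {}
    for build in builds_response.get("value", []):
        queue = build.get("queue") or {}
        pool = queue.get("pool") or {}
        result[build["id"]] = pool.get("id")
    return result
```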
nikhil-neu commented 2 years ago

@worldspawn I have the same issue with the missing metrics. Do we need to set AZURE_DEVOPS_AGENTPOOL during container creation? I was under the impression that this was just a filtering option. If so, do I give the pool IDs comma-separated?

mblaschke commented 2 years ago

Yes, you have to set the agent pools you want to scrape. The next version will fetch the metrics for all agent pools (even hosted ones), so you won't have to set the env var.

a1exstr commented 2 years ago

the next version will fetch the metrics for all agentpool (even hosted ones) so you would not have to set the env var

I would probably prefer having a list of pools (or all pools if not defined) rather than a single pool vs. all.

mblaschke commented 2 years ago

You can set AZURE_DEVOPS_AGENTPOOL="1 21 42" to set multiple pools (separated by spaces).

The next version will use all pools if it is not set.
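A Python sketch of parsing the variable as described here: space-separated pool IDs, with an unset or blank value meaning "all pools" as planned for the next version. `agent_pool_filter` is a hypothetical helper, not the exporter's actual code.

```python
import os
from typing import List, Mapping, Optional


def agent_pool_filter(env: Optional[Mapping[str, str]] = None) -> Optional[List[int]]:
    """Parse AZURE_DEVOPS_AGENTPOOL (e.g. "1 21 42") into a list of pool IDs.

    Returns None when the variable is unset or blank, meaning "scrape all pools".
    """
    source = os.environ if env is None else env
    raw = source.get("AZURE_DEVOPS_AGENTPOOL", "")
    pool_ids = raw.split()  # splits on any whitespace, so "1 21 42" -> ["1", "21", "42"]
    return [int(pool_id) for pool_id in pool_ids] if pool_ids else None
```

Returning `None` rather than an empty list keeps "no filter" distinct from "filter to nothing".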

Crulex commented 2 years ago

you can set AZURE_DEVOPS_AGENTPOOL="1 21 42" to set multiple pools (separated by space)

the next version will use all if not set.

Hi, is it possible that you will update this, or will we need to list all pool IDs?

mblaschke commented 2 years ago

22.7.0 released