grafana / cloudcost-exporter

Prometheus Exporter for Cloud Provider agnostic cost metrics
Apache License 2.0

AKS VM List - Slim Version #221

Closed: logyball closed this 3 months ago

logyball commented 3 months ago

AKS Virtual Machine Pricing Metric Population

This PR implements a working, but incomplete, version of metric population for Azure AKS. See the "Todos" section below for follow-up work.

It also introduces a new method for obtaining VM information.

Note: This grew out of a much more complicated PR, but I've removed its caching and region-awareness elements. We have yet to decide whether they should return.

Virtual Machines

This PR adds a new subsystem to the aks package, machine_store, which is analogous to the existing price_store. Where price_store holds pricing information for machine types and their characteristics, machine_store is responsible for:

Population Strategy

Machines are populated in the following way:

This information is refreshed periodically, currently every five minutes.

Putting it Together

Now that it is established:

We can combine the two to generate Prometheus metrics.
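The Go sketch below is a rough illustration of that combination step. It joins a list of machines against a price map keyed first by region, then by machine type. The PriceMap layout, the InstancePrice row type and joinPrices are all assumptions made for this example. The 0.24 value is reused from the Standard_D4s_v3/westeurope result in the Testing section purely as sample data.

```go
package main

import (
	"fmt"
	"sort"
)

// Machine is a hypothetical, simplified VM record.
type Machine struct {
	Name        string
	MachineType string
	Region      string
}

// PriceMap is an assumed layout: region -> machine type -> USD per hour.
type PriceMap map[string]map[string]float64

// InstancePrice is one row that would become a metric sample (hypothetical).
type InstancePrice struct {
	MachineName string
	MachineType string
	Region      string
	USDPerHour  float64
}

// joinPrices looks up each machine's hourly price by (region, machine type).
// Machines without a matching price are returned separately instead of being
// emitted with a zero value. Indexing a missing region yields a nil inner map,
// and looking up a key in a nil map safely reports ok == false.
func joinPrices(machines []Machine, prices PriceMap) (rows []InstancePrice, missing []Machine) {
	for _, m := range machines {
		p, ok := prices[m.Region][m.MachineType]
		if !ok {
			missing = append(missing, m)
			continue
		}
		rows = append(rows, InstancePrice{
			MachineName: m.Name,
			MachineType: m.MachineType,
			Region:      m.Region,
			USDPerHour:  p,
		})
	}
	// Sort for deterministic output.
	sort.Slice(rows, func(i, j int) bool { return rows[i].MachineName < rows[j].MachineName })
	return rows, missing
}

func main() {
	prices := PriceMap{"westeurope": {"Standard_D4s_v3": 0.24}}
	machines := []Machine{
		{Name: "vm-a", MachineType: "Standard_D4s_v3", Region: "westeurope"},
		{Name: "vm-b", MachineType: "Standard_D4s_v3", Region: "centralus"}, // no price for this region
	}
	rows, missing := joinPrices(machines, prices)
	fmt.Println(rows)
	fmt.Println(len(missing))
}
```

Unmatched machines are returned separately rather than emitted at zero, so a missing price cannot masquerade as a free VM.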

When the AKS collector is first registered, both the PriceMap and the MachineMap are populated. Populating the PriceMap is an expensive API call, so it is only repeated once every 24 hours. The MachineMap follows the five-minute refresh described above.

Once both the PriceMap and the MachineMap are populated, calls to Collect() (which happen whenever the Prometheus /metrics endpoint is scraped) follow this pattern:

Todos

Testing

❯ ./cloudcost-exporter --provider azure --server.address "127.0.0.1:8080" --azure.services "aks" --azure.subscription-id "<redacted>" --log.level info 
time=2024-07-01T17:43:44.480-04:00 level=INFO msg="Starting cloudcost-exporter" version="(version=v0.1.5-13-gbaa2ccf, branch=logyball/aks-vm-list-implementation-slim, revision=baa2ccf)" build_context="(go=go1.22.0, platform=darwin/arm64, user=loganballard@Logans-Laptop.local, date=2024-07-01T21:36:13Z, tags=unknown)"
time=2024-07-01T17:43:44.481-04:00 level=INFO msg="populating price store" provider=azure collector=aks subsystem=priceStore
time=2024-07-01T17:43:44.481-04:00 level=INFO msg="populating machine store" provider=azure collector=aks subsystem=machineStore
time=2024-07-01T17:43:44.481-04:00 level=INFO msg="TODO - implement AKS collector Describe method" provider=azure collector=aks
time=2024-07-01T17:43:44.481-04:00 level=INFO msg="registering collectors" provider=azure NumOfCollectors=1
time=2024-07-01T17:43:44.481-04:00 level=INFO msg="registering collector" provider=azure collector=aks
time=2024-07-01T17:43:44.481-04:00 level=INFO msg="Starting server" address=127.0.0.1:8080 path=/metrics
time=2024-07-01T17:43:52.877-04:00 level=INFO msg="region name for price not found" provider=azure collector=aks subsystem=priceStore sku=Standard
time=2024-07-01T17:43:58.865-04:00 level=INFO msg="region name for price not found" provider=azure collector=aks subsystem=priceStore sku=Standard
time=2024-07-01T17:43:59.377-04:00 level=INFO msg="machine store populated" provider=azure collector=aks subsystem=machineStore duration=14.89676275s
time=2024-07-01T17:44:10.365-04:00 level=INFO msg="collecting metrics" provider=azure collector=aks
time=2024-07-01T17:44:30.660-04:00 level=INFO msg="price store populated" provider=azure collector=aks subsystem=priceStore duration=46.178977917s
time=2024-07-01T17:44:30.660-04:00 level=INFO msg="metrics collected" provider=azure collector=aks duration=20.295563333s

(separate terminal window)

❯ curl localhost:8080/metrics | grep -v "go_" | prom2json | jq '.[] | select(.name == "cloudcost_azure_aks_instance_total_usd_per_hour") | .metrics[] | [.labels.machine_type, .labels.region, .value]'

[
  "Standard_D4s_v3",
  "westeurope",
  "0.24"
] 

... a bunch of similar ones ...

[
  "Standard_D16s_v3",
  "centralus",
  "0.088"
]

Note: the cluster name and other sensitive items have been removed.
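The prom2json and jq steps above only pick fields out of the plain-text exposition format that /metrics returns. The Go sketch below shows roughly what one of those samples looks like in that raw format. formatSample is a hypothetical helper, and HELP/TYPE lines are omitted. The real metric likely carries additional labels; the note above mentions a removed cluster name.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// The text format requires escaping backslash, double quote and newline
// inside label values.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatSample renders one sample as: name{label="value",...} value
// Labels are sorted only to make the output deterministic.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+`="`+labelEscaper.Replace(labels[k])+`"`)
	}
	// 'g' with precision -1 prints the shortest representation, e.g. 0.24.
	return name + "{" + strings.Join(parts, ",") + "} " + strconv.FormatFloat(value, 'g', -1, 64)
}

func main() {
	fmt.Println(formatSample(
		"cloudcost_azure_aks_instance_total_usd_per_hour",
		map[string]string{"machine_type": "Standard_D4s_v3", "region": "westeurope"},
		0.24,
	))
}
```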

Quick price spot check