grafana / cloudcost-exporter

Prometheus Exporter for Cloud Provider agnostic cost metrics
Apache License 2.0

Implement persistent volumes in AWS #235

Closed Pokom closed 3 weeks ago

Pokom commented 2 months ago

The primary goal is to export a metric that reports the hourly cost of each persistent volume within our EKS clusters. The exported metric should align with our existing metrics (`cloudcost_aws_eks_persistent_volume_dollars_per_hour`).
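As a point of reference, a minimal sketch of how such a metric could be declared with client_golang. The metric name comes from this issue; the label set is an assumption for illustration, not the final list:

```go
package aws

import "github.com/prometheus/client_golang/prometheus"

// Sketch only: the metric name comes from the issue; the labels here
// are placeholders, not the agreed-upon set.
var persistentVolumeHourlyCost = prometheus.NewDesc(
	"cloudcost_aws_eks_persistent_volume_dollars_per_hour",
	"Hourly cost of an EKS persistent volume in USD.",
	[]string{"region", "storage_class", "persistent_volume"},
	nil,
)
```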

There are really two parts (sketched below):

  1. Generate a Pricing Map that enables searching by region and storage class
  2. List out EBS volumes
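A rough sketch of what those two parts could look like with aws-sdk-go-v2; the type and function names are assumptions for illustration, not the exporter's actual API:

```go
package aws

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

// StoragePricingMap holds hourly $/GiB prices keyed by region, then by
// storage class (gp2, gp3, io1, ...). Name and units are assumptions.
type StoragePricingMap struct {
	prices map[string]map[string]float64
}

// Price looks up the hourly $/GiB price for a region and storage class.
func (m *StoragePricingMap) Price(region, storageClass string) (float64, bool) {
	p, ok := m.prices[region][storageClass]
	return p, ok
}

// listVolumes pages through every EBS volume visible in the region the
// client is configured for.
func listVolumes(ctx context.Context, client *ec2.Client) ([]types.Volume, error) {
	var volumes []types.Volume
	paginator := ec2.NewDescribeVolumesPaginator(client, &ec2.DescribeVolumesInput{})
	for paginator.HasMorePages() {
		page, err := paginator.NextPage(ctx)
		if err != nil {
			return nil, err
		}
		volumes = append(volumes, page.Volumes...)
	}
	return volumes, nil
}
```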

There should be the following labels:

@paulajulve had pointed out that the cluster label may not be possible to derive from the API response. If that's the case, then we need to consider how we can join against existing kube-state-metrics to derive the cluster name.
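If a cluster tag does happen to be present on the volume, one fallback is to scan for it before resorting to the metric join. A sketch, assuming the `kubernetes.io/cluster/<name>` tag convention that some provisioners use; that tag is not guaranteed to be set:

```go
package aws

import (
	"strings"

	"github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

// clusterNameFromTags scans a volume's tags for a
// kubernetes.io/cluster/<name> key. If the provisioner didn't set one,
// it returns "" and the cluster label has to come from a join with
// kube-state-metrics at query time instead.
func clusterNameFromTags(tags []types.Tag) string {
	const prefix = "kubernetes.io/cluster/"
	for _, tag := range tags {
		if tag.Key != nil && strings.HasPrefix(*tag.Key, prefix) {
			return strings.TrimPrefix(*tag.Key, prefix)
		}
	}
	return ""
}
```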

Pokom commented 1 month ago

There are two approaches that I believe can be taken:

  1. Create a module dedicated to exporting the costs of disks
  2. Add a section to the eks module that exports the costs of persistent volumes alongside its existing metrics

With GKE, the costs of disks were returned alongside the costs of compute resources (https://github.com/grafana/cloudcost-exporter/blob/main/pkg/google/compute/pricing_map.go#L218-L246), so I decided to implement exporting the costs of PVs alongside the cost of instances (see the sketch after this list):

  1. List disks
  2. Export costs of persistent volumes
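Translated to AWS, that shape could look like the sketch below, building on the `StoragePricingMap` and metric descriptor sketched earlier (same hypothetical package). It assumes prices are $/GiB-hour and that EBS sizes are reported in GiB:

```go
// sendVolumeMetrics combines the volume listing with the pricing map.
// Sketch only: it uses the volume ID as the persistent_volume label.
func sendVolumeMetrics(ch chan<- prometheus.Metric, region string, volumes []types.Volume, pm *StoragePricingMap) {
	for _, v := range volumes {
		if v.VolumeId == nil || v.Size == nil {
			continue
		}
		price, ok := pm.Price(region, string(v.VolumeType))
		if !ok {
			continue // no price for this storage class in this region
		}
		ch <- prometheus.MustNewConstMetric(
			persistentVolumeHourlyCost,
			prometheus.GaugeValue,
			price*float64(*v.Size),
			region, string(v.VolumeType), *v.VolumeId,
		)
	}
}
```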

What's hard for me to know without digging into the API responses is whether there is a similar level of coupling for AWS. My hunch is there isn't, since we use the following filter for the listing of prices.

Personal recommendation: check to see if the pricing map from eks (and the listing of prices) can easily be extended to pull in disk costs. If not, I'd recommend going down the route of creating a module dedicated to disks. Even though PVs are somewhat tightly coupled to k8s, we're ultimately billed for disks. I think it would be cleaner to have a disk module that can be extended to support non-k8s disks than the other way around.
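For context on what extending the price listing might involve: the Pricing API's `GetProducts` call takes term-match filters, and EBS prices live under the `AmazonEC2` service code. The filter fields and values below are illustrative assumptions, not the exporter's actual filter:

```go
package aws

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/pricing"
	pricingtypes "github.com/aws/aws-sdk-go-v2/service/pricing/types"
)

// listStoragePrices sketches a Pricing API query for EBS storage prices.
// The productFamily/regionCode filter values are assumptions.
func listStoragePrices(ctx context.Context, client *pricing.Client, region string) (*pricing.GetProductsOutput, error) {
	return client.GetProducts(ctx, &pricing.GetProductsInput{
		ServiceCode: aws.String("AmazonEC2"),
		Filters: []pricingtypes.Filter{
			{Type: pricingtypes.FilterTypeTermMatch, Field: aws.String("productFamily"), Value: aws.String("Storage")},
			{Type: pricingtypes.FilterTypeTermMatch, Field: aws.String("regionCode"), Value: aws.String(region)},
		},
	})
}
```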

Pokom commented 1 month ago

Most of the functionality is implemented, with tweaks to naming and whatnot remaining. The main thing left is adding tests; the Collect method specifically is tightly coupled to the compute module, which makes it hard to test in isolation.

@Pokom will take a look to see if we can split out the Collect method so that the core logic is encapsulated in a separate, testable method.
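A sketch of what that split could enable, assuming the `sendVolumeMetrics` shape from the earlier sketch: the test drives the extracted function directly with fixture data, with no AWS client or compute module involved.

```go
package aws

import (
	"testing"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2/types"
	"github.com/prometheus/client_golang/prometheus"
)

func TestSendVolumeMetrics(t *testing.T) {
	pm := &StoragePricingMap{prices: map[string]map[string]float64{
		"us-east-1": {"gp3": 0.00011}, // made-up fixture price, $/GiB-hour
	}}
	vols := []types.Volume{{
		VolumeId:   aws.String("vol-123"),
		VolumeType: types.VolumeTypeGp3,
		Size:       aws.Int32(100),
	}}
	ch := make(chan prometheus.Metric, 1)
	sendVolumeMetrics(ch, "us-east-1", vols, pm)
	if got := len(ch); got != 1 {
		t.Fatalf("expected 1 metric, got %d", got)
	}
}
```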

Pokom commented 1 month ago

Data is out in prod, and the next step is validating it and then creating the TCO rules.

Pokom commented 3 weeks ago

Currently working to offload processing of the volumes to a background goroutine for performance reasons. @paulajulve will close this out, follow up with another issue that details the performance problems, and track the work there.
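One possible shape for that offload, as a sketch reusing `listVolumes` from the earlier snippet (names assumed): refresh the volume list on a ticker in a background goroutine, and have Collect read a mutex-guarded snapshot instead of calling the API inline.

```go
package aws

import (
	"context"
	"sync"
	"time"

	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

// volumeCache keeps the latest DescribeVolumes result so Collect never
// blocks on the AWS API.
type volumeCache struct {
	mu      sync.RWMutex
	volumes []types.Volume
}

// run refreshes the cache on an interval until the context is cancelled.
func (c *volumeCache) run(ctx context.Context, client *ec2.Client, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if vols, err := listVolumes(ctx, client); err == nil {
			c.mu.Lock()
			c.volumes = vols
			c.mu.Unlock()
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// snapshot returns the most recent volume list for Collect to price.
func (c *volumeCache) snapshot() []types.Volume {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.volumes
}
```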

paulajulve commented 3 weeks ago

Next steps: