kubecost / features-bugs

A public repository for filing of Kubecost feature requests and bugs. Please read the issue guidelines before filing an issue here.

Add ability to ignore specific resources #40

Open keithhand opened 1 year ago

keithhand commented 1 year ago

What problem are you trying to solve? Each namespace has a PVC that maps to an Azure Storage Account (via Blob CSI). The storage is (a) already accounted for in other cost accounting and (b) not actively using the 10 TiB of data it reports, and therefore should not be priced as such in Kubecost.

Describe the solution you'd like A way to allow users to ignore specific resources through a configuration within Kubecost

Describe alternatives you've considered Using a relabel config to drop the metrics from Prometheus directly
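The relabel workaround mentioned above could look roughly like the following Prometheus fragment. This is only a sketch: the job name, metric source, and label names (`namespace`, `persistentvolumeclaim`) are assumptions based on typical kube-state-metrics naming and would need to be checked against the metrics Kubecost actually ingests.

```yaml
# Hypothetical sketch: drop PVC-level metrics for blob-backed volumes
# before ingestion, so Kubecost never sees them. Verify metric/label
# names against your scrape targets before using.
scrape_configs:
  - job_name: kube-state-metrics
    metric_relabel_configs:
      # Match "{namespace}/{pvc-name}" and drop the series entirely
      - source_labels: [namespace, persistentvolumeclaim]
        separator: /
        regex: kubecost/persistent-storage-statefulset-blob-.*
        action: drop
```

The drawback, as implied above, is that this hides the metrics from everything scraping that Prometheus, not just Kubecost.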

How would users interact with this feature? Configuration through values.yaml, maybe through an ignoredResources section that accepts strings identifying resources to ignore in the form "{namespace}/{objectType}/{objectName}", e.g. "kubecost/persistentvolumeclaim/kubecost-cost-analyzer". Ideally, this includes wildcard support to handle multiple namespaces or objects.
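As a sketch of the proposed interface, the values.yaml section could look like the following. This key is entirely hypothetical; nothing like it exists in the chart today, and the shape shown only illustrates the request.

```yaml
# Hypothetical values.yaml interface (the ignoredResources key does not
# exist in the cost-analyzer chart; this only sketches the proposal).
ignoredResources:
  # Entries use "{namespace}/{objectType}/{objectName}", with wildcards.
  - "kubecost/persistentvolumeclaim/kubecost-cost-analyzer"
  - "*/persistentvolumeclaim/persistent-storage-statefulset-blob-*"
```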


AjayTripathy commented 1 year ago

Wondering why reconciliation isn't working here. Maybe worth testing to make sure we're doing this correctly?

AjayTripathy commented 1 year ago

This feels important enough to commit to testing/resolving in v1.100

AjayTripathy commented 1 year ago

@keithhand can we get someone on the support team to set this up and see if we can reproduce it?

keithhand commented 1 year ago

@AjayTripathy, I don't believe reconciliation handles this use case, as the cloud provider is still charging them, so the costs are in the cost report. They are already accounted for within the user's personal accounting, so they are looking for a way to entirely ignore the resource within Kubecost.

vexingly commented 1 year ago

Hi @keithhand, here is the basic use case that can be used to reproduce the issue: https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/deploy/example/e2e_usage.md

Using example 2 from those instructions, you can set up multiple PV+PVC pairs that use the same Azure Blob storage account, each mapping to a unique container within that storage account. The PV is defined with the storage capacity allocated to the entire storage account, so as not to put any artificial limits on the PV (since containers share the storage of the account).

Please let me know if I can provide any more details on this use case and/or if you have any other suggestions, thanks!

keithhand commented 1 year ago

Hey @vexingly, to ensure we have explored all options, cloud reconciliation wouldn't solve the underlying problem, would it? Cloud reconciliation is the process by which we gather the cost report data from Azure and then match them back to the assets reported in Kubecost. As far as I understand, this wouldn't work in your situation as the issue is less of mismatched prices and more the case that you don't want to see those volumes appear in Kubecost entirely; is that correct?

vexingly commented 1 year ago

@keithhand fixing the costs through cloud reconciliation would be a partial mitigation, but I think Kubecost would still show that our cluster has 6-7 PiB of disk allocated, which isn't accurate.

Ideally, these PVs/PVCs would be hidden entirely, or, if displayed, aggregated in such a way that the size and cost are not multiplied by the number of PVs; i.e., PVs using the Blob CSI driver that reference the same storage account should be considered identical resources.

keithhand commented 1 year ago

I recreated this by creating a statefulset and attaching a PVC to that statefulset based on the blob storage CSI. You can access that cluster with the Azure account subscription set and the command:

az aks get-credentials --resource-group khandkcost --name khand-dev-1

We can now see that the new PVC is showing up at the bottom of the allocations report, here (you'll need to port-forward to use the link):

[screenshot: the new PVC at the bottom of the Allocations report]

I haven't attached any load balancer to access Kubecost directly, but if that would be handy, let me know.

AjayTripathy commented 1 year ago

Thanks @keithhand!! Did you notice an obvious label by which these PVCs can be identified? I guess I'm also asking @vexingly

keithhand commented 1 year ago

As far as I can tell, it is configured as a normal PVC. Labels/annotations are all similar to those of the default storage class:

❯ # Blob storage (to be ignored)
❯ kubectl describe pvc persistent-storage-statefulset-blob-0
Name:          persistent-storage-statefulset-blob-0
Namespace:     kubecost
StorageClass:  blob-fuse
Status:        Bound
Volume:        pvc-702d44f2-d686-44c0-a71c-e7fe6896b94d
Labels:        app=nginx
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-class: blob-fuse
               volume.beta.kubernetes.io/storage-provisioner: blob.csi.azure.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      100Gi
Access Modes:  RWX
VolumeMode:    Filesystem
Used By:       statefulset-blob-0
Events:        <none>

❯ # Default storage (want to see)
❯ kubectl describe pvc kubecost-cost-analyzer
Name:          kubecost-cost-analyzer
Namespace:     kubecost
StorageClass:  default
Status:        Bound
Volume:        pvc-732044fd-2f39-4335-846a-5bfe40a2d4c0
Labels:        app=cost-analyzer
               app.kubernetes.io/instance=kubecost
               app.kubernetes.io/managed-by=Helm
               app.kubernetes.io/name=cost-analyzer
               helm.sh/chart=cost-analyzer-1.98.0
Annotations:   meta.helm.sh/release-name: kubecost
               meta.helm.sh/release-namespace: kubecost
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: disk.csi.azure.com
               volume.kubernetes.io/selected-node: aks-agentpool-41384482-vmss000001
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      32Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       kubecost-cost-analyzer-744c48c4f7-97jnd
Events:        <none>

vexingly commented 1 year ago

In my experience you cannot tell from the PVC itself; you would need to look at the PV spec or the StorageClass spec to identify the storage driver being used. In our case, we use the PV spec to define the storage driver and the container name to mount, but the actual storage account is contained in a secret.

Example:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-blob
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain  # "Delete" is not supported in static provisioning
  csi:
    driver: blob.csi.azure.com
    readOnly: false
    # make sure volumeid is unique for every identical storage blob container in the cluster
    # character `#` is reserved for internal use and cannot be used in volumehandle
    volumeHandle: unique-volumeid
    volumeAttributes:
      containerName: EXISTING_CONTAINER_NAME
    nodeStageSecretRef:
      name: azure-secret
      namespace: default
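Following on the point that the driver is visible in the PV spec rather than the PVC: a kubectl invocation along these lines (a sketch; it assumes CSI-provisioned volumes, since `spec.csi` is empty for in-tree volumes) lists each PV with its driver, which makes blob-backed volumes easy to spot:

```shell
# List every PV with its CSI driver; blob-backed volumes show
# blob.csi.azure.com in the DRIVER column.
kubectl get pv -o custom-columns='NAME:.metadata.name,DRIVER:.spec.csi.driver,CAPACITY:.spec.capacity.storage,CLAIM:.spec.claimRef.name'
```

Anything Kubecost wanted to ignore or deduplicate could in principle key off that `spec.csi.driver` field (or `volumeHandle`), rather than PVC labels.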

teevans commented 1 year ago

@AjayTripathy - Do you have a sense of what we need to move forward here?

AjayTripathy commented 1 year ago

@nikovacevic perhaps a specific ability to delete line items from ETL? How hard do you think that would be?

dwbrown2 commented 1 year ago

cc @kwombach12

I'm open to exploring this, but I worry it's a workaround for a Kubernetes/cloud issue, since that seems to be the primary motivation for ignoring these resources. Let me know if I'm mistaken.

nikovacevic commented 1 year ago

Yeah, I tend to agree with @dwbrown2 -- we can always hack in a "delete" feature as a workaround, but that feels like a product door we may not want to open? At any rate, if we're just trying to find a short-term workaround, it's possible. Just afraid of the precedent it sets.

AjayTripathy commented 1 year ago

I routinely do this manually, or have support do it manually for users, when stuff goes wrong and people accidentally publish a bad metric. It feels general enough to me that we should do something here that keeps bad emitted metrics from blowing up the entire pipeline for users.

nikovacevic commented 1 year ago

Ok, let's come up with a plan in v1.101 -- deleting ETL data doesn't feel like a post-code freeze quick hack, but I'm open to the idea that some may simply need to do this, and that it's less frustrating than the alternative options.

AjayTripathy commented 1 year ago

Strongly agree with no post code freeze hacks!

jcharcalla commented 1 year ago

+1 on this plan in v1.101 as I'm currently working with a customer to remove erroneous data.

justbert commented 1 year ago

Is this issue still planned to be addressed? :)

kwombach12 commented 1 year ago

@justbert Yes! This is still in our backlog of enhancements

marcopolo97 commented 9 months ago

Hey! I would also love for this issue/feature to be addressed. One of my users reached out to me and said that storage is not yet being billed in one of our environments, and the PV costs generated through Kubecost are throwing off their numbers. They asked me if there is a way to exclude those PV costs so that they have more accurate reporting.

rossfisherkc commented 9 months ago

As commented in the internal Jira issue, let me know if I can be helpful and migrate that one to another board or clean it up a bit for the team