keithhand opened 1 year ago
Wondering why reconciliation isn't working here. Maybe worth testing to make sure we're doing this correctly?
This feels important enough to commit to testing/resolving in v1.100
@keithhand can we get someone on the support team to set this up and see if we can reproduce it?
@AjayTripathy, I don't believe reconciliation handles this use case, as the cloud provider is still charging them, so the costs are in the cost report. They are already accounted for within the user's personal accounting, so they are looking for a way to entirely ignore the resource within Kubecost.
Hi @keithhand, here is the basic use case that can be used to reproduce the issue: https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/deploy/example/e2e_usage.md
Using example 2 from those instructions, you can set up multiple PV+PVCs that use the same Azure Blob storage account, where each maps to a unique container within that storage account. The PV is defined with the storage capacity allocated to the entire storage account, so as not to put any artificial limit on the PV (since containers share the account's storage).
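For illustration, a minimal sketch of that pattern (all names and sizes here are hypothetical, not taken from the linked instructions): two static PVs pointing at different containers in the same storage account, each declaring the full account-level capacity:

```yaml
# Hypothetical sketch: two PVs backed by the same Azure Blob storage account.
# Each uses a distinct volumeHandle and container, but both declare the
# account-level capacity, so Kubecost counts that capacity twice.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-blob-team-a
spec:
  capacity:
    storage: 10Ti                  # capacity of the whole storage account
  accessModes: [ReadWriteMany]
  csi:
    driver: blob.csi.azure.com
    volumeHandle: account-team-a   # must be unique per PV
    volumeAttributes:
      containerName: team-a
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-blob-team-b
spec:
  capacity:
    storage: 10Ti                  # same account capacity declared again
  accessModes: [ReadWriteMany]
  csi:
    driver: blob.csi.azure.com
    volumeHandle: account-team-b
    volumeAttributes:
      containerName: team-b
```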
Please let me know if I can provide any more details on this use case and/or if you have any other suggestions, thanks!
Hey @vexingly, to ensure we have explored all options, cloud reconciliation wouldn't solve the underlying problem, would it? Cloud reconciliation is the process by which we gather the cost report data from Azure and then match them back to the assets reported in Kubecost. As far as I understand, this wouldn't work in your situation as the issue is less of mismatched prices and more the case that you don't want to see those volumes appear in Kubecost entirely; is that correct?
@keithhand fixing the costs through cloud reconciliation would be a partial mitigation, but I think kubecost would still show that our cluster has 6-7 PiB of disk allocated which isn't accurate.
Ideally these PV/PVCs would be hidden entirely, or, if they are displayed, they would be aggregated in such a way that the size and cost are not multiplied by the number of PVs. I.e., PVs using the Blob CSI driver that reference the same storage account should be treated as a single resource.
I recreated this by creating a statefulset and attaching a PVC to that statefulset based on the blob storage CSI. You can access that cluster with the Azure account subscription set and the command:
az aks get-credentials --resource-group khandkcost --name khand-dev-1
We can now see that the new PVC is showing up at the bottom of the allocations report, here (you'll need to port-forward to use the link):
I haven't attached any load balancer to access Kubecost directly, but if that would be handy, let me know.
Thanks @keithhand!! Did you notice an obvious label here by which these PVCs can be identified? I guess I'm also asking @vexingly
As far as I can tell it is configured as a normal PVC. Labels/annotations are all similar to the default storage class
❯ # Blob storage (to be ignored)
❯ kubectl describe pvc persistent-storage-statefulset-blob-0
Name: persistent-storage-statefulset-blob-0
Namespace: kubecost
StorageClass: blob-fuse
Status: Bound
Volume: pvc-702d44f2-d686-44c0-a71c-e7fe6896b94d
Labels: app=nginx
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-class: blob-fuse
volume.beta.kubernetes.io/storage-provisioner: blob.csi.azure.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 100Gi
Access Modes: RWX
VolumeMode: Filesystem
Used By: statefulset-blob-0
Events: <none>
❯ # Default storage (want to see)
❯ kubectl describe pvc kubecost-cost-analyzer
Name: kubecost-cost-analyzer
Namespace: kubecost
StorageClass: default
Status: Bound
Volume: pvc-732044fd-2f39-4335-846a-5bfe40a2d4c0
Labels: app=cost-analyzer
app.kubernetes.io/instance=kubecost
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=cost-analyzer
helm.sh/chart=cost-analyzer-1.98.0
Annotations: meta.helm.sh/release-name: kubecost
meta.helm.sh/release-namespace: kubecost
pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: disk.csi.azure.com
volume.kubernetes.io/selected-node: aks-agentpool-41384482-vmss000001
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 32Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: kubecost-cost-analyzer-744c48c4f7-97jnd
Events: <none>
From my experience you cannot tell from the PVC itself, you would need to look at the PV spec or the StorageClass spec to identify the storage driver being used. In our case we use the PV spec to define the storage driver to use and the container name to mount, but the actual storage account is contained in a secret.
Example:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-blob
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain  # "Delete" is not supported in static provisioning
  csi:
    driver: blob.csi.azure.com
    readOnly: false
    # make sure volumeHandle is unique for every identical storage blob container in the cluster
    # the character `#` is reserved for internal use and cannot be used in volumeHandle
    volumeHandle: unique-volumeid
    volumeAttributes:
      containerName: EXISTING_CONTAINER_NAME
    nodeStageSecretRef:
      name: azure-secret
      namespace: default
@AjayTripathy - Do you have a sense of what we need to move forward here?
@nikovacevic perhaps a specific ability to delete line items from ETL? How hard do you think that would be?
cc @kwombach12
I'm open to exploring this, but I worry it's a workaround for a Kubernetes/cloud issue, since that seems to be the primary motivation for ignoring these resources. Let me know if I'm mistaken.
Yeah, I tend to agree with @dwbrown2 -- we can always hack in a "delete" feature as a workaround, but that feels like a product door we may not want to open? At any rate, if we're just trying to find a short-term workaround, it's possible. Just afraid of the precedent it sets.
I routinely do this manually, or have support do this manually for users, when stuff goes wrong and people accidentally publish a bad metric. It feels general enough to me that we should do something here so that bad emitted metrics don't explode the entire pipeline for users.
Ok, let's come up with a plan in v1.101 -- deleting ETL data doesn't feel like a post-code freeze quick hack, but I'm open to the idea that some may simply need to do this, and that it's less frustrating than the alternative options.
Strongly agree with no post code freeze hacks!
+1 on this plan in v1.101 as I'm currently working with a customer to remove erroneous data.
Is this issue still planned to be addressed? :)
@justbert Yes! This is still in our backlog of enhancements
Hey! I also would love for this issue/feature to be addressed. One of my users reached out to me and said that storage is not yet being billed in one of our environments, and the PV costs being generated through Kubecost are throwing off their numbers. They asked me if there is a way to exclude those PV costs so that their reporting is more accurate.
As commented in the internal Jira issue, let me know if I can help by migrating that one to another board or cleaning it up a bit for the team.
What problem are you trying to solve? Each namespace has a PVC that maps to an Azure Storage Account (via Blob CSI). The storage is (a) already accounted for in other costing and (b) not actively using the 10TiB of data it claims to have, and therefore should not be priced as such in Kubecost.
Describe the solution you'd like A way to allow users to ignore specific resources through a configuration within Kubecost
Describe alternatives you've considered Using a relabel config to drop the metrics from Prometheus directly
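For reference, that relabel-based workaround could look something like the sketch below. The metric and label names here are assumptions for illustration, not verified Kubecost series names; the actual names exposed to Prometheus would need to be checked.

```yaml
# Hypothetical Prometheus scrape-config fragment: drop PV cost/capacity
# series for Blob-CSI-backed volumes at ingest time. Metric and label
# names are assumptions and must be matched to the real exported series.
metric_relabel_configs:
  - source_labels: [__name__, persistentvolume]
    # drop any series for volumes whose name starts with "pv-blob"
    regex: "(pv_hourly_cost|kube_persistentvolume_capacity_bytes);pv-blob.*"
    action: drop
```

The drawback, as implied above, is that this silently removes the metrics for all consumers of that Prometheus, not just Kubecost.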
How would users interact with this feature? Configuration through values.yaml, maybe through an ignoredResources section, which can specify strings for resources to ignore in the form "{namespace}/{objectType}/{objectName}", e.g. "kubecost/persistentvolumeclaim/kubecost-cost-analyzer". Ideally, this includes wildcard support to handle multiple namespaces or objects.