yuxiang-he opened this issue 4 months ago
cc @pohly
Also, using a literal segment value in workItem makes logs more meaningful. The current log shows segments using their hex pointer values, which is not useful for determining which segment a log line is for:

I0222 03:48:01.410043 1 capacity.go:574] Capacity Controller: refreshing {segment:0xc003ae26a8 storageClassName:ssd-shared}
The problem that I had to solve here is that the content of a topology.Segment is a slice, and slices cannot be used as keys in a Go map. The solution was to use the address of a topology.Segment as the key for the map and then ensure that there are never any topology.Segment instances with the same content.

Perhaps that second part is going wrong somehow?
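For illustration, a minimal sketch of that constraint, with assumed stand-in types (the real topology.Segment lives in pkg/capacity/topology):

```go
package main

import "fmt"

// Assumed stand-ins for the real types: a Segment is slice-backed,
// which is exactly what makes it unusable as a map key.
type SegmentEntry struct{ Key, Value string }
type Segment []SegmentEntry

func main() {
	// var bad map[Segment]bool // compile error: invalid map key type Segment

	// Pointers are comparable, so they work as keys, but only as long
	// as each distinct segment content has exactly one live instance.
	good := map[*Segment]bool{}
	s := &Segment{{Key: "topology.kubernetes.io/zone", Value: "zone-a"}}
	good[s] = true
	fmt.Println(good[s]) // true: same pointer, same key
}
```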
> Also, using a literal segment value in workItem makes logs more meaningful.

workItem should implement logr.Marshaler such that it prints the pointer value (= the unique key) and the referenced segment, and log calls should use structured, contextual logging. Then we would get all the desired information.
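A minimal sketch of that suggestion, assuming the workItem fields visible in the log line above (this is not the actual external-provisioner code, and the import path is assumed):

```go
package capacity

import (
	"fmt"

	"github.com/kubernetes-csi/external-provisioner/pkg/capacity/topology"
)

// Assumed shape of workItem, inferred from the log output above.
type workItem struct {
	segment          *topology.Segment
	storageClassName string
}

// MarshalLog implements logr.Marshaler. With structured logging, e.g.
// klog.InfoS("Capacity Controller: refreshing", "item", item), the output
// would then contain both the unique pointer key and the segment content.
func (item workItem) MarshalLog() interface{} {
	return struct {
		Pointer          string
		Segment          string
		StorageClassName string
	}{
		Pointer:          fmt.Sprintf("%p", item.segment),
		Segment:          item.segment.String(),
		StorageClassName: item.storageClassName,
	}
}
```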
Could it also work by using the string value of the segment in workItem? E.g. there is already a function that generates a string for a Segment:

https://github.com/kubernetes-csi/external-provisioner/blob/e7c0bcc89c20bc070e90cf734b51b9256d1e7170/pkg/capacity/topology/topology.go#L38-L40

We could try implementing another string function that prints the SegmentEntry values in strictly sorted order and leaves out the pointer value, to ensure a deterministic string for each segment; see the sketch below.
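A sketch of such a helper, assuming the Segment/SegmentEntry shapes described earlier (the name segmentKey is hypothetical):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Assumed stand-ins for the topology types.
type SegmentEntry struct{ Key, Value string }
type Segment []SegmentEntry

// segmentKey is a hypothetical helper: it renders the entries in
// strictly sorted order and contains no pointer value, so two Segment
// instances with equal content always produce the same string.
func segmentKey(s Segment) string {
	entries := make([]string, 0, len(s))
	for _, e := range s {
		entries = append(entries, e.Key+"="+e.Value)
	}
	sort.Strings(entries)
	return strings.Join(entries, "+")
}

func main() {
	a := Segment{{"zone", "zone-a"}, {"rack", "r1"}}
	b := Segment{{"rack", "r1"}, {"zone", "zone-a"}} // same content, different order
	fmt.Println(segmentKey(a) == segmentKey(b))      // true
}
```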
> Perhaps that second part is going wrong somehow?
I see two places that call addWorkitem, which adds the items to the c.capacities map:

1. In onTopologyChanges:
https://github.com/kubernetes-csi/external-provisioner/blob/e7c0bcc89c20bc070e90cf734b51b9256d1e7170/pkg/capacity/capacity.go#L357-L359
2. In onSCAddOrUpdate:
https://github.com/kubernetes-csi/external-provisioner/blob/e7c0bcc89c20bc070e90cf734b51b9256d1e7170/pkg/capacity/capacity.go#L382-L384

The segments are all returned from c.topologyInformer, so I don't believe they will have the same address even if they represent the same segment. And addWorkitem simply checks key existence in this way, which does not work when the segments exist as addresses in workItem:

https://github.com/kubernetes-csi/external-provisioner/blob/e7c0bcc89c20bc070e90cf734b51b9256d1e7170/pkg/capacity/capacity.go#L473-L476
> The segments are all returned from c.topologyInformer so I don't believe they will have the same address even if they represent the same segment.

c.topologyInformer should keep segment pointers unique. If it doesn't, then that's the bug.

> And addWorkitem simply checks key existence in this way which does not work when the segments exist as addresses in workItem.

Why should this not work? A struct is equal if all of its fields are equal. workItem contains two comparable fields, so it should be comparable:
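The struct in question looks roughly like this (field names taken from the log line earlier in the thread; a sketch, not a verbatim copy of the source):

```go
// Both fields are comparable (a pointer and a string), so workItem is a
// valid map key type. Note that comparing the pointer field means
// identity, not content: two workItems referencing equal-but-distinct
// Segment instances are different keys.
type workItem struct {
	segment          *topology.Segment
	storageClassName string
}
```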
> Could it also work by using the string value of the segment in workItem?

Yes, that would also be possible. But I expect the string handling to be more expensive than the current pointer comparison.
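One way to check that expectation would be a micro-benchmark along these lines (hypothetical, not from the repository):

```go
package capacity

import "testing"

type stringKey struct {
	segment          string // pre-rendered, e.g. "zone=zone-a"
	storageClassName string
}

type pointerKey struct {
	segment          *string // stand-in for *topology.Segment
	storageClassName string
}

// Hypothetical micro-benchmarks for the trade-off discussed above:
// hashing a rendered string vs. hashing a pointer-sized word.
func BenchmarkStringKey(b *testing.B) {
	m := map[stringKey]bool{{"topology.kubernetes.io/zone=zone-a", "ssd-shared"}: true}
	for i := 0; i < b.N; i++ {
		_ = m[stringKey{"topology.kubernetes.io/zone=zone-a", "ssd-shared"}]
	}
}

func BenchmarkPointerKey(b *testing.B) {
	s := new(string)
	m := map[pointerKey]bool{{s, "ssd-shared"}: true}
	for i := 0; i < b.N; i++ {
		_ = m[pointerKey{s, "ssd-shared"}]
	}
}
```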
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
What happened:

We have a CSI plugin that creates local-disk-backed PVCs for pods. In a cluster with 1693 nodes running the CSI plugin, there are CSIStorageCapacity objects with duplicated nodeTopology x storageClassName combinations. The duplicated CSIStorageCapacity objects are likely due to duplicated workqueue items in the capacity controller, leading to a huge workqueue backlog.

Our cluster currently has 1693 kubelet nodes, so there should be 2 * 1693 unique CSIStorageCapacity objects (2 storage classes for each node). Instead, the workqueue depth of the capacity controller, csistoragecapacity, is between 5k and 6k, and csistoragecapacities_desired_goal is around 6.18k.

There are also inconsistencies in how the controller determines the desired CSIStorageCapacity object count. An earlier instance of the external-provisioner pod in the same cluster, with effectively the same node count (+/- 1 node), did calculate the correct number of CSIStorageCapacity objects, and its workqueue depth looked much better.
Theory for the root cause: I believe the capacity controller is tracking duplicated workqueue entries. This causes delayed CSIStorageCapacity creation, because the controller has to work through a huge workqueue backlog.

- workItem is the key of the map capacities map[workItem]*storagev1.CSIStorageCapacity. The field workItem.segment is a pointer, *topology.Segment.
- When checking whether a key exists (https://github.com/kubernetes-csi/external-provisioner/blob/v3.6.2/pkg/capacity/capacity.go#L473), even if item.segment has the same content as the supposed workItem key in c.capacities, found will almost always be false, because c.capacities[item] compares the pointer value *topology.Segment, not the literal values within topology.Segment. I've also tested this in a similar setup to validate the theory: https://go.dev/play/p/bVwW4eb9n5A (a sketch along the same lines follows this list).
- This leads to duplicated work items tracked by the controller, which then polls the duplicated items in https://github.com/kubernetes-csi/external-provisioner/blob/v3.6.2/pkg/capacity/capacity.go#L517-L520
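A minimal reproduction of the pointer-key behavior, in the spirit of the playground link above (stand-in types, not the actual controller code):

```go
package main

import "fmt"

// Stand-ins for the real types.
type Segment []struct{ Key, Value string }

type workItem struct {
	segment          *Segment
	storageClassName string
}

func main() {
	capacities := map[workItem]bool{}

	a := &Segment{{"topology.kubernetes.io/zone", "zone-a"}}
	capacities[workItem{segment: a, storageClassName: "ssd-shared"}] = true

	// A second Segment with identical content but a different address,
	// as two informer-produced instances would have.
	b := &Segment{{"topology.kubernetes.io/zone", "zone-a"}}
	_, found := capacities[workItem{segment: b, storageClassName: "ssd-shared"}]
	fmt.Println(found) // false: the keys differ because the pointers differ
}
```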
What you expected to happen: No duplicated workqueue entries for CSIStorageCapacity objects.

How to reproduce it:

Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): v1.28.5
- Kernel (e.g. uname -a):