kubernetes-csi / external-provisioner

Sidecar container that watches Kubernetes PersistentVolumeClaim objects and triggers CreateVolume/DeleteVolume against a CSI endpoint
Apache License 2.0

CSIStorageCapacity: Topology segment not updated #847

Open samuelluohaoen1 opened 1 year ago

samuelluohaoen1 commented 1 year ago

What happened: After new node plugins join the cluster and report new AccessibleTopologies.Segments, the existing segment information is not updated, and no new CSIStorageCapacity objects are created.

What you expected to happen: New node plugins reporting new values for existing topology segments should, in a sense, "expand" the value sets of those segments. This in turn should result in CSIStorageCapacity objects being created for the newly accessible segments.

How to reproduce it:

  1. Suppose the CSIDriver has name com.foo.bar. Check that STORAGECAPACITY is true.
  2. Deploy the controller plugin but not the node plugin. Wait for external-provisioner to print "Initial number of topology segments 0, storage classes 0, potential CSIStorageCapacity objects 0" (to see this log, run external-provisioner with log level 5).
  3. Now CSINode should have DRIVERS: 0.
  4. Deploy the node plugin. Wait for the NodeGetInfo RPC to be called. The RPC should return something like
    {
        "NodeId": "some-node",
        "AccessibleTopologies": {
            "Segments": {
                "kubernetes.io/hostname": "some-node"
            }
        }
    }
  5. Now CSINode should have DRIVERS: 1 which is named com.foo.bar with Node ID: some-node and Topology Keys: [kubernetes.io/hostname].
  6. Deploy a StorageClass with volumeBindingMode: WaitForFirstConsumer and provisioner: com.foo.bar.
  7. No new CSIStorageCapacity object is created.
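
The expectation behind these steps can be sketched in a few lines of Python. This is a simplified model and not the external-provisioner code: the names `topology_segments` and `potential_capacity_objects` are hypothetical, and the sketch assumes one potential CSIStorageCapacity object per (segment, storage class) pair, which is consistent with the all-zero counts in the log line from step 2.

```python
from typing import Dict, List, Set, Tuple

# A segment is modelled as a sorted tuple of (label key, label value) pairs.
Segment = Tuple[Tuple[str, str], ...]


def topology_segments(
    node_labels: Dict[str, Dict[str, str]],
    topology_keys: Dict[str, List[str]],
) -> Set[Segment]:
    """Collect the distinct segments formed by each node's topology keys.

    node_labels maps node name -> node labels.
    topology_keys maps node name -> topology keys the driver reported for
    that node (empty or missing while the node plugin is not registered).
    """
    segments: Set[Segment] = set()
    for node, keys in topology_keys.items():
        labels = node_labels.get(node, {})
        if not keys or any(k not in labels for k in keys):
            continue  # driver not registered on this node, or labels not set yet
        segments.add(tuple(sorted((k, labels[k]) for k in keys)))
    return segments


def potential_capacity_objects(segments: Set[Segment], storage_classes: List[str]):
    """Assumed rule: one object per (segment, storage class) combination."""
    return [(seg, sc) for seg in sorted(segments) for sc in sorted(storage_classes)]


if __name__ == "__main__":
    labels = {"some-node": {"kubernetes.io/hostname": "some-node"}}
    # Step 2: controller plugin only, no node plugin registered yet.
    print(potential_capacity_objects(topology_segments(labels, {}), []))
    # Steps 4 to 6: node plugin registered, one storage class ("local-sc" is made up).
    segs = topology_segments(labels, {"some-node": ["kubernetes.io/hostname"]})
    print(potential_capacity_objects(segs, ["local-sc"]))
```

Under this model, step 7 would be expected to show one object for the single segment and storage class, which is what the report says does not happen.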

Anything else we need to know?: I am using the "kubernetes.io/hostname" label as the only key because we want topology to be constrained per node: each PV is to be provisioned locally on some node. I also assumed that "kubernetes.io/hostname" is unique across the nodes and exists on every node by default (I hope this is a reasonable assumption).
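
The uniqueness assumption can be checked rather than hoped for. Below is a minimal sketch that takes node labels (for example, collected from `kubectl get nodes -o json`) and reports nodes that lack the label or share a value. The function name `hostname_label_problems` and the sample node names in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, List

HOSTNAME_LABEL = "kubernetes.io/hostname"


def hostname_label_problems(node_labels: Dict[str, Dict[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means the assumption holds."""
    problems: List[str] = []
    values: Counter = Counter()
    for node, labels in sorted(node_labels.items()):
        value = labels.get(HOSTNAME_LABEL)
        if value is None:
            problems.append(f"{node}: label {HOSTNAME_LABEL} is missing")
        else:
            values[value] += 1
    for value, count in sorted(values.items()):
        if count > 1:
            problems.append(f"{value}: used by {count} nodes")
    return problems
```

Calling it with `{"node-a": {HOSTNAME_LABEL: "node-a"}, "node-b": {HOSTNAME_LABEL: "node-b"}}` returns an empty list; any non-empty result means per-node segments would be missing or would collapse into one.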

Environment:

@pohly

pohly commented 1 year ago

> No new CSIStorageCapacity object is created.

How do you check for this? With kubectl get csistoragecapacities or kubectl get --all-namespaces csistoragecapacities?

CSIStorageCapacity objects are namespaced, so the second command has to be used.

I tried to reproduce the issue with csi-driver-host-path v1.10.0, but there I get new CSIStorageCapacity objects after creating a storage class.
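
To make the namespace question easy to answer at a glance, the output of `kubectl get --all-namespaces csistoragecapacities -o json` can be grouped by namespace. The following is a small sketch; the helper name `capacities_by_namespace` is hypothetical, and it only reads the `metadata`, `storageClassName`, and `capacity` fields of each item.

```python
import json
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def capacities_by_namespace(
    kubectl_json: str,
) -> Dict[str, List[Tuple[str, str, Optional[str]]]]:
    """Group CSIStorageCapacity objects by namespace.

    Expects the text printed by
    `kubectl get --all-namespaces csistoragecapacities -o json`.
    Each entry is (object name, storage class name, capacity or None).
    """
    grouped = defaultdict(list)
    for item in json.loads(kubectl_json).get("items", []):
        meta = item.get("metadata", {})
        grouped[meta.get("namespace", "")].append(
            (meta.get("name", ""), item.get("storageClassName", ""), item.get("capacity"))
        )
    return dict(grouped)
```

An empty result means no objects exist in any namespace, which rules out the case where they were created somewhere other than the current default namespace.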

pohly commented 1 year ago

My commands:

/deploy/kubernetes-distributed/deploy.sh
kubectl delete storageclass.storage.k8s.io/csi-hostpath-slow
kubectl delete storageclass.storage.k8s.io/csi-hostpath-fast
kubectl get --all-namespaces csistoragecapacity
kubectl create -f deploy/kubernetes-distributed/hostpath/csi-hostpath-storageclass-fast.yaml
kubectl get --all-namespaces csistoragecapacity

pohly commented 1 year ago

csi-provisioner:v3.3.0

samuelluohaoen1 commented 1 year ago

> No new CSIStorageCapacity object is created.
>
> How do you check for this? With kubectl get csistoragecapacities or kubectl get --all-namespaces csistoragecapacities?
>
> CSIStorageCapacity objects are namespaced, so the second command has to be used.
>
> I tried to reproduce the issue with csi-driver-host-path v1.10.0, but there I get new CSIStorageCapacity objects after creating a storage class.

Yes it is indeed namespaced. My kubectl has the default namespace set to the namespace where the CSI plugins are deployed.

samuelluohaoen1 commented 1 year ago

> My commands:
>
> /deploy/kubernetes-distributed/deploy.sh
> kubectl delete storageclass.storage.k8s.io/csi-hostpath-slow
> kubectl delete storageclass.storage.k8s.io/csi-hostpath-fast
> kubectl get --all-namespaces csistoragecapacity
> kubectl create -f deploy/kubernetes-distributed/hostpath/csi-hostpath-storageclass-fast.yaml
> kubectl get --all-namespaces csistoragecapacity

From the sequence of your commands, I do not see how the controller plugin is deployed before the node plugins. I think the order of deployment may be crucial to reproducing this issue. Could you make sure that step 2 happens before the node plugins are deployed? Thank you for your trouble.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

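This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed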

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

/lifecycle rotten

k8s-triage-robot commented 1 year ago

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-csi/external-provisioner/issues/847#issuecomment-1576487977):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue with `/reopen`
>- Mark this issue as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close not-planned
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

pohly commented 1 year ago

/reopen
/assign

k8s-ci-robot commented 1 year ago

@pohly: Reopened this issue.

In response to [this](https://github.com/kubernetes-csi/external-provisioner/issues/847#issuecomment-1591427920):

>/reopen
>/assign

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

pohly commented 1 year ago

@samuelluohaoen1: it looks like you are using a central controller for your CSI driver. Is that correct?

Can you perhaps share the external-provisioner log at level >= 5? There is code which should react to changes in the node and CSIDriver objects when the node plugin gets registered after the controller has started.

We don't have a CSI driver deployment readily available to test this scenario. I tried reproducing it through unit tests (see https://github.com/kubernetes-csi/external-provisioner/pull/942) but the code worked as expected.
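
The behavior under discussion, reacting when a node plugin registers after the controller has started, can be modelled as an event-driven tracker. The sketch below is only an illustration of that idea and not the external-provisioner implementation; the class `SegmentTracker` and its `node_updated` method are hypothetical names.

```python
from typing import Callable, Dict, List, Set, Tuple

Segment = Tuple[Tuple[str, str], ...]


class SegmentTracker:
    """Tracks one segment per node and reports additions and removals."""

    def __init__(self, on_change: Callable[[Set[Segment], Set[Segment]], None]):
        self._on_change = on_change
        self._by_node: Dict[str, Segment] = {}

    def _all(self) -> Set[Segment]:
        return set(self._by_node.values())

    def node_updated(self, node: str, labels: Dict[str, str], keys: List[str]) -> None:
        """Call on every node or driver-registration update for `node`.

        `keys` are the topology keys currently registered for the driver on
        this node; an empty list means the node plugin is not registered.
        """
        before = self._all()
        if keys and all(k in labels for k in keys):
            self._by_node[node] = tuple(sorted((k, labels[k]) for k in keys))
        else:
            self._by_node.pop(node, None)
        after = self._all()
        added, removed = after - before, before - after
        if added or removed:
            # Only notify when the overall set of segments actually changed.
            self._on_change(added, removed)
```

In the reported scenario, the first update for `some-node` arrives with no topology keys and triggers nothing; the later update with `kubernetes.io/hostname` should invoke the callback with one added segment, which is the point where new CSIStorageCapacity objects would be expected.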

xing-yang commented 1 year ago

/remove-lifecycle rotten

k8s-triage-robot commented 5 months ago

/lifecycle stale

yuxiang-he commented 4 months ago

@pohly We observed something similar but the CSIStorageCapacity objects were created after about an hour.

I believe there is currently an issue where the capacity controller is tracking duplicated workqueue entries. See issue https://github.com/kubernetes-csi/external-provisioner/issues/1161
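
As a general illustration of how a deduplicating work queue can still accumulate duplicates (this is not a diagnosis of the linked issue), the sketch below contrasts keys that compare by value with keys that compare by identity. All names here (`DedupQueue`, `ValueKey`, `IdentityKey`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


class DedupQueue:
    """FIFO queue that ignores an item that is already pending."""

    def __init__(self):
        self._pending = set()
        self._order = deque()

    def add(self, item) -> bool:
        if item in self._pending:
            return False  # already queued, nothing to do
        self._pending.add(item)
        self._order.append(item)
        return True

    def get(self):
        item = self._order.popleft()
        self._pending.discard(item)
        return item

    def __len__(self) -> int:
        return len(self._order)


@dataclass(frozen=True)
class ValueKey:
    """Two keys with the same fields are equal and hash the same."""
    segment: str
    storage_class: str


class IdentityKey:
    """No __eq__/__hash__ override: every instance is a distinct key."""

    def __init__(self, segment: str, storage_class: str):
        self.segment = segment
        self.storage_class = storage_class
```

Adding `ValueKey("seg", "sc")` twice leaves one pending entry, while adding two separately constructed `IdentityKey("seg", "sc")` objects leaves two, so the same logical work item is processed more than once.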

yuxiang-he commented 4 months ago

/remove-lifecycle stale

k8s-triage-robot commented 1 month ago

/lifecycle stale

k8s-triage-robot commented 4 weeks ago

/lifecycle rotten