obacak opened this issue 3 years ago
How has this been isolated to the single replica CSP? Do you have any empirical tests to share that this actually helps what you're suggesting? It could be the backend API server (primera/3par) that gets slammed and grinds to a halt. The CSP is just an API gateway that should be handling this just fine. Also, csi.NodeServiceCapability_RPC_GET_VOLUME_STATS is not implemented properly on the primera/3par CSP yet, and that could be a different problem altogether.
@datamattsson csi.NodeServiceCapability_RPC_GET_VOLUME_STATS invokes GetVolumeById on the CSP; does it really matter what response is received (missing params free_bytes/used_bytes)? Btw, we got to know about the changes last Friday from the spec side.
> How has this been isolated to the single replica CSP? Do you have any empirical tests to share that this actually helps what you're suggesting?
On v1.4 we see many calls from the hpe-csi-node pods (48 of them, since we have 48 workers) to primera3par-csp-svc, like:
>>>>> Get Volume Cmd - Volume name/id: pvc-36acb784-6dd5-4f4f-b748-995c9aecadfe" file="get_volume_cmd.go:59"
In return, the primera3par-csp pod calls the kube-apiserver to find out whether that hpevolumeinfo object exists or not. Those requests stack up to the point that a volume create request coming from the controller, which triggers the same >>>>> Get Volume Cmd - Volume name/id method on the primera3par-csp, takes too long to return. That initial loss of time has a cascading effect on the provisioning, attaching and mounting operations.
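(For a rough sense of scale, assuming kubelet's default volume stats aggregation period of about one minute and that most of the volumes are published: several hundred mounted volumes spread across 48 workers translate into several hundred of these Get Volume Cmd calls per minute, all funneled through the single CSP pod and each triggering its own kube-apiserver lookup, on top of the normal provisioning traffic.)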
> It could be the backend API server (primera/3par) that gets slammed and grinds to a halt.
If by API server you mean the storage array, then I say no. This blockage is happening at the primera3par-csp pod level: when it receives the >>>>> Get Volume Cmd - Volume name/id request it asks the kube-apiserver for that volume, and there are no calls to the storage array at that point. I believe that once the kube-apiserver responds "no, there is no such object", a call is then made to the storage array, but I might be wrong on that one since the primera3par code is not open source; maybe @sneharai4 can shed some light here.
On a 48-worker-node cluster where we have v1.3 deployed with around 425 volumes, please see the request IDs (which are all sequential) and the timestamps:
Mar 21, 2021 @ 12:38:27.861 primera3par-csp-77f98b579f-xf9nl hpestorage/hpe3parprimera-csp:v1.1.0 time="2021-03-21T11:38:27Z" level=info msg="[ REQUEST-ID 126096 ] -- >>>>> Create Volume Cmd for volume pvc-d24cf511-0639-42a4-9604-a209bf3eca95" file="create_volume_cmd.go:100"
Mar 21, 2021 @ 12:38:27.861 primera3par-csp-77f98b579f-xf9nl hpestorage/hpe3parprimera-csp:v1.1.0 time="2021-03-21T11:38:27Z" level=info msg="[ REQUEST-ID 126096 ] -- Create volume request (after unmarshal): &models.CreateVolumeRequest{Name:\"pvc-d24cf511-0639-42a4-9604-a209bf3eca95\", Size:1073741824, Description:\"Block Volume created with the HPE CSI Driver for Kubernetes\", BaseSnapshotId:\"\", Clone:false, Config:models.Config{Cpg:\"\", SnapCpg:\"\", ProvisioningType:\"tpvv\", ImportVol:\"\", ImportVolAsClone:\"\", CloneOf:\"\", Compression:false, ReplicationDevices:\"\", RemoteCopyGroup:\"\", VirtualCopyOf:\"\", VolumeGroup:\"\", IscsiPortalIps:\"\"}}\n" file="request_handler.go:93"
Mar 21, 2021 @ 12:38:27.859 primera3par-csp-77f98b579f-xf9nl hpestorage/hpe3parprimera-csp:v1.1.0 time="2021-03-21T11:38:27Z" level=info msg="[ REQUEST-ID 126095 ] -- <<<<<< Get Volume By Name" file="request_handler.go:143"
Mar 21, 2021 @ 12:38:27.859 primera3par-csp-77f98b579f-xf9nl hpestorage/hpe3parprimera-csp:v1.1.0 time="2021-03-21T11:38:27Z" level=info msg="[ REQUEST-ID 126095 ] -- <<<<< Get Volume Cmd - Volume name/id: pvc-d24cf511-0639-42a4-9604-a209bf3eca95" file="get_volume_cmd.go:80"
Mar 21, 2021 @ 12:38:27.840 primera3par-csp-77f98b579f-xf9nl hpestorage/hpe3parprimera-csp:v1.1.0 time="2021-03-21T11:38:27Z" level=info msg="[ REQUEST-ID 126095 ] -- >>>>>>>>> Get Volume By Name " file="request_handler.go:137"
Mar 21, 2021 @ 12:38:27.840 primera3par-csp-77f98b579f-xf9nl hpestorage/hpe3parprimera-csp:v1.1.0 time="2021-03-21T11:38:27Z" level=info msg="[ REQUEST-ID 126095 ] -- >>>>> Get Volume Cmd - Volume name/id: pvc-d24cf511-0639-42a4-9604-a209bf3eca95" file="get_volume_cmd.go:60"
Mar 21, 2021 @ 12:36:13.644 primera3par-csp-77f98b579f-xf9nl hpestorage/hpe3parprimera-csp:v1.1.0 time="2021-03-21T11:36:13Z" level=info msg="[ REQUEST-ID 126094 ] -- Published value in get volume cmd %!(EXTRA bool=false)" file="get_volume_cmd.go:137"
Mar 21, 2021 @ 12:36:13.644 primera3par-csp-77f98b579f-xf9nl hpestorage/hpe3parprimera-csp:v1.1.0 time="2021-03-21T11:36:13Z" level=info msg="[ REQUEST-ID 126094 ] -- GET VOLUME BY ID: {\"Mountpoint\":\"\",\"config\":{\"cpg\":\"\",\"provisioning_type\":\"tpvv\"},\"description\":\"Block Volume created with the HPE CSI Driver for Kubernetes\",\"id\":\"pvc-0260012f-a93e-4314-8224-1f281ff2ece1\",\"name\":\"pvc-0260012f-a93e-4314-8224-1f281ff2ece1\",\"published\":false,\"size\":1073741824}" file="get_volume_cmd.go:170"
Mar 21, 2021 @ 12:36:13.644 primera3par-csp-77f98b579f-xf9nl hpestorage/hpe3parprimera-csp:v1.1.0 time="2021-03-21T11:36:13Z" level=info msg="[ REQUEST-ID 126094 ] -- <<<<<< Get Volume By Id" file="request_handler.go:133"
Mar 21, 2021 @ 12:36:13.644 primera3par-csp-77f98b579f-xf9nl hpestorage/hpe3parprimera-csp:v1.1.0 time="2021-03-21T11:36:13Z" level=info msg="[ REQUEST-ID 126094 ] -- <<<<< Get Volume Cmd - Volume name/id: pvc-0260012f-a93e-4314-8224-1f281ff2ece1" file="get_volume_cmd.go:174"
Mar 21, 2021 @ 12:36:13.641 primera3par-csp-77f98b579f-xf9nl hpestorage/hpe3parprimera-csp:v1.1.0 time="2021-03-21T11:36:13Z" level=info msg="[ REQUEST-ID 126094 ] -- >>>>> Get Volume Cmd - Volume name/id: pvc-0260012f-a93e-4314-8224-1f281ff2ece1" file="get_volume_cmd.go:60"
And below is another 48-worker-node cluster where we have v1.4 deployed with around 600 volumes; again, please notice the timestamps and the request IDs:
Mar 21, 2021 @ 13:07:49.465 primera3par-csp-85858fff66-6wtqn quay.io/hpestorage/hpe3parprimera-csp:v1.2.1 time="2021-03-21T12:07:49Z" level=info msg="[ REQUEST-ID 101735 ] -- <<<<<< Get Volume By Id" file="request_handler.go:228"
Mar 21, 2021 @ 13:07:49.465 primera3par-csp-85858fff66-6wtqn quay.io/hpestorage/hpe3parprimera-csp:v1.2.1 time="2021-03-21T12:07:49Z" level=info msg="[ REQUEST-ID 101735 ] -- Published value in get volume cmd %!(EXTRA bool=true)" file="get_volume_cmd.go:69"
Mar 21, 2021 @ 13:07:49.465 primera3par-csp-85858fff66-6wtqn quay.io/hpestorage/hpe3parprimera-csp:v1.2.1 time="2021-03-21T12:07:49Z" level=info msg="[ REQUEST-ID 101735 ] -- <<<<< Get Volume Cmd - Volume name/id: pvc-0857d47f-818f-4e73-91b5-92ad3a51d6d8" file="get_volume_cmd.go:110"
Mar 21, 2021 @ 13:07:49.465 primera3par-csp-85858fff66-6wtqn quay.io/hpestorage/hpe3parprimera-csp:v1.2.1 time="2021-03-21T12:07:49Z" level=info msg="[ REQUEST-ID 101735 ] -- GET VOLUME BY ID: {\"Mountpoint\":\"/var/lib/kubelet/plugins/hpe.com/mounts/pvc-0857d47f-818f-4e73-91b5-92ad3a51d6d8\",\"config\":{\"compression\":\"false\",\"cpg\":\"\",\"provisioning_type\":\"tpvv\",\"snap_cpg\":\"\"},\"description\":\"Block Volume created with the HPE CSI Driver for Kubernetes\",\"id\":\"pvc-0857d47f-818f-4e73-91b5-92ad3a51d6d8\",\"name\":\"pvc-0857d47f-818f-4e73-91b5-92ad3a51d6d8\",\"published\":true,\"size\":1073741824,\"volume_group_id\":\"\"}" file="get_volume_cmd.go:106"
Mar 21, 2021 @ 13:07:49.398 primera3par-csp-85858fff66-6wtqn quay.io/hpestorage/hpe3parprimera-csp:v1.2.1 time="2021-03-21T12:07:49Z" level=info msg="[ REQUEST-ID 101823 ] -- >>>>> Get Volume Cmd - Volume name/id: pvc-32cb257e-aa41-4615-8b1f-76a1feb76d78" file="get_volume_cmd.go:59"
Mar 21, 2021 @ 13:07:49.398 primera3par-csp-85858fff66-6wtqn quay.io/hpestorage/hpe3parprimera-csp:v1.2.1 time="2021-03-21T12:07:49Z" level=info msg="[ REQUEST-ID 101823 ] -- >>>>>>>>> Get Volume By Name " file="request_handler.go:232"
Mar 21, 2021 @ 13:07:49.260 primera3par-csp-85858fff66-6wtqn quay.io/hpestorage/hpe3parprimera-csp:v1.2.1 time="2021-03-21T12:07:49Z" level=info msg="[ REQUEST-ID 101734 ] -- GET VOLUME BY ID: {\"Mountpoint\":\"/var/lib/kubelet/plugins/hpe.com/mounts/pvc-7ba439ff-4f73-42d1-82ac-fe62e2b91a32\",\"config\":{\"compression\":\"false\",\"cpg\":\"\",\"provisioning_type\":\"tpvv\",\"snap_cpg\":\"\"},\"description\":\"Block Volume created with the HPE CSI Driver for Kubernetes\",\"id\":\"pvc-7ba439ff-4f73-42d1-82ac-fe62e2b91a32\",\"name\":\"pvc-7ba439ff-4f73-42d1-82ac-fe62e2b91a32\",\"published\":true,\"size\":5368709120,\"volume_group_id\":\"\"}" file="get_volume_cmd.go:106"
Mar 21, 2021 @ 13:07:49.260 primera3par-csp-85858fff66-6wtqn quay.io/hpestorage/hpe3parprimera-csp:v1.2.1 time="2021-03-21T12:07:49Z" level=info msg="[ REQUEST-ID 101734 ] -- Published value in get volume cmd %!(EXTRA bool=true)" file="get_volume_cmd.go:69"
Mar 21, 2021 @ 13:07:49.260 primera3par-csp-85858fff66-6wtqn quay.io/hpestorage/hpe3parprimera-csp:v1.2.1 time="2021-03-21T12:07:49Z" level=info msg="[ REQUEST-ID 101734 ] -- <<<<<< Get Volume By Id" file="request_handler.go:228"
Mar 21, 2021 @ 13:07:49.260 primera3par-csp-85858fff66-6wtqn quay.io/hpestorage/hpe3parprimera-csp:v1.2.1 time="2021-03-21T12:07:49Z" level=info msg="[ REQUEST-ID 101734 ] -- <<<<< Get Volume Cmd - Volume name/id: pvc-7ba439ff-4f73-42d1-82ac-fe62e2b91a32" file="get_volume_cmd.go:110"
Mar 21, 2021 @ 13:07:49.197 primera3par-csp-85858fff66-6wtqn quay.io/hpestorage/hpe3parprimera-csp:v1.2.1 time="2021-03-21T12:07:49Z" level=info msg="[ REQUEST-ID 101822 ] -- >>>>>>>>> Get Volume By Id " file="request_handler.go:221"
> Also, csi.NodeServiceCapability_RPC_GET_VOLUME_STATS is not implemented properly on the primera/3par CSP yet, and that could be a different problem altogether.
I know; the implementation is done on the hpe-csi-node, but it is the hpe-csi-node that asks primera3par-csp about the existence of the volume, which in turn forwards that lookup to the kube-apiserver.
Once we received a "patched" hpe-csi-node with that single line of code taken out, provisioning a new volume (provisioning/attaching/mounting) came down from 10 minutes to 36 seconds on that same 48-worker-node cluster, and we started to see fewer >>>>> Get Volume Cmd - Volume name/id entries in the primera3par-csp logs.
I would like to reiterate my initial question: is it possible to deploy primera3par-csp as a daemonset or a deployment with many replicas?
Thanks for elaborating! I hope we can get some more eyes on this (@rgcostea @raunakkumar @rkumpf). AFAIK the CSP should not make any calls to the kube-apiserver for csi.NodeServiceCapability_RPC_GET_VOLUME_STATS, and if that's what you're seeing, that is a problem altogether. Bear in mind I'm not overly familiar with the primera/3par CSP as it uses several CRDs not part of the CSP spec.
I do however like the DaemonSet idea with the traffic policy, as the controller-driver would never need to traverse the network to reach a CSP and the node-driver would have its own CSP to fulfill Node* operations against. What I "like" is not usually what ends up in the CSI driver, so I'm hoping the team can get back to you.
Thank you @datamattsson.
> AFAIK the CSP should not make any calls to the kube-apiserver for csi.NodeServiceCapability_RPC_GET_VOLUME_STATS, and if that's what you're seeing, that is a problem altogether.
I believe that is the case but again @sneharai4 should be able to confirm it.
I will be looking forward to hearing from the team.
The node driver does make a call to the CSP to fetch the volume attributes, so there could be multiple nodes querying the CSP to retrieve volume attributes at the same time. We will try to test out the above scenario with other CSPs (Nimble, CV) and confirm whether we hit the same issue. We haven't considered the DaemonSet approach, but we were thinking of including a flag, disableNodeVolumeStats, by which this feature could be optionally disabled on large-scale systems.
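For illustration only, a minimal sketch of how such a flag could work, assuming it simply gates whether the node service advertises the stats capability; the function, flag wiring and capability list here are hypothetical and not the actual HPE driver code, only the csi identifiers come from the CSI spec package. If GET_VOLUME_STATS is not advertised, the kubelet never issues NodeGetVolumeStats, and the per-volume CSP traffic described above disappears:

```go
package node

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
)

// buildNodeCapabilities returns the capabilities the node plugin advertises.
// When disableNodeVolumeStats is true, GET_VOLUME_STATS is left out, so the
// kubelet stops calling NodeGetVolumeStats and the per-volume CSP traffic
// described above goes away.
func buildNodeCapabilities(disableNodeVolumeStats bool) []*csi.NodeServiceCapability {
	rpcTypes := []csi.NodeServiceCapability_RPC_Type{
		csi.NodeServiceCapability_RPC_STAGE_UNSTAGE_VOLUME,
		csi.NodeServiceCapability_RPC_EXPAND_VOLUME,
	}
	if !disableNodeVolumeStats {
		rpcTypes = append(rpcTypes, csi.NodeServiceCapability_RPC_GET_VOLUME_STATS)
	}

	caps := make([]*csi.NodeServiceCapability, 0, len(rpcTypes))
	for _, t := range rpcTypes {
		caps = append(caps, &csi.NodeServiceCapability{
			Type: &csi.NodeServiceCapability_Rpc{
				Rpc: &csi.NodeServiceCapability_RPC{Type: t},
			},
		})
	}
	return caps
}
```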
@raunakkumar, thanks for your message. We would definitely need that flag on the v1.4.x driver, but we would also like to be able to get the metrics from the driver with v1.5; this is something we have been keeping an eye on since last year.
As I cannot see the source code of the CSP: can the primera3par-csp work with multiple replicas? Are there any issues you can think of?
Some updates on this:
On our lab cluster (3 workers), I have tried primera3par-csp with 3 replicas.
I expect a DaemonSet with a Local traffic policy on the service would just work, but that will not help us in our normal cluster (48 workers), since the controller will talk to its local CSP pod and all requests will be stacked on it. For example, on one worker we have around 70-80 pods with around 20 hpe.csi.com volumes; draining that node keeps the CSP pod busy for around 12-13 minutes until all the volumes are detached from that host and attached to other workers across the cluster, i.e. an elasticsearch pod takes around 15 minutes to start on another worker, of which 12 minutes are spent on detach/attach and mount operations.
So the idea is that by increasing the replica count of the CSP pod, at least the detach and attach operations (DeleteVLUNRequest and CreateVLUNRequest) can be sent to the storage array in parallel by multiple CSP pods for different volumes.
During the test I discovered that the session between a CSP pod and the storage array is initiated by the controller. This becomes problematic when the second request to the CSP service hits the second CSP pod: there is no session on that one, so a new session has to be established, which results in further time loss. Once all the CSP pods have their sessions with the storage array things do get fast, but it only lasts for 15 minutes; all the CSP pods then lose their sessions at the same time, and another round of time loss hits the cluster, just because the controller needs to initiate the session renewal for the CSP pods.
I believe that with minimal code changes, both on primera3par-csp and on the csi-driver, the CSP deployment can have more than one replica, which would speed up create/detach/attach/delete volume operations considerably in big clusters.
I have one question though: can the storage array handle this multi-replica setup? Or what is the limit?
@obacak - Couple of things I would like to clarify here:
@imran-ansari, thanks for the extra info. Maybe I should write down my test results below, for a 3-replica CSP and a single-replica controller on our lab cluster, which has 3 workers:
I create a single-replica test deployment which is using a pvc.
csi-provisioner notices the pvc and issues a CreateVolumeRequest to the hpe-csi-controller.hpe-csi-driver (controller). Initially the controller checks whether that pv (which is to be created) exists or not via a GET request to http://primera3par-csp-svc:8080/containers/v1/volumes?name=pvc-d496f2c0-939c-438e-b90c-40118a8af101.
The request arrives at primera3par-csp pod nr. 1 (csp-0). csp-0 checks this against the kube-apiserver to see whether an hpevolumeinfo with that name exists and sends back 404 to the controller. The controller then does a POST request to the csp service: http://primera3par-csp-svc:8080/containers/v1/volumes. This request hits the csp-1 pod, which at this point returns to the controller with the following message: session renewal required. The controller attempts to log in, and this request arrives on csp-2.
Now, out of the 3 csp pods, only csp-2 has a session with the storage array, and the next Create Volume Request arrives at csp-0, which replies to the controller with Sending following message to CSI driver: session renewal required and 404 for the pv which was requested to be created. The controller logs Received a null reader. That is not expected. and again does About to attempt login to CSP for backend ...
So on a big cluster there will be a huge ping-pong between the controller and the csp pods until all csp pods have a valid session with the storage array, as the csp service will balance each request to a different csp pod.
I did not look at the controller's source code in detail, but I can imagine that if the csp is to run in a k8s cluster with multiple replicas, the controller needs to be free of any session creation for the csp against the storage array, and that responsibility should be on the csp only.
Maybe, to get around this, the session information can be maintained in a CRD (besides being in-memory) so that all the CSPs see the same info.
In a new session CRD?
> In a new session CRD?
Yes, but that will be in addition to the in-memory session cache. So the first lookup will always be in-memory; if the session is not found there, then the CRD will be queried. If the session is still not found, then the session-creation flow will take place.
I think that makes sense. I have been looking at the controller source code and your proposal would require minimal code change there; plus, the CSP can always check that CRD for a valid session in case the in-memory session cache is empty.
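To make the proposed lookup order concrete, here is a minimal, purely illustrative sketch; none of these types or names come from the actual CSP code, and the CRD-backed store is assumed to exist:

```go
package session

import (
	"context"
	"sync"
)

// Store is a hypothetical abstraction over the proposed Session CRD: Get reads
// the persisted session key for an array, Put writes it back after a login.
type Store interface {
	Get(ctx context.Context, arrayID string) (string, error)
	Put(ctx context.Context, arrayID, key string) error
}

// Manager resolves a session in the order discussed above:
// in-memory cache -> Session CRD -> fresh login against the array.
type Manager struct {
	mu    sync.Mutex
	cache map[string]string                                         // arrayID -> session key
	crd   Store                                                     // CRD-backed store (hypothetical)
	login func(ctx context.Context, arrayID string) (string, error) // array login (hypothetical)
}

func (m *Manager) SessionFor(ctx context.Context, arrayID string) (string, error) {
	// 1. In-memory cache hit: the common, fast path.
	m.mu.Lock()
	key, ok := m.cache[arrayID]
	m.mu.Unlock()
	if ok {
		return key, nil
	}

	// 2. Fall back to the Session CRD, so a session created by another CSP
	//    replica can be reused instead of forcing a new login.
	if key, err := m.crd.Get(ctx, arrayID); err == nil && key != "" {
		m.remember(arrayID, key)
		return key, nil
	}

	// 3. Last resort: create a new session and persist it for the other replicas.
	key, err := m.login(ctx, arrayID)
	if err != nil {
		return "", err
	}
	if err := m.crd.Put(ctx, arrayID, key); err != nil {
		return "", err
	}
	m.remember(arrayID, key)
	return key, nil
}

func (m *Manager) remember(arrayID, key string) {
	m.mu.Lock()
	if m.cache == nil {
		m.cache = make(map[string]string)
	}
	m.cache[arrayID] = key
	m.mu.Unlock()
}
```

A session stored this way would still need expiry/renewal handling (ideally with some jitter so all replicas do not renew at the same moment), and access to the CRD would have to be locked down with RBAC, as discussed further down.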
@obacak Is the issue with volume creates or with volume attach/detach? Storing the sessions in a CRD works, but it creates a security risk for anyone who could hit the CSPs with the session. The ping-pong would occur only on the initial requests; those sessions are cached until the storage provider's TTL is reached. Scaling replicas does come with this challenge. We are working on retrieving the stats using StatFS (as in your initial PR).
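For reference, a rough sketch of the StatFS direction on Linux (not the actual driver code; only the csi and unix identifiers come from their respective packages, the helper itself is hypothetical). The point is that the node plugin can answer NodeGetVolumeStats from the mounted filesystem alone, with no CSP or kube-apiserver round trip per poll:

```go
package node

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
	"golang.org/x/sys/unix"
)

// volumeUsage gathers byte and inode usage for a mounted volume path using
// statfs(2) only; the result is what NodeGetVolumeStatsResponse.Usage expects.
func volumeUsage(volumePath string) ([]*csi.VolumeUsage, error) {
	var st unix.Statfs_t
	if err := unix.Statfs(volumePath, &st); err != nil {
		return nil, err
	}

	blockSize := int64(st.Bsize)
	return []*csi.VolumeUsage{
		{
			Unit:      csi.VolumeUsage_BYTES,
			Total:     int64(st.Blocks) * blockSize,
			Available: int64(st.Bavail) * blockSize,
			Used:      int64(st.Blocks-st.Bfree) * blockSize,
		},
		{
			Unit:      csi.VolumeUsage_INODES,
			Total:     int64(st.Files),
			Available: int64(st.Ffree),
			Used:      int64(st.Files - st.Ffree),
		},
	}, nil
}
```

A NodeGetVolumeStats handler could then call volumeUsage(req.GetVolumePath()) and wrap the result in a csi.NodeGetVolumeStatsResponse.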
@raunakkumar, the issue is there for any operation involving communication between the CSP and the storage array, so it affects create volume, detach volume, attach volume and delete volume. The security risk on the CRD can be mitigated with RBAC, granting the specific rights to the HPE service accounts only. In our clusters, only cluster admins have access to the storage system where hpe-csi is deployed. Plus, this is no different from holding the secret necessary to create a session against the storage array. On top of that, we use Calico, so we can easily create a NetworkPolicy so that only the controller can talk to the CSP service.
In our clusters (especially the big ones), that ping-pong is not acceptable; it could hit anyone, delaying deployments or upgrades of services.
This is an enhancement request on top of hpe-csi v1.4, where csi.NodeServiceCapability_RPC_GET_VOLUME_STATS is enabled on hpe-csi-node. As a result, this puts too much strain on the single-replica primera3par-csp deployment on big K8s clusters (we have clusters with 48 workers): polling that single pod with many requests saturates it, and detach/attach operations are hindered to the point where the cluster is not usable.
We had to ask HPE for a test image in which csi.NodeServiceCapability_RPC_GET_VOLUME_STATS is disabled on v1.4, and that solved this immediate problem on the big cluster.
The proposal is the following: would it be possible to deploy the primera3par-csp as a daemonset and set its service's spec.externalTrafficPolicy to Local, so that the hpe-csi-node pods poll their local primera3par-csp pod for the volume stats and the csp pod which runs on the same K8s worker as the controller works as a "leader"?
Could you please advise?