Closed okozachenko1203 closed 2 weeks ago
We experience DeadlineExceeded
for detach operation also. But there is no errors on external controller side and even on openstack side.
I am not sure what this error message means and which part is the problem.
Error processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262": failed to detach: rpc error: code = DeadlineExceeded desc = context deadline exceeded
The operations apparently have different timeout in the attacher and in the CSI driver.
I don't know what timeouts you use, the default value of --timeout
is 15 seconds. So the attacher issues a ListVolumes
/ ControllerUnpublish
call and waits for 15 seconds for a response and then times out.
On the driver side, it looks like there is no timeout, and in your logs I can see the ListVolumes
completes in 35 seconds. That's long past the attacher timeout. Our docs don't list ListVolumes
call explicitly, but it behaves the same as ControllerPublish
/ ControllerUnpublish
. The attacher timeout should be long enough to fit most ListVolume
calls.
I1108 08:46:19.441046 11 controllerserver.go:319] ListVolumes: completed with 83 entries and "" next token
@jsafrane here are the args for csi-attacher container in my deployment.
containers:
- args:
- --csi-address=$(ADDRESS)
- --timeout=3m
- --leader-election=true
- --default-fstype=ext4
- -v=4
env:
- name: ADDRESS
value: /var/lib/csi/sockets/pluginproxy/csi.sock
image: registry.k8s.io/sig-storage/csi-attacher:v4.2.0
I think this timeout will be applied for all actions in attacher.
Can you please post complete attacher and corresponding driver logs with at least few ListVolume requests?
The code looks like it should use your 3 minute timeout for ListVolumes. It creates a new 3 min context at the beginning and passes it to ListVolume at the end:
Can you please post complete attacher and corresponding driver logs with at least few ListVolume requests?
The code looks like it should use your 3 minute timeout for ListVolumes. It creates a new 3 min context at the beginning and passes it to ListVolume at the end:
I pasted csi-attacher and cinder-plugin logs. I filtered Found NodeID xxx in CSINode yyy
logs in attacher log.
https://paste.opendev.org/show/blkDBjTauGVIQmtjfozT/ https://paste.opendev.org/show/bBh1zd9lzaaQu4qsKpSa/
i can see that it was timed out after 3mins exactly in csi-attacher
I1108 12:06:03.004767 1 csi_handler.go:123] Reconciling VolumeAttachments with driver backend state
I1108 12:08:58.096259 1 reflector.go:559] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.CSINode total 8 items received
E1108 12:09:03.005889 1 controller.go:146] Failed to reconcile volume attachments: failed to ListVolumes: failed to list volumes: rpc error: code = DeadlineExceeded desc = context deadline exceeded
In driver, ListVolumes completed in1m30s at 12:07:41
.
I1108 12:06:03.005245 11 utils.go:91] [ID:716] GRPC call: /csi.v1.Controller/ListVolumes
I1108 12:06:03.005291 11 controllerserver.go:283] ListVolumes: called with &csi.ListVolumesRequest{MaxEntries:0, StartingToken:"", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0} request
I1108 12:06:13.692517 11 utils.go:91] [ID:717] GRPC call: /csi.v1.Identity/Probe
I1108 12:07:13.691793 11 utils.go:91] [ID:718] GRPC call: /csi.v1.Identity/Probe
I1108 12:07:41.437185 11 controllerserver.go:319] ListVolumes: completed with 83 entries and "" next token
I1108 12:08:13.694518 11 utils.go:91] [ID:719] GRPC call: /csi.v1.Identity/Probe
I1108 12:09:13.693673 11 utils.go:91] [ID:720] GRPC call: /csi.v1.Identity/Probe
I1108 12:10:03.008628 11 utils.go:91] [ID:721] GRPC call: /csi.v1.Controller/ListVolumes
I1108 12:10:03.008671 11 controllerserver.go:283] ListVolumes: called with &csi.ListVolumesRequest{MaxEntries:0, StartingToken:"", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0} request
I1108 12:10:03.520247 11 controllerserver.go:319] ListVolumes: completed with 83 entries and "" next token
2 strange things for me
ListVolumes
result from the driver container?while true; do
response=$(curl -H "X-Auth-Token: $TOKEN" $volume_api_url)
http_code=$(echo $response | awk '{print $1}')
time_taken=$(echo $response | awk '{print $2}')
if [[ "$http_code" -ne 200 ]]; then
count=$((count + 1))
echo "Failure count: $count (HTTP code: $http_code, Time taken: ${time_taken}s)"
else
echo "Success (HTTP code: $http_code, Time taken: ${time_taken}s)"
fi
sleep 1
done
There is at least one ListVolumes request that took >5 minutes on the CSI driver side:
The attacher:
I1108 12:02:03.001065 1 csi_handler.go:123] Reconciling VolumeAttachments with driver backend state
The driver:
I1108 12:02:03.001581 11 controllerserver.go:283] ListVolumes: called with &csi.ListVolumesRequest{MaxEntries:0, StartingToken:"", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0} request
...
I1108 12:07:41.437185 11 controllerserver.go:319] ListVolumes: completed with 83 entries and "" next token
it turned out that attacher works as expected but csi driver has some issue. need to investigate more.
Thanks @jsafrane
I run cinder-csi plugin. I can see this error log every 5mins in
csi-attacher
container.But in csi driver container, I cannot find any error messages at that time.
Is this false negative?