Closed Fricounet closed 6 months ago
Welcome @Fricounet!
It looks like this is your first PR to kubernetes-csi/external-attacher 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes-csi/external-attacher has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @Fricounet. Thanks for your PR.
I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test
label.
I understand the commands that are listed here.
connection.go:234] "Connecting" address="unix:///csi/csi.sock"
this seems taking exactly 1 second from your logs. Do you know why? It looks like the CSI response is very fast, and 1 second timeout should be enough for both RPC calls.
/ok-to-test
@huww98 good point. I had not paid attention that both CSI connection calls were taking 1s each. But I think this might just be the backoff for a failed call to csi-lib-utils Connect
kicking in. So its possibly just the csi-driver that failed to respond on the first call and the next call succeeded after the backoff. I will investigate a bit more but I still feel that each csi call that are supposed to be 'short' should use their own timeouts instead of sharing the same one. Especially when there are calls in the middle that don't have those timeouts (like the one to connect to the driver).
I've figured where this delay comes from and It is not an issue with the connection to the driver but with a change to the Probe
function that is run just after. In the latest version of csi-lib-utils, it was changed to use a Ticker
instead of the sleep it was using before between each probe. As a result, the first probe is only launched after 1s instead of immediately resulting in the delay we see
I've submitted a PR in https://github.com/kubernetes-csi/csi-lib-utils/pull/175 to restore the previous behavior of the Probe. But this PR stays relevant IMHO because the timeout should be used for short csi calls and having it shared for 2 calls separated by potentially long running calls (Connection and Probe for instance) feels wrong.
/lgtm /approve
You're right both about the ProbeForever being slower and that the timeout context should start just before corresponding call.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Fricounet, jsafrane
The full list of commands accepted by this bot can be found here.
The pull request process is described here
BTW, thanks a ton for testing the latest sidecars and reporting and even fixing the issue!
And thanks for all our work on CSI and storage!
What type of PR is this?
/kind bug
What this PR does / why we need it:
The same context.WithTimeout was used in different places when calling the CSI drivers in the main function. With recent changes, the code execution started taking longer and the context would hit its timeout on the
GetPluginCapabilities
driver call. For instance with when running with the azuredisk-csi-driver (I've seen this on the aws-ebs and the gcp-pd drivers too):This change uses a separate context.WithTimeout for each short CSI call using the default
csiTimeout
defined in main.Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: