Open jcsp opened 1 day ago
Looking at the log, we got stuck for 2 minutes, and the second retry of the same operation succeeded immediately. This likely indicates that we hit some limit on the Azure side.
But if the second retry succeeded immediately, why wasn't the first request allowed to go through?
So either this is a bug in our implementation or in the blob client, or we need to handle this situation ourselves by actively retrying.
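To make the "actively retry" option concrete, here is a minimal sketch of a per-attempt timeout that is much shorter than the 2 minutes we were stuck for, so a hung request is abandoned and reissued quickly. This is an illustration only, not our actual storage code: `retry_with_attempt_timeout`, `make_flaky_list`, and the timeout values are hypothetical names and numbers, and the hanging list call is simulated with a sleep.

```python
import asyncio


async def retry_with_attempt_timeout(op, attempt_timeout, max_attempts):
    """Run op() up to max_attempts times.

    Any single attempt that takes longer than attempt_timeout seconds is
    cancelled and retried. Hypothetical helper, for illustration only.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            # wait_for cancels the pending attempt when the timeout expires.
            return await asyncio.wait_for(op(), timeout=attempt_timeout)
        except asyncio.TimeoutError as err:
            last_error = err
    raise TimeoutError(f"operation hung on all {max_attempts} attempts") from last_error


def make_flaky_list():
    """Build a fake list operation that hangs on its first call only.

    This stands in for the behaviour seen in the log: the first request
    hangs, the next one returns immediately.
    """
    calls = {"n": 0}

    async def list_blobs():
        calls["n"] += 1
        if calls["n"] == 1:
            await asyncio.sleep(60)  # simulated hang
        return ["blob-a", "blob-b"]

    return list_blobs, calls


if __name__ == "__main__":
    list_blobs, calls = make_flaky_list()
    blobs = asyncio.run(
        retry_with_attempt_timeout(list_blobs, attempt_timeout=0.05, max_attempts=3)
    )
    print(blobs, "after", calls["n"], "attempts")
```

The trade-off is that a short per-attempt timeout can also cut off requests that are slow but would have completed, so the value would need to be chosen against real list latencies.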
It would be helpful if you could provide playbook-style instructions on how to mitigate this problem until the issue is resolved.
I think we just need to wait. The current timeout for the list operation is 2 minutes, and I believe the stuck-project check is also configured at around 2 minutes. That probably explains why the stuck projects are already gone by the time people tag NeonBot: the operation is retried at exactly that moment and succeeds.
via https://neondb.slack.com/archives/C081W75HSE7/p1732199510578199
Our requests appear to be intermittently hanging, and we're not handling it well