Open axolotlgeoff opened 2 years ago
Thanks for the detective work!
I’m not sure yet I fully understand the issue, though. It sounds like the problem is in the Linux kernel, when the SIGURG signal is received while the program is in a syscall that works with a FUSE file system.
I would assume that the process’s signal mask should be respected by the kernel here, so if you ignore SIGURG in your program, the kernel wouldn’t interrupt?
Either way, do you have an easy way to reproduce this issue? Perhaps with one of the example libfuse file systems, and a minimal Go program that triggers the issue?
Thanks
Reproducing this issue is easy and consistent. Try to clone any git repo in a directory that uses fuse. GCSFuse is one way.
I suspect what is happening:
- A Go application accesses the mount
- The application raises `SIGURG` signals due to a Go feature introduced in 1.14 for non-cooperative goroutine preemption
- FUSE handles the signal and raises an INTERRUPT
- The `InterruptOp` is being handled and cancels the operation
- This cancels the context which is passed to the HTTP request to GCS, resulting in errors
Others, such as Docker and Gitea, have solved this by filtering out the `SIGURG` (see https://github.com/golang/go/issues/37942). However, as far as I can tell there's no option to ignore the `SIGURG`, since it's FUSE which is handling them, and by the time it reaches this library as an interrupt the context is lost.
Our current workaround is to set `GODEBUG="asyncpreemptoff=1"` for the applications which use `gcsfuse` mounts.
While the issue occurs when using `gcsfuse`, I think the solution lies in this package, but please let me know if that's not right. Happy to create a PR for this, but I'm not entirely sure how to solve it. Can you think of any solutions? It seems like more people are running into this issue (https://github.com/GoogleCloudPlatform/gcsfuse/issues/288 and https://github.com/GoogleCloudPlatform/gcsfuse/issues/562).
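For anyone who wants to apply the `GODEBUG` workaround from a launcher rather than a shell profile, here is a minimal sketch of a wrapper that starts a child process with `asyncpreemptoff=1` merged into its environment. The helper name `withAsyncPreemptOff` is hypothetical, and the sketch assumes the child is a Go program that reads `GODEBUG` at startup; it does nothing for non-Go programs.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withAsyncPreemptOff (hypothetical helper) returns a copy of env in which
// GODEBUG contains asyncpreemptoff=1. An existing asyncpreemptoff setting is
// left untouched; other GODEBUG settings are preserved.
func withAsyncPreemptOff(env []string) []string {
	const setting = "asyncpreemptoff=1"
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if strings.HasPrefix(kv, "GODEBUG=") {
			found = true
			val := strings.TrimPrefix(kv, "GODEBUG=")
			switch {
			case val == "":
				kv = "GODEBUG=" + setting
			case !strings.Contains(val, "asyncpreemptoff="):
				kv = "GODEBUG=" + val + "," + setting
			}
		}
		out = append(out, kv)
	}
	if !found {
		out = append(out, "GODEBUG="+setting)
	}
	return out
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: wrapper <command> [args...]")
		os.Exit(2)
	}
	// Run the given command with the adjusted environment, passing
	// standard streams straight through.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Env = withAsyncPreemptOff(os.Environ())
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "command failed:", err)
		os.Exit(1)
	}
}
```

This only sidesteps the problem for processes started through the wrapper; it does not change how interrupts are handled by the file system.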