ncw opened 2 weeks ago
🤔 Right… so to address this, we would need to trigger a signal to close any outstanding WriteTo or ReadFrom (or heck, any outstanding Read or Write)…
Actually, let's look at the behavior of os.File:
package main

import (
	"fmt"
	"io"
	"os"
	"sync"
)

func main() {
	f, err := os.Open("/dev/zero")
	if err != nil {
		panic(err)
	}

	rd, wr := io.Pipe()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Copy the file into the pipe until something stops us.
		_, err := f.WriteTo(wr)
		_ = wr.CloseWithError(err)
	}()

	// Read only the first 42 bytes, then stop reading from the pipe.
	lr := &io.LimitedReader{R: rd, N: 42}
	b, err := io.ReadAll(lr)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))

	err = f.Close()
	if err != nil {
		panic(err)
	}
	wg.Wait()
}
output:
close-on-open-fd$ go run main.go | allcat -o clip:
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x0?)
/usr/lib64/go/1.23/src/runtime/sema.go:71 +0x25
sync.(*WaitGroup).Wait(0xc0000b0180?)
/usr/lib64/go/1.23/src/sync/waitgroup.go:118 +0x48
main.main()
/home/snowgirl/Work/tmp/go/close-on-open-fd/main.go:42 +0x232
goroutine 6 [select]:
io.(*pipe).write(0xc0000b01e0, {0xc0000d0000, 0x8000, 0x8000})
/usr/lib64/go/1.23/src/io/pipe.go:86 +0x1e5
io.(*PipeWriter).Write(0x55da90?, {0xc0000d0000?, 0x46567f?, 0xc00008e048?})
/usr/lib64/go/1.23/src/io/pipe.go:161 +0x1a
io.copyBuffer({0x4deb38, 0xc0000b01e0}, {0x4dec38, 0xc00008e048}, {0x0, 0x0, 0x0})
/usr/lib64/go/1.23/src/io/io.go:431 +0x1de
io.Copy(...)
/usr/lib64/go/1.23/src/io/io.go:388
os.genericWriteTo(0xc00008e040?, {0x4deb38, 0xc0000b01e0})
/usr/lib64/go/1.23/src/os/file.go:275 +0x4f
os.(*File).WriteTo(0xc00008e040, {0x4deb38, 0xc0000b01e0})
/usr/lib64/go/1.23/src/os/file.go:253 +0x9c
main.main.func1()
/home/snowgirl/Work/tmp/go/close-on-open-fd/main.go:24 +0x57
created by main.main in goroutine 1
/home/snowgirl/Work/tmp/go/close-on-open-fd/main.go:21 +0x150
exit status 2
It seems here that, while the Close() succeeds, the WriteTo() doesn't crash out or return an error. 🤔
P.S.: Adding an rd.Close() before the Wait allows it to complete, and WriteTo returns io.ErrClosedPipe.
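For reference, the same example with that one-line tweak (everything else unchanged); as far as I can tell this exits cleanly, with WriteTo returning io.ErrClosedPipe:

package main

import (
	"fmt"
	"io"
	"os"
	"sync"
)

func main() {
	f, err := os.Open("/dev/zero")
	if err != nil {
		panic(err)
	}

	rd, wr := io.Pipe()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		_, err := f.WriteTo(wr)
		_ = wr.CloseWithError(err)
	}()

	b, err := io.ReadAll(&io.LimitedReader{R: rd, N: 42})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))

	if err := f.Close(); err != nil {
		panic(err)
	}

	// New: closing the read end of the pipe unblocks the pending pipe write
	// inside WriteTo, which then returns io.ErrClosedPipe, so Wait completes.
	_ = rd.Close()

	wg.Wait()
}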
Min-repro:
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"sync"

	"github.com/pkg/sftp"
	"golang.org/x/crypto/ssh"
	"golang.org/x/crypto/ssh/agent"
)

func main() {
	var auths []ssh.AuthMethod
	if aconn, err := net.Dial("unix", os.Getenv("SSH_AUTH_SOCK")); err == nil {
		auths = append(auths, ssh.PublicKeysCallback(agent.NewClient(aconn).Signers))
	}
	auths = append(auths, ssh.Password("notasecret"))

	config := ssh.ClientConfig{
		User:            "testuser",
		Auth:            auths,
		HostKeyCallback: ssh.InsecureIgnoreHostKey(),
	}

	addr := "127.0.0.1:2022"
	conn, err := ssh.Dial("tcp", addr, &config)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	cl, err := sftp.NewClient(conn)
	if err != nil {
		panic(err)
	}
	defer cl.Close()

	f, err := cl.Open("/dev/zero")
	if err != nil {
		panic(err)
	}

	rd, wr := io.Pipe()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		_, err := f.WriteTo(wr)
		_ = wr.CloseWithError(err)
	}()

	// Read only the first 42 bytes, then stop reading from the pipe.
	lr := &io.LimitedReader{R: rd, N: 42}
	b, err := io.ReadAll(lr)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%v\n", b)

	err = f.Close()
	if err != nil {
		panic(err)
	}
	wg.Wait()
}
This locks up:
$ go run main.go
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Adding an rd.Close() at the bottom causes the process to succeed, with the same io.ErrClosedPipe coming from the WriteTo.
It seems that things were only ever working because the io.Pipe is triggering the end. Moving the rd.Close() to before the f.Close() works even on v1.13.7.
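In other words (a hypothetical helper, just to show the ordering; the name and signature are made up):

package example

import (
	"io"
	"sync"

	"github.com/pkg/sftp"
)

// closeInOrder shows the ordering that avoids the deadlock: close the pipe's
// read side first so the pending pipe write inside WriteTo fails with
// io.ErrClosedPipe, then close the sftp file, then wait for the copying
// goroutine to finish.
func closeInOrder(rd *io.PipeReader, f *sftp.File, wg *sync.WaitGroup) error {
	_ = rd.Close()   // unblocks WriteTo's pending pipe write
	err := f.Close() // no longer has to wait for WriteTo to return
	wg.Wait()        // the copying goroutine exits once WriteTo errors out
	return err
}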
Thank you so much for making a repro @puellanivis!
You are right, closing things in a different order makes the deadlock go away:
--- a/backend/sftp/sftp.go
+++ b/backend/sftp/sftp.go
@@ -2101,10 +2101,10 @@ func (file *objectReader) Read(p []byte) (n int, err error) {
 // Close a reader of a remote sftp file
 func (file *objectReader) Close() (err error) {
-	// Close the sftpFile - this will likely cause the WriteTo to error
-	err = file.sftpFile.Close()
 	// Close the pipeReader so writes to the pipeWriter fail
 	_ = file.pipeReader.Close()
+	// Close the sftpFile - this will likely cause the WriteTo to error
+	err = file.sftpFile.Close()
 	// Wait for the background process to finish
 	<-file.done
 	// Show connection no longer in use
I think the new way is probably the best way, and the fact that it worked before was due to an implementation detail of the sftp library.
I'm happy with that fix, so we can close this if you are happy?
Yeah, I'm still thinking I'd rather repeat the os.File.Close behavior, since we're generally trying to mirror that usage. Though, it would probably be quite a bit of work to ensure we don't also introduce a race condition, where the WriteTo could theoretically end up making one or more reads from a different file due to a recycled file handle on the server side.
I'll timebox this to a week. I would like to at least get dev-v2 working the same as os.File.Close (that is, it deadlocks without the rd.Close() but at least doesn't deadlock on file.Close()). Then depending on how much work that takes, I'll have a perspective on how to bring the fix to v1.
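To sketch one possible shape of that (purely illustrative: a toy type, not pkg/sftp, with made-up names, and ignoring the concurrent outstanding reads the real WriteTo issues): Close only signals and never waits, and WriteTo checks the signal before each chunk, so it never issues a read against a recycled handle.

package main

import (
	"errors"
	"fmt"
	"io"
	"sync"
)

var errClosed = errors.New("toyFile: already closed")

// toyFile is an illustrative sketch only. Close signals in-flight transfers
// via a channel instead of waiting on a lock, so it cannot deadlock, and
// WriteTo bails out between chunks once the file has been closed.
type toyFile struct {
	mu     sync.Mutex
	closed chan struct{}
}

func newToyFile() *toyFile {
	return &toyFile{closed: make(chan struct{})}
}

func (f *toyFile) WriteTo(w io.Writer) (int64, error) {
	var n int64
	buf := make([]byte, 32*1024) // stands in for a chunk read from the remote file
	for {
		select {
		case <-f.closed:
			// The handle is gone: stop before issuing another "read",
			// so we never read via a recycled handle.
			return n, errClosed
		default:
		}
		m, err := w.Write(buf)
		n += int64(m)
		if err != nil {
			return n, err
		}
	}
}

func (f *toyFile) Close() error {
	f.mu.Lock()
	defer f.mu.Unlock()
	select {
	case <-f.closed:
		return errClosed
	default:
		close(f.closed) // signal, don't wait
	}
	return nil
}

func main() {
	f := newToyFile()
	rd, wr := io.Pipe()

	done := make(chan error, 1)
	go func() {
		_, err := f.WriteTo(wr)
		_ = wr.CloseWithError(err)
		done <- err
	}()

	// Read a little, then stop reading, as in the examples above.
	b := make([]byte, 42)
	if _, err := io.ReadFull(rd, b); err != nil {
		panic(err)
	}

	fmt.Println("Close:", f.Close()) // returns immediately, no deadlock
	_ = rd.Close()                   // still needed to unblock the pending pipe write
	fmt.Println("WriteTo:", <-done)  // io.ErrClosedPipe, same as with os.File
}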
Sounds like a plan! For my use, I'll put the above tweak in and update rclone to v1.13.7 for its next release.
I haven't looked at the v2 branch yet - should I?
Yeah, I think the tweak of closing the read pipe first is definitely the better solution in general. 👍
Yes, you can look at the dev-v2 branch already. 😁
I noticed when I upgraded the dependencies for rclone that the sftp server tests have started deadlocking.
I used git bisect to narrow this down to this commit d1903fbd460e9a8105bae72fcdf492a4999b4cee
This produced an enormous backtrace. I removed all the traces that didn't involve pkg/sftp and deduplicated them, which left us with three.
I think backtrace 1 is irrelevant. What appears to have happened is that a goroutine has started (backtrace 3) which is holding the File lock.
The code from rclone which causes this is this (backtrace 3), and here (backtrace 2).
So the problem is that rclone expects the line

err = file.sftpFile.Close()

to close the sftpFile and return an error, while now, because of the locking added in d1903fbd460e9a8105bae72fcdf492a4999b4cee, the Close waits indefinitely for the File.WriteTo() to finish, which it won't, since rclone has stopped putting data into it. So I think this is a regression - this has worked perfectly for many years!
I haven't made a small reproduction - I probably can now that I understand what is going on - but maybe you've got an immediate idea on how to solve this problem?
Backtrace 1
goroutine 56 gp=0xc0004048c0 m=nil [sync.Cond.Wait]:
runtime.gopark(0xc0000a5ce0?, 0x1?, 0x20?, 0x24?, 0x23?)
go1.23/src/runtime/proc.go:424 +0xce fp=0xc000352430 sp=0xc000352410 pc=0x473b6e
runtime.goparkunlock(...)
go1.23/src/runtime/proc.go:430
sync.runtime_notifyListWait(0xc00011e0d0, 0x32c)
go1.23/src/runtime/sema.go:587 +0x159 fp=0xc000352480 sp=0xc000352430 pc=0x4753b9
sync.(*Cond).Wait(0x0?)
go1.23/src/sync/cond.go:71 +0x85 fp=0xc0003524c0 sp=0xc000352480 pc=0x484dc5
golang.org/x/crypto/ssh.(*buffer).Read(0xc00007c060, {0xc0001167f8, 0x4, 0x4})
/home/ncw/go/pkg/mod/golang.org/x/crypto@v0.29.0/ssh/buffer.go:94 +0x1ed fp=0xc000352548 sp=0xc0003524c0 pc=0x66eb6d
golang.org/x/crypto/ssh.(*channel).ReadExtended(0xc0003e40c0, {0xc0001167f8?, 0x1c0004048c0?, 0xc0003525d8?}, 0x40811e?)
/home/ncw/go/pkg/mod/golang.org/x/crypto@v0.29.0/ssh/channel.go:363 +0x93 fp=0xc0003525a8 sp=0xc000352548 pc=0x672553
golang.org/x/crypto/ssh.(*channel).Read(0x408245?, {0xc0001167f8?, 0x471f94?, 0xc000400e08?})
/home/ncw/go/pkg/mod/golang.org/x/crypto@v0.29.0/ssh/channel.go:540 +0x25 fp=0xc0003525e0 sp=0xc0003525a8 pc=0x673265
github.com/pkg/sftp.(*conn).Read(0x0?, {0xc0001167f8?, 0xc000352648?, 0x475e49?})
<autogenerated>:1 +0x29 fp=0xc000352610 sp=0xc0003525e0 pc=0x6b0029
io.ReadAtLeast({0xb6e6c0, 0xc0000ea240}, {0xc0001167f8, 0x4, 0x4}, 0x4)
go1.23/src/io/io.go:335 +0x90 fp=0xc000352658 sp=0xc000352610 pc=0x4a93f0
io.ReadFull(...)
go1.23/src/io/io.go:354
github.com/pkg/sftp.recvPacket({0xb6e6c0, 0xc0000ea240}, 0x0, 0x2?)
+0x99 fp=0xc0003526d0 sp=0xc000352658 pc=0x6aae39
github.com/pkg/sftp.(*conn).recvPacket(...)
github.com/pkg/sftp.(*clientConn).recv(0xc0000ea240)
+0xc5 fp=0xc000352790 sp=0xc0003526d0 pc=0x6a89e5
github.com/pkg/sftp.NewClientPipe.func1()
+0x4e fp=0xc0003527e0 sp=0xc000352790 pc=0x69faee
runtime.goexit({})
go1.23/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0003527e8 sp=0xc0003527e0 pc=0x47bf21
created by github.com/pkg/sftp.NewClientPipe in goroutine 46
+0x349
Backtrace 2
goroutine 1762 gp=0xc0003a96c0 m=nil [sync.Mutex.Lock]:
runtime.gopark(0x0?, 0xc000034700?, 0xc0?, 0x20?, 0xc00001fbb8?)
go1.23/src/runtime/proc.go:424 +0xce fp=0xc00001fad8 sp=0xc00001fab8 pc=0x473b6e
runtime.goparkunlock(...)
go1.23/src/runtime/proc.go:430
runtime.semacquire1(0xc00038e06c, 0x0, 0x3, 0x1, 0x15)
go1.23/src/runtime/sema.go:178 +0x225 fp=0xc00001fb40 sp=0xc00001fad8 pc=0x450725
sync.runtime_SemacquireMutex(0xc00001fbb0?, 0xe5?, 0xb?)
go1.23/src/runtime/sema.go:95 +0x25 fp=0xc00001fb78 sp=0xc00001fb40 pc=0x475105
sync.(*Mutex).lockSlow(0xc00038e068)
go1.23/src/sync/mutex.go:173 +0x15d fp=0xc00001fbc8 sp=0xc00001fb78 pc=0x4864fd
sync.(*Mutex).Lock(...)
go1.23/src/sync/mutex.go:92
sync.(*RWMutex).Lock(0x2?)
go1.23/src/sync/rwmutex.go:148 +0x31 fp=0xc00001fbf8 sp=0xc00001fbc8 pc=0x4877d1
github.com/pkg/sftp.(*File).Close(0xc00038e050)
+0x3c fp=0xc00001fc58 sp=0xc00001fbf8 pc=0x6a2d1c
github.com/rclone/rclone/backend/sftp.(*objectReader).Close(0xc0002ac680)
github.com/rclone/rclone/backend/sftp/sftp.go:2105 +0x1f fp=0xc00001fc90 sp=0xc00001fc58 pc=0x8cacbf
github.com/rclone/rclone/lib/readers.(*LimitedReadCloser).Close(0xc000338480)
github.com/rclone/rclone/lib/readers/limited.go:17 +0x24 fp=0xc00001fce0 sp=0xc00001fc90 pc=0x7c8d44
github.com/rclone/rclone/fstest/fstests.ReadObject({0xb73360, 0xf3e900}, 0xc000312ea0, {0xb77540, 0xc000486e80}, 0xffffffffffffffff, {0xc00061e350, 0x1, 0x1})
github.com/rclone/rclone/fstest/fstests/fstests.go:293 +0x2f5 fp=0xc00001fdb0 sp=0xc00001fce0 pc=0x925895
github.com/rclone/rclone/fstest/fstests.Run.func13.14.36(0xc000312ea0)
github.com/rclone/rclone/fstest/fstests/fstests.go:1802 +0x310 fp=0xc00001ff70 sp=0xc00001fdb0 pc=0x933db0
testing.tRunner(0xc000312ea0, 0xc000486e00)
go1.23/src/testing/testing.go:1690 +0xf4 fp=0xc00001ffc0 sp=0xc00001ff70 pc=0x526174
testing.(*T).Run.gowrap1()
go1.23/src/testing/testing.go:1743 +0x25 fp=0xc00001ffe0 sp=0xc00001ffc0 pc=0x527165
runtime.goexit({})
go1.23/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00001ffe8 sp=0xc00001ffe0 pc=0x47bf21
created by testing.(*T).Run in goroutine 1096
go1.23/src/testing/testing.go:1743 +0x390
Backtrace 3
goroutine 1676 gp=0xc000160e00 m=nil [select]:
runtime.gopark(0xc000416cf0?, 0x2?, 0x98?, 0x6b?, 0xc000416cb4?)
go1.23/src/runtime/proc.go:424 +0xce fp=0xc000416b50 sp=0xc000416b30 pc=0x473b6e
runtime.selectgo(0xc000416cf0, 0xc000416cb0, 0xc0001ef470?, 0x1, 0x40f905?, 0x1)
go1.23/src/runtime/select.go:335 +0x7a5 fp=0xc000416c78 sp=0xc000416b50 pc=0x44f685
io.(*pipe).write(0xc000112f00, {0xc0003f0000, 0x5f, 0x8000})
go1.23/src/io/pipe.go:86 +0x1e5 fp=0xc000416d50 sp=0xc000416c78 pc=0x4ab185
io.(*PipeWriter).Write(0xc00038e050?, {0xc0003f0000?, 0xc0003f0000?, 0x8000?})
go1.23/src/io/pipe.go:161 +0x1a fp=0xc000416d80 sp=0xc000416d50 pc=0x4ab59a
github.com/pkg/sftp.(*File).writeToSequential(0xc00038e050, {0xb6e140, 0xc000112f00})
+0xfb fp=0xc000416df8 sp=0xc000416d80 pc=0x6a43fb
github.com/pkg/sftp.(*File).WriteTo(0xc00038e050, {0xb6e140, 0xc000112f00})
+0x3a5 fp=0xc000416fa8 sp=0xc000416df8 pc=0x6a4865
github.com/rclone/rclone/backend/sftp.(*Fs).newObjectReader.func1()
github.com/rclone/rclone/backend/sftp/sftp.go:2085 +0x30 fp=0xc000416fe0 sp=0xc000416fa8 pc=0x8cac10
runtime.goexit({})
go1.23/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000416fe8 sp=0xc000416fe0 pc=0x47bf21
created by github.com/rclone/rclone/backend/sftp.(*Fs).newObjectReader in goroutine 1762
github.com/rclone/rclone/backend/sftp/sftp.go:2082 +0x199
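For anyone who wants a self-contained picture of what those three backtraces add up to, here is a toy model (not pkg/sftp code: just an RWMutex, an io.Pipe, and a write loop standing in for WriteTo) that blocks the same way - WriteTo holds the read lock while stuck in a pipe write, so Close can never take the write lock:

package main

import (
	"fmt"
	"io"
	"sync"
	"time"
)

// fakeFile loosely models the locking described above: WriteTo holds a read
// lock for its whole duration and Close takes the write lock. It is an
// illustration only, not the pkg/sftp implementation.
type fakeFile struct {
	mu sync.RWMutex
}

func (f *fakeFile) WriteTo(w io.Writer) (int64, error) {
	f.mu.RLock()
	defer f.mu.RUnlock()

	var n int64
	buf := make([]byte, 32*1024) // stands in for a chunk read from the remote file
	for {
		m, err := w.Write(buf) // blocks forever once the reader stops reading
		n += int64(m)
		if err != nil {
			return n, err
		}
	}
}

func (f *fakeFile) Close() error {
	f.mu.Lock() // cannot be acquired until WriteTo returns
	defer f.mu.Unlock()
	return nil
}

func main() {
	f := &fakeFile{}
	rd, wr := io.Pipe()

	go func() {
		_, err := f.WriteTo(wr)
		_ = wr.CloseWithError(err)
	}()

	// Read a little and then stop reading, as rclone does for a partial read.
	b := make([]byte, 42)
	if _, err := io.ReadFull(rd, b); err != nil {
		panic(err)
	}

	done := make(chan struct{})
	go func() {
		_ = f.Close() // blocks: WriteTo never returns, so it never drops the read lock
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("Close returned")
	case <-time.After(2 * time.Second):
		fmt.Println("Close still blocked after 2s - the same deadlock as in the backtraces")
	}
}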