Open lindmin opened 5 months ago
It is a well-known situation that calling Close
more than once on something may result in erroneous behavior. Consider: https://go.dev/play/p/K9uveTKRNus
Which returns:
panic: close of closed channel
goroutine 1 [running]:
main.main()
/tmp/sandbox2717924249/prog.go:8 +0x30
Also, note that attempting to closing an os.File
twice will return an error file already closed
.
Now, all of that said, we could support this a little better, by locking the sftp.File
mutex during a Close
call and then marking it as already closed, then returning fs.ErrClosed
when we expect that double close has occurred. Rather than trying to send a known broken SSH_FXP_CLOSE
to the server for it to respond back with an SSH_FX_FAILURE
.
P.S.: Note that your code was erroneous even if using os.File
, but the error produced would be simply dropped on the floor, as you’re not actually trapping it in the defer
.
Hello,
I knew double close is not a really clean code, I was fool by other module implementation of Close where Close can be called safely multiple time.
I create the issue it because the ReadFrom error happend randomly, and it take me some time to identify that was the second close() on a already closed file reference that impacting another file reference copy process, it seems to me very sneaky.
With your solution, it will be more explicit and preventive :+1:
where Close can be called safely multiple time.
This may not be strictly true. Since you’re throwing away any errors that are happening, you cannot know whether these multiple closes are actually safe or not.
Hm, this is still kind of weird though, because the SSH_FXP_CLOSE
shouldn’t be impacting any other open file handles. 🤔
And even with UseConcurrentWrites(true)
the length of data being written, isn’t enough to trigger concurrent transfers.
Actually:
debug1: request 2162: write "(null)" (handle -1) off 0 len 14
This is the sftp server related what the request it got was. Which appears to be:
&sshFxpWritePacket{
ID: 2162,
Handle: "",
Offset: 0,
Length: 14,
Data: []byte{ /* 14-bytes of data here */ },
}
And that’s just all sorts of wrong. There shouldn’t really be a way that we can even send that.
I think what might help is more of the server logs preceding one of these errors. 🤔 While the logs are, I’m sure, relatively free of sensitive information, you should probably attach it as a file, rather than embed it. There’s almost certainly a lot of noise going on in there.
here the log of one run with 2 copy errors :
bapt@BVK-book:~$ ./golang_sftp_fd_null
💣💣💣💣💣 COPY: sftp: "Failure" (SSH_FX_FAILURE)
💣💣💣💣💣 COPY: sftp: "Failure" (SSH_FX_FAILURE)
Interesting, found two corresponding close
requests via no bytes being written, all others have written 14
:
Jan 16 12:24:26 BVK-book sftp-server[33548]: close "/tmp/file-696.dat" bytes read 0 written 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: close "/tmp/file-948.dat" bytes read 0 written 0
From here, we can then extract the entries for file-696.dat
, and mix in where the status Failure
occurs:
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 701: open flags 26
Jan 16 12:24:26 BVK-book sftp-server[33548]: open "/tmp/file-696.dat" flags WRITE,CREATE,TRUNCATE mode 0666
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 701: sent handle handle 4
--
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 2150: write "(null)" (handle -1) off 0 len 14
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2150: sent status 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: sent status Failure
--
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2142: close handle 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: close "/tmp/file-696.dat" bytes read 0 written 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2142: sent status 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: sent status Success
--
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 3146: remove
Jan 16 12:24:26 BVK-book sftp-server[33548]: remove name "/tmp/file-696.dat"
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 3146: sent status 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: sent status Success
Wait a second, handle handle 4
? But that’s not right. We already used that handle. 😮 oh no. What about the open
for 948?
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 954: open flags 26
Jan 16 12:24:26 BVK-book sftp-server[33548]: open "/tmp/file-948.dat" flags WRITE,CREATE,TRUNCATE mode 0666
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 954: sent handle handle 5
handle handle 5
. Again, an attempt to reuse an existing handle. In fact, these got shunted all the way to the back of the line. Almost all of the open
s happen first, before much if any write
s happen. But these two “bugs” in the open
requests both happen deep into the write
s. Looking at the lowest number handles:
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 1: sent handle handle 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 2: sent handle handle 1
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 4: sent handle handle 2
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 3: sent handle handle 3
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 5: sent handle handle 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 6: sent handle handle 5
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 7: sent handle handle 6
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 8: sent handle handle 7
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 13: sent handle handle 8
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 9: sent handle handle 9
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1105: close handle 1
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1124: close handle 2
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1138: close handle 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 701: sent handle handle 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1192: close handle 5
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug1: request 954: sent handle handle 5
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1309: close handle 3
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 1866: close handle 0
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2003: close handle 6
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2005: close handle 8
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2006: close handle 7
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2013: close handle 9
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2142: close handle 4
Jan 16 12:24:26 BVK-book sftp-server[33548]: debug3: request 2193: close handle 5
Hm. The ones reusing a handle occur immediately after the close for that handle.
Something is very sketchy here, and it’s not immediately apparent that it would be our code.
Congratulations! You have managed to stumble across a race condition in openssh!
There is actually absolutely nothing wrong with our code. It’s operating correctly, and openssh is the one which is misbehaving.
I’ll be isolating it, preparing an issue report to their repo, then will link to the issue here. (And maybe a PR if I can work out how to address it at the same time.)
Oh ok, thank you for your analysis on that ! :+1:
Hm… ordering them by line numbers, I see something different. No bug or race condition in openssh
(well, small bug, but only in the logging, but since it’s not multithreaded, this was not making sense.)
Instead, it’s (obviously) a race condition in the code which is multithreaded:
Thread A: open /tmp/file-5.dat
; Response: handle 4
Thread A: write handle 4
with 14 bytes; Response: OK
Thread A: close handle 4
; Response: OK
Thread B: open /tmp/file-40.dat
; Response: handle 4
(reused)
Thread A: close handle 4
; Response: OK
Thread B: write /tmp/file-40.dat
; Response: Failure
Thread B: close handle 4
; Response: File not found
Thread B: close handle 4
; Response: File not found
So, this is a bug in sftp
, and we’ll need to prevent the potential of a reused file handler being closed by a delayed second close on another thread of requests.
If you can try out the new PR, and verify that you don’t seem to hit the error anymore. I know we cannot for sure say it doesn’t happen anymore, but some amount of confidence would be great.
Hello,
version of go :
1.20.5
version of lib sftp :v1.13.6
version of sftp server:OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
I sometime experiment a bug when my code close 2 times the same sftp.File ref in concurrent situation
here the code to reproduce :
main.go
go.mod
When i launched this code, sometime i got 1 or 2
SSH_FX_FAILURE
on theReadFrom
method .output of my code :
On server side, I can see that type of logs:
Now, if I replace
with
I cant reproduce the error.
In the Close method documentation, it is mention that
Close closes the File, rendering it unusable for I/O.
. Is-it really the case ?Does a PR that add a flag check "alreadyClose" in the Close method before send a close request to sftp server is a great idea ?
thank you.