sshnet / SSH.NET

SSH.NET is a Secure Shell (SSH) library for .NET, optimized for parallelism.
http://sshnet.github.io/SSH.NET/
MIT License

Sshclient deadlock/freeze on disconnect #355

Closed: gary-holland closed this issue 4 years ago.

gary-holland commented 6 years ago

Hi,

My code locks up when calling the Disconnect method.

This snippet reproduces the hang:

using Renci.SshNet;

SshClient client = new SshClient("server", "user", "pass");
client.Connect();
client.Disconnect(); // freezes here

I'm connecting to an SSH server running on a QNAP NAS, which reports its version as "SSH-2.0-OpenSSH_7.6".

This is on OS X (High Sierra). I just tried it on Windows, and the issue doesn't occur there.

Any help would be much appreciated.

Thanks.

erik-wramner commented 5 years ago

We are also badly affected by this. Unfortunately, it would be tricky to get a manual build approved here, but if an alpha version is pushed to NuGet, I would be happy to test it.

soul4soul commented 5 years ago

I ran into this issue too.

My environment: WSL Ubuntu 18.04, PowerShell 7.0.0-preview.3, and the Posh-SSH 3.0 branch (which actually uses https://github.com/asmodat/Asmodat-Standard-SSH.NET, a .NET Core NuGet package built from the latest version of this code base).

For me, this issue is resolved by replacing

_socket.Shutdown(SocketShutdown.Send);

with

_socket.Shutdown(SocketShutdown.Both);

in the SocketDisconnectAndDispose method in Session.cs.

EDIT: I really need SFTP on .NET Core today, so I forked the repository and published a temporary package on NuGet. I will switch back to the official package once this issue has been fixed: https://www.nuget.org/packages/SSH.NET.Fork/2018.8.25.2

This worked for me, at least in the one environment I ran it in. We'll see how well it holds up as I roll it out to more machines.
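To show the shape of that change outside SSH.NET's source, here is a minimal C# sketch. It is not the actual Session.SocketDisconnectAndDispose code; `SocketHelpers.ShutdownAndDispose` is a hypothetical helper that applies the same idea: shut down both directions with SocketShutdown.Both, tolerate a peer (or another thread) that has already closed the connection, and always dispose the socket.

```csharp
using System;
using System.Net.Sockets;

public static class SocketHelpers
{
    // Hypothetical helper (not SSH.NET's API) illustrating the workaround:
    // shut down BOTH directions instead of only SocketShutdown.Send, then dispose.
    public static void ShutdownAndDispose(Socket socket)
    {
        if (socket == null)
        {
            return;
        }

        try
        {
            if (socket.Connected)
            {
                // No further sends or receives are allowed on this socket.
                socket.Shutdown(SocketShutdown.Both);
            }
        }
        catch (SocketException)
        {
            // The peer may already have reset or closed the connection.
        }
        catch (ObjectDisposedException)
        {
            // Another thread already disposed the socket.
        }
        finally
        {
            // Releases the underlying handle.
            socket.Dispose();
        }
    }
}
```

After the helper runs, the remote end sees an orderly end of stream (its Receive returns 0), and any further use of the local socket throws ObjectDisposedException.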

etheaven commented 5 years ago

Has anybody found a solution for .NET Framework?

etheaven commented 5 years ago

The problem still persists: every time SSH tries to disconnect, it just locks up and never returns. RIP.

joshgo commented 4 years ago

I don't think it's going to be fixed anytime soon. The fix for https://github.com/dotnet/corefx/issues/31368 didn't make it into 3.0 (see https://github.com/dotnet/corefx/pull/38804#issuecomment-509584230), and the issue has been assigned to milestone https://github.com/dotnet/corefx/milestone/44 (Nov 2020).

For now, I can confirm that calling _socket.Shutdown(SocketShutdown.Both) works on macOS High Sierra.

etheaven commented 4 years ago

@joshgo I'll check it out. Until now, I had to make the app restart every time it disconnected, because apparently there is no properly functioning .NET SSH library. I never expected that.

oguzhantopcu commented 4 years ago

I have just upgraded to .NET Core 3.1. Neither 2016.1.0 nor 2016.0.0 works now, although 2016.0.0 used to work.

This is really bad. What can we do about this problem?

TheAngryByrd commented 4 years ago

I haven’t tried it yet, but https://github.com/tmds/Tmds.Ssh might be worth checking out.

erik-wramner commented 4 years ago

@TheAngryByrd I only gave it a quick look, but as far as I can tell that library supports neither SCP nor SFTP. At least for our purposes it is nowhere near a replacement.

qub1n commented 4 years ago

What about this fork? Does it solve the issue?

https://github.com/jamesshew/sshnet/commits/master

drieseng commented 4 years ago

I'll be fixing this issue in the next week or two.

qub1n commented 4 years ago

That's great. Thank you. I will be waiting for this.

kurtlnz commented 4 years ago

Also having this issue on macOS Catalina (10.15.1), .NET Core 2.2, SSH.NET 2016.1.0.

Looking forward to the fix! :)

ilqvya commented 4 years ago

+1, CentOS

drieseng commented 4 years ago

Update: For now I'm working on #516, as I need that fixed in order to run our integration tests against recent versions of OpenSSH. This issue is next on my list.

darkoperator commented 4 years ago

So happy the project is getting some love :)

shanselman commented 4 years ago

Thanks @drieseng !

ageroh commented 4 years ago

Hi, @darkoperator , do we have any news on this deadlock?

Thank you

darkoperator commented 4 years ago

I have not tested the latest updates to see if it is addressed.


On Feb 26, 2020, at 9:45 AM, Argiris Gerogiannis wrote: "Hi, @darkoperator, do we have any news on this deadlock?"

drieseng commented 4 years ago

@darkoperator .NET 5.0 should fix the root cause. I haven't yet added a workaround to SSH.NET. I finally finished getting the integration tests (still private for now) running against recent versions of OpenSSH, and I'm now working on getting #629 done and having stable tests on AppVeyor. Once that is done, putting the workaround in place is a matter of minutes.

drieseng commented 4 years ago

@darkoperator (or anyone else) Before I started working on a fix, I first tried to reproduce this issue on both .NET Core 2.1.15 and 3.1.1. In both cases, everything checked out ok. Can you still reproduce this issue with the develop branch?

cc: @HalfLegend @JulianRooze @cvallance

drieseng commented 4 years ago

10,000 connect/disconnect sequences, and still going strong. This is with:

I'd hate to put out a new release without a fix for this issue.

shanselman commented 4 years ago

@stephentoub @karelz might know someone

stephentoub commented 4 years ago

Sorry, I'm missing the question... I know people but I'm not sure if they can help :smile: What's needed?

oguzhantopcu commented 4 years ago

I have just checked my production logs; the issue still exists with .NET Core 3.1.102 and SSH.NET 2016.1.0.

It looks like the system skipped the task and continued after 5 minutes of waiting (my timeout).


If it would help, I could push the SSH.NET develop branch to production to see whether it works as expected.

What do you think?

drieseng commented 4 years ago

@oguzhantopcu Personally, I wouldn't do this. Even though there should not be any breaking changes or regressions, we're not talking about a (fully) tested version.

What OS are you running this on?

drieseng commented 4 years ago

@stephentoub This issue is supposedly fixed by https://github.com/dotnet/corefx/pull/38804. Since that fix will only make it to .NET 5.0, I want to have a workaround in place for the next release of SSH.NET (which is long overdue).

For now, I'm unable to reproduce the original issue (on Ubuntu 19.10). I'm pretty sure I was able to reproduce it consistently at one point, so I'm going to try older versions of Ubuntu.

I don't think there's much you can do for now, but thanks for the offer (and thanks @shanselman for caring).

drieseng commented 4 years ago

Hmm, also failed to reproduce on Ubuntu 14.04.6 using both 2016.1.0 and the develop branch.

oguzhantopcu commented 4 years ago

@drieseng Fedora 30 up-to-date.

uname -a gives following: Linux fed.localdomain 5.4.18-100.fc30.x86_64 #1 SMP Fri Feb 7 14:37:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

oguzhantopcu commented 4 years ago

I would like to give it a try. Is the develop branch you're talking about the public one?

I mean this one: https://github.com/sshnet/SSH.NET/tree/develop. If it is, I will deploy it and then share the results.

richarddavenport commented 4 years ago

I'm on a Mac locally, but we run our code in Docker containers (I believe with an Ubuntu base image). If someone could provide a build, I can test it on my Mac.

drieseng commented 4 years ago

@oguzhantopcu I'm installing Fedora 30 Server right now. If I still cannot reproduce the problem, I'll have no choice but to implement what is supposed to fix the issue, and rely on the community to validate it.

drieseng commented 4 years ago

@oguzhantopcu At your own risk, you could try the develop branch. I haven't implemented the workaround/fix yet, but I'd still be interested in the results.

erik-wramner commented 4 years ago

We had the problem on "microsoft/dotnet:2.2-aspnetcore-runtime" until we downgraded to an older version of this library. The kernel is reported as 4.15.0-1071-azure with #76-Ubuntu (uname -a). Looking at /etc/issue it says Debian 9.

oguzhantopcu commented 4 years ago

This fixed my hang problem:

https://github.com/sshnet/SSH.NET/issues/355#issuecomment-415955449

drieseng commented 4 years ago

@oguzhantopcu Did you try the develop branch (without the Socket.Shutdown(SocketShutdown.Both) workaround)?

qub1n commented 4 years ago

I am able to reproduce it on Ubuntu 18.04, .NET Core 3.1.2, SSH.NET 2016.1.0.

drieseng commented 4 years ago

@qub1n Do you have a minimal repro you can share?

qub1n commented 4 years ago


I will try to prepare one. Basically, it happens when a larger number of clients connect and disconnect in parallel. I cannot reproduce it with two clients, but with 10 I probably can.

drieseng commented 4 years ago

I was able to reproduce the problem (thanks @qub1n!). We're finally getting somewhere.

michelnyx commented 4 years ago

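A workaround for when this happens with the SftpClient.UploadFile method:

            using (var sftp = new SftpClient(connectionInfo))
            {
                sftp.Connect();
                // Note: the using block ends, and disposes sftp, as soon as
                // BeginUploadFile returns; it does not wait for the upload to finish.
                sftp.BeginUploadFile(sourceStream, destinationFile, (result) =>
                {
                    // Invoked when the asynchronous upload completes.
                    sftp.EndUploadFile(result);
                    sftp.Disconnect();
                });
            }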

EDIT: It still hangs in the background in the Disconnect method.

timhill-iress commented 4 years ago

Running on Ubuntu 19.10, Disconnect seems to hang for me with the following code:

        public void SendFile(MemoryStream memoryStream, string filename)
        {
            using (var sftpClient = new SftpClient(config.SftpHost, config.SftpPort, config.SftpUsername, config.SftpPassword))
            {
                memoryStream.Position = 0;
                sftpClient.Connect();
                if (!string.IsNullOrEmpty(config.SftpFolder))
                {
                    sftpClient.ChangeDirectory(config.SftpFolder);
                }
                sftpClient.UploadFile(memoryStream, filename, true, null);

                sftpClient.WriteAllText($"{filename}.md5", GenerateMd5Hash(memoryStream));
                sftpClient.Disconnect();
            }
        }

The same code running on AWS ECS with a docker container based on mcr.microsoft.com/dotnet/core/sdk:2.2-stretch works fine.

I find that, when running on my machine, removing sftpClient.Disconnect(); stops the hang. Am I correct in thinking that the Dispose pattern will clean up the connection for me, or do I need to call Disconnect explicitly?

Is there any tracing I could do to help with this issue?

mizrael commented 4 years ago

Same here. I'm getting the issue on AWS ECS, with an image based on mcr.microsoft.com/dotnet/core/sdk:3.1:

Renci.SshNet.Common.SshConnectionException: Connection reset by peer
 ---> System.Net.Sockets.SocketException (104): Connection reset by peer
   at Renci.SshNet.Abstractions.SocketAbstraction.Read(Socket socket, Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout)
   at Renci.SshNet.Session.TrySocketRead(Byte[] buffer, Int32 offset, Int32 length)
   at Renci.SshNet.Session.ReceiveMessage()
   at Renci.SshNet.Session.MessageListener()
   --- End of inner exception stack trace ---
Unhandled exception. Renci.SshNet.Common.SshConnectionException: Connection reset by peer
 ---> System.Net.Sockets.SocketException (104): Connection reset by peer
   at Renci.SshNet.Abstractions.SocketAbstraction.Read(Socket socket, Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout)
   at Renci.SshNet.Session.TrySocketRead(Byte[] buffer, Int32 offset, Int32 length)
   at Renci.SshNet.Session.ReceiveMessage()
   at Renci.SshNet.Session.MessageListener()
   --- End of inner exception stack trace ---

For now, I'm just avoiding the call to Disconnect().
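Several commenters worked around the hang by skipping Disconnect() altogether. A less drastic stopgap, sketched below under the assumption that an abandoned background thread is acceptable, is to bound how long the caller waits. `DisconnectGuard.TryRun` is a hypothetical helper, not part of SSH.NET; it does not cancel the stuck call, it only keeps the calling thread from blocking forever.

```csharp
using System;
using System.Threading.Tasks;

public static class DisconnectGuard
{
    // Hypothetical helper: runs a potentially blocking call (such as
    // client.Disconnect()) on a thread-pool thread and waits at most `timeout`.
    // Returns true if the call finished in time, false if it is still running.
    // If the call fails within the timeout, Task.Wait rethrows its exception
    // wrapped in an AggregateException.
    public static bool TryRun(Action action, TimeSpan timeout)
    {
        if (action == null)
        {
            throw new ArgumentNullException(nameof(action));
        }

        Task task = Task.Run(action);
        if (task.Wait(timeout))
        {
            return true;
        }

        // The call is still blocked and cannot be cancelled from here.
        // Observe any exception it throws later so it does not go unobserved.
        task.ContinueWith(t => { _ = t.Exception; }, TaskContinuationOptions.OnlyOnFaulted);
        return false;
    }
}
```

Usage would look like `bool disconnected = DisconnectGuard.TryRun(() => sftp.Disconnect(), TimeSpan.FromSeconds(10));`. If it returns false, the thread-pool thread stays blocked inside Disconnect, so this only hides the symptom until a real fix is available.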

drieseng commented 4 years ago

I'm just about to complete a refactoring of the integration tests. When this is done, fixing this and a few other issues should come in fast.

Rustem-bayetov commented 4 years ago

Hi @drieseng. I'm facing the same issue.

using (var sftp = new SftpClient(host, port, username, password)) {
    try {
        sftp.Connect();
        using (FileStream uplfileStream = File.OpenRead(fileName)) {
            sftp.UploadFile(uplfileStream, Path.Combine(sftpFilesDir, Path.GetFileName(fileName)), true);
        }
    }
    catch (Exception ex) {
        _logger.LogError(ex, "UploadToSFTP");
    }
    finally {
        if (sftp.IsConnected) 
            sftp.Disconnect();     // Freezing here
    }
}

I'm using:

Operating System Mac OS X 10.15.4
Visual Studio Community 2019 for Mac Version 8.5.1 (build 42)
.Net Core 3.1
Renci SSH.NET v2016.1.0 Published 10/16/2017
Dockerized writl/sftp using login/password auth

drieseng commented 4 years ago

@stephentoub @shanselman I finally had time to look into this again. On Linux, the blocking Socket.Select(...) call is not always getting interrupted when we (shutdown and) close the socket from another thread. It works fine serially, but not when we have multiple threads operating in parallel.

I was able to create a small repro that shows the problem consistently on .NET Core 2.1.15 and 3.1.1. I couldn't get 5.0 Preview 2 to install side-by-side on Ubuntu 19.10 to check if that made a difference, but users expect a solution for 2.x and 3.x anyway.

I would be grateful if you could spend a few minutes looking into this.

Update: On .NET 5.0, the behavior on Linux now closely matches that on Windows. I still noticed a few minor issues which I will submit.

drieseng commented 4 years ago

I decided to use Socket.Poll(...) where available. This resolves the issue on .NET Core 1.x to 3.x. This fix will be in the next beta release.
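According to the comment above, the fix uses Socket.Poll(...) where available instead of the blocking Socket.Select(...) call. The sketch below shows one way to wait for readability with Socket.Poll so that a disconnect on another thread cannot leave a reader blocked indefinitely. It is an illustration under stated assumptions, not SSH.NET's actual code: `PollingReader.ReadWhenReady`, the poll interval, and the CancellationToken used as the shutdown signal are all hypothetical choices. Polling in short, finite slices means the loop re-checks the token, and the socket's disposed state, on every iteration instead of relying on the OS to interrupt one long wait.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

public static class PollingReader
{
    // Hypothetical helper (not SSH.NET's API). Waits until `socket` is readable,
    // checking `token` between short Poll slices, then reads once.
    // Returns: bytes read (> 0), 0 if the peer shut down its sending side,
    // or -1 if the wait was cancelled or the socket was disposed locally.
    public static int ReadWhenReady(Socket socket, byte[] buffer, TimeSpan pollInterval, CancellationToken token)
    {
        if (pollInterval <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(pollInterval));
        }

        // Socket.Poll takes microseconds; 1 tick = 100 ns, so 10 ticks = 1 microsecond.
        int microseconds = (int)Math.Max(1, Math.Min(pollInterval.Ticks / 10, int.MaxValue));

        try
        {
            while (!token.IsCancellationRequested)
            {
                // SelectRead is true when data is available, or when the connection
                // has been closed, reset or terminated (Receive then returns 0 or throws).
                if (socket.Poll(microseconds, SelectMode.SelectRead))
                {
                    return socket.Receive(buffer);
                }
            }
        }
        catch (ObjectDisposedException)
        {
            // Another thread disposed the socket (e.g. during Disconnect).
        }

        return -1;
    }
}
```

A disconnecting thread would cancel the token first and then dispose the socket; if it disposes first, the next Poll call throws ObjectDisposedException, which the loop also treats as a local close.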

drieseng commented 4 years ago

2020.0.0-beta1 is now available, and includes this fix.

woodne commented 4 years ago

Thanks! I intend to try again tomorrow.

jpmdl commented 4 years ago

It works with the new 2020.0.0-beta1!