tsurubee / sshr

Proxy server for routing SSH connections
MIT License
199 stars 24 forks source link

Fix memory leak #30

Closed tsurubee closed 4 years ago

tsurubee commented 4 years ago

Since sshr probably has a memory leak bug, the purpose of this PR is to investigate the cause of the leak and fix it.

According to the results of investigating the number of goroutines using pprof, the goroutine shown below seems to be leaking. This goroutine is used to read and write packets between the SSH client and the upstream server.

goroutine profile: total 9
2 @ 0x42f78b 0x42f833 0x406a2d 0x406805 0x62cb3b 0x45d501
#   0x62cb3a    golang.org/x/crypto/ssh.(*ProxyConn).Wait.func2+0x7a    /root/go/pkg/mod/github.com/tsurubee/sshr.crypto@v0.0.0-20190518234326-629a3c1db872/ssh/proxy.go:286
tsurubee commented 4 years ago

When sshr has one SSH connection

1 @ 0x42f78b 0x42f833 0x40760e 0x4072fb 0x627c40 0x66ca93 0x631ba7 0x45d501
#   0x627c3f    golang.org/x/crypto/ssh.(*ProxyConn).Wait+0xdf          /root/go/pkg/mod/github.com/tsurubee/sshr.crypto@v0.0.0-20190518234326-629a3c1db872/ssh/proxy.go:290
#   0x66ca92    github.com/tsurubee/sshr/sshr.(*SSHServer).serve.func1+0x242    /root/go/pkg/mod/github.com/tsurubee/sshr@v0.1.6/sshr/server.go:88
#   0x631ba6    golang.org/x/sync/errgroup.(*Group).Go.func1+0x56       /root/go/pkg/mod/golang.org/x/sync@v0.0.0-20190227155943-e225da77a7e6/errgroup/errgroup.go:57

1 @ 0x42f78b 0x42f833 0x40760e 0x40733b 0x61362e 0x628d51 0x62ca7f 0x45d501
#   0x61362d    golang.org/x/crypto/ssh.(*handshakeTransport).readPacket+0x4d   /root/go/pkg/mod/github.com/tsurubee/sshr.crypto@v0.0.0-20190518234326-629a3c1db872/ssh/handshake.go:187
#   0x628d50    golang.org/x/crypto/ssh.piping+0x30             /root/go/pkg/mod/github.com/tsurubee/sshr.crypto@v0.0.0-20190518234326-629a3c1db872/ssh/proxy.go:420
#   0x62ca7e    golang.org/x/crypto/ssh.(*ProxyConn).Wait.func1+0x4e        /root/go/pkg/mod/github.com/tsurubee/sshr.crypto@v0.0.0-20190518234326-629a3c1db872/ssh/proxy.go:282

1 @ 0x42f78b 0x42f833 0x40760e 0x40733b 0x61362e 0x628d51 0x62cb0f 0x45d501
#   0x61362d    golang.org/x/crypto/ssh.(*handshakeTransport).readPacket+0x4d   /root/go/pkg/mod/github.com/tsurubee/sshr.crypto@v0.0.0-20190518234326-629a3c1db872/ssh/handshake.go:187
#   0x628d50    golang.org/x/crypto/ssh.piping+0x30             /root/go/pkg/mod/github.com/tsurubee/sshr.crypto@v0.0.0-20190518234326-629a3c1db872/ssh/proxy.go:420
#   0x62cb0e    golang.org/x/crypto/ssh.(*ProxyConn).Wait.func2+0x4e        /root/go/pkg/mod/github.com/tsurubee/sshr.crypto@v0.0.0-20190518234326-629a3c1db872/ssh/proxy.go:286

After the SSH connection exits

1 @ 0x42f78b 0x42f833 0x406a2d 0x406805 0x62cb3b 0x45d501
#   0x62cb3a    golang.org/x/crypto/ssh.(*ProxyConn).Wait.func2+0x7a    /root/go/pkg/mod/github.com/tsurubee/sshr.crypto@v0.0.0-20190518234326-629a3c1db872/ssh/proxy.go:286
tsurubee commented 4 years ago

The state of gouroutines as shown above is monitored using pprof by inserting the following code into main.go.

import (
    "database/sql"
    "fmt"
    "github.com/tsurubee/sshr/sshr"
    "log"

        "net/http"
    _ "net/http/pprof"
)

func main() {
        go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
・・・

And we can view the goroutines state with the following URL through a web browser.
http://localhost:6060/debug/pprof/goroutine?debug=1

tsurubee commented 4 years ago

Cause of the leak

The place that contained the bug is the following Wait function of tsurubee/sshr.crypto.
https://github.com/tsurubee/sshr.crypto/blob/master/ssh/proxy.go#L278 This function is responsible for sending packets in both directions between the SSH client and the server. For both directions, unbuffered channel is shared to write the error message when the connection was lost. However, using an unbuffered channel here caused a bug.
For example, if the SSH client sends exit command to disconnect, the piping function from the client to the server raise an error and writes error message to an unbuffered channel. Then, when returning the error message, the SSH connection is closed by the defer function, and the piping function from the server to the client raise an error and writes it to the channel. At this time, since there is nothing to receive this channel, the unbuffered channel will block at this point, resulting in causing a goroutine leak.

Solution

The problem was solved by replacing the unbuffered channel with the channel with a buffer size of 1.
The bugs are solved by modifying the code of tsurubee / sshr.crypto in the following PR.
https://github.com/tsurubee/sshr.crypto/pull/3 After the modification, I confirmed that goroutine does not leak using pprof.

heat1024 commented 4 years ago

I appreciate for fixing leak problem!

heat1024 commented 4 years ago

I fixed version-control strings to v0.0.0-20200227043732-5db8c8aac292 from 20200227133524-08ab068d7cff because it makes failure on build.

I think this error from deleted branch.

So I used current master timestamp-commit and pass the build.

Is that version correct?

I needs your confirm... ^^;

tsurubee commented 4 years ago

@heat1024 I'm so sorry… I was wrong version-control strings. The correct version-control strings is shown below.

v0.0.0-20200227133732-5db8c8aac292

This strings can be confirmed by executing the following command in the sshr.crypto repository.

$ git log --pretty=format:"[%cd] %H" --date=iso | head -n 3
[2020-02-27 13:37:32 +0900] 5db8c8aac2927e7e66e03deee7bef6ed88efd71d
[2020-02-27 13:35:24 +0900] 08ab068d7cff203cabd454b7e2c01fe638c21435
[2020-02-20 17:19:38 +0900] 466786fab74aed3b5ec5dbfa8c059c63bd949217
heat1024 commented 4 years ago

Oh, I see.

Probably 20200227043732 is UTC, 20200227133732 is JST.

I'll use v0.0.0-20200227133732-5db8c8aac292 !

Thanks!

tsurubee commented 4 years ago

I fixed the version strings of sshr.crypto in the PR below.
https://github.com/tsurubee/sshr/pull/31/files