pion / webrtc

Pure Go implementation of the WebRTC API
https://pion.ly
MIT License

Goroutine leaks when using datachannel with a multi mux with single port #2738

Closed tyohan closed 4 months ago

tyohan commented 6 months ago

Your environment.

What did you do?

I saw goroutine leaks in my SFU when using a data channel with a multi mux on a single port. I am able to reproduce this with a test in my fork. The test fails because of the goroutine leak check that runs after the test.


What did you expect?

The test that I added to reproduce this issue should pass.

The test does not close the single-port muxer. That is intentional: when running an SFU server with a single-port muxer, the mux listener created by ice.NewMultiUDPMuxFromPort stays open until the SFU shuts down. The test passes when the mux is closed, but not when the mux is kept open.

When I check with pprof, the blocked goroutines are listed like this:

3 @ 0x43e54e 0x46dc19 0x46dbf9 0x47ae45 0x873516 0x878434 0x935aba 0x471a01
#   0x46dbf8    sync.runtime_notifyListWait+0x138               /usr/local/go/src/runtime/sema.go:527
#   0x47ae44    sync.(*Cond).Wait+0x84                      /usr/local/go/src/sync/cond.go:70
#   0x873515    github.com/pion/sctp.(*Stream).ReadSCTP+0xd5            /go/pkg/mod/github.com/pion/sctp@v1.8.13/stream.go:146
#   0x878433    github.com/pion/datachannel.(*DataChannel).ReadDataChannel+0x53 /go/pkg/mod/github.com/pion/datachannel@v1.5.5/datachannel.go:193
#   0x935ab9    github.com/pion/webrtc/v3.(*DataChannel).readLoop+0xb9      /go/pkg/mod/github.com/pion/webrtc/v3@v3.2.32/datachannel.go:361

2 @ 0x43e54e 0x44e985 0x815c6f 0x9239fc 0x88cd0f 0x471a01
#   0x815c6e    github.com/pion/transport/v2/packetio.(*Buffer).Read+0x1ae  /go/pkg/mod/github.com/pion/transport/v2@v2.2.4/packetio/buffer.go:267
#   0x9239fb    github.com/pion/webrtc/v3/internal/mux.(*Endpoint).Read+0x1b    /go/pkg/mod/github.com/pion/webrtc/v3@v3.2.32/internal/mux/endpoint.go:40
#   0x88cd0e    github.com/pion/srtp/v2.(*session).start.func1+0xae     /go/pkg/mod/github.com/pion/srtp/v2@v2.0.18/session.go:144
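For anyone who wants to produce a dump in this format from inside a test or a running server, here is a minimal sketch using only the standard library. The helper names (goroutineDump, hasFrame, parkWaiter) are made up for illustration and are not part of Pion; the parked goroutine is only a stand-in for a stuck reader.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
	"sync"
)

// goroutineDump returns the goroutine profile in text form (debug=1), which
// groups goroutines with identical stacks under an "N @ 0x..." header.
func goroutineDump() string {
	var buf bytes.Buffer
	_ = pprof.Lookup("goroutine").WriteTo(&buf, 1)
	return buf.String()
}

// hasFrame reports whether any stack in the dump mentions the given function.
func hasFrame(dump, fn string) bool {
	return strings.Contains(dump, fn)
}

// parkWaiter (hypothetical) starts a goroutine that blocks in
// sync.(*Cond).Wait and returns a function that wakes it and waits for exit.
func parkWaiter() (release func()) {
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	stop := false
	started := make(chan struct{})
	exited := make(chan struct{})

	go func() {
		defer close(exited)
		mu.Lock()
		close(started)
		for !stop {
			cond.Wait() // releases mu while waiting
		}
		mu.Unlock()
	}()

	// Once we can take the lock, the goroutine has entered cond.Wait.
	<-started
	mu.Lock()
	mu.Unlock()

	return func() {
		mu.Lock()
		stop = true
		mu.Unlock()
		cond.Broadcast()
		<-exited
	}
}

func main() {
	release := parkWaiter()
	dump := goroutineDump()
	fmt.Println("waiter visible in dump:", hasFrame(dump, "sync.(*Cond).Wait"))
	release()
}
```

Calling goroutineDump after closing a peer connection and searching it for frames such as readLoop is one way to check whether readers are still parked.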

Based on this, I traced the issue to sync.(*Cond).Wait(), which never returns even after the peer connection is closed. I assume the mux endpoint never receives the close event because the mux itself is never closed when using a single-port muxer. I am happy to fix this bug, since it would help me learn the codebase and contribute more to the Pion project, but it would be helpful if someone could point me to where I should be looking.
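To illustrate the blocking pattern described above, here is a simplified sketch of a reader guarded by sync.Cond. It is not the pion/sctp implementation; the stream type and its methods are hypothetical stand-ins. It only shows that a read loop parked in Wait returns when, and only when, something queues data or calls Close.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errClosed = errors.New("stream closed")

// stream is a hypothetical stand-in for a read side guarded by sync.Cond.
type stream struct {
	mu     sync.Mutex
	cond   *sync.Cond
	queue  [][]byte
	closed bool
}

func newStream() *stream {
	s := &stream{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// Read blocks until a message is queued or the stream is closed.
// Queued messages are delivered before the close error is reported.
func (s *stream) Read() ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for len(s.queue) == 0 && !s.closed {
		s.cond.Wait() // parked here until Write or Close wakes it
	}
	if len(s.queue) > 0 {
		msg := s.queue[0]
		s.queue = s.queue[1:]
		return msg, nil
	}
	return nil, errClosed
}

// Write queues one message and wakes a waiting reader.
func (s *stream) Write(p []byte) {
	s.mu.Lock()
	s.queue = append(s.queue, p)
	s.mu.Unlock()
	s.cond.Signal()
}

// Close marks the stream closed and wakes every waiting reader.
func (s *stream) Close() {
	s.mu.Lock()
	s.closed = true
	s.mu.Unlock()
	s.cond.Broadcast()
}

// readLoop mimics a per-channel read goroutine. It exits only when Read
// returns an error, and reports how many messages it read.
func readLoop(s *stream, done chan<- int) {
	n := 0
	for {
		if _, err := s.Read(); err != nil {
			done <- n
			return
		}
		n++
	}
}

func main() {
	s := newStream()
	done := make(chan int)
	go readLoop(s, done)

	s.Write([]byte("hello"))
	s.Close() // without this call, readLoop would stay blocked in Wait
	fmt.Println("messages read before exit:", <-done)
}
```

If nothing ever calls Close on the owning object, the goroutine stays in Wait and shows up in the goroutine profile exactly like the first stack above.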

Thanks

cnderrauber commented 6 months ago

The test fails because the UDP mux starts a goroutine to listen on the UDP port, and the test never closes the mux, so that failure is expected. After running your test code locally, I can't find any data-channel-related goroutine in the test report.
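As a rough illustration of why a leak check flags an open listener, here is a standard-library sketch. It is not the udpmux code; startListener and stillRunning are hypothetical helpers. The read goroutine lives exactly as long as the socket stays open.

```go
package main

import (
	"fmt"
	"net"
)

// startListener (hypothetical) opens a UDP socket on a loopback port chosen
// by the OS and starts a read loop on it. The done channel is closed when
// the read loop exits.
func startListener() (net.PacketConn, <-chan struct{}, error) {
	conn, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return nil, nil, err
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		buf := make([]byte, 1500)
		for {
			// ReadFrom blocks until a packet arrives or the socket is closed.
			if _, _, err := conn.ReadFrom(buf); err != nil {
				return
			}
		}
	}()
	return conn, done, nil
}

// stillRunning reports whether the read loop has not exited yet.
func stillRunning(done <-chan struct{}) bool {
	select {
	case <-done:
		return false
	default:
		return true
	}
}

func main() {
	conn, done, err := startListener()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println("running while socket is open:", stillRunning(done))

	_ = conn.Close() // unblocks ReadFrom with an error
	<-done
	fmt.Println("running after Close:", stillRunning(done))
}
```

A test that leaves the socket open and then runs a goroutine leak check will always see this reader, regardless of what the peer connections do.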

cnderrauber commented 6 months ago
3 @ 0x43e54e 0x46dc19 0x46dbf9 0x47ae45 0x873516 0x878434 0x935aba 0x471a01
#   0x46dbf8    sync.runtime_notifyListWait+0x138               /usr/local/go/src/runtime/sema.go:527
#   0x47ae44    sync.(*Cond).Wait+0x84                      /usr/local/go/src/sync/cond.go:70
#   0x873515    github.com/pion/sctp.(*Stream).ReadSCTP+0xd5            /go/pkg/mod/github.com/pion/sctp@v1.8.13/stream.go:146
#   0x878433    github.com/pion/datachannel.(*DataChannel).ReadDataChannel+0x53 /go/pkg/mod/github.com/pion/datachannel@v1.5.5/datachannel.go:193
#   0x935ab9    github.com/pion/webrtc/v3.(*DataChannel).readLoop+0xb9      /go/pkg/mod/github.com/pion/webrtc/v3@v3.2.32/datachannel.go:361

2 @ 0x43e54e 0x44e985 0x815c6f 0x9239fc 0x88cd0f 0x471a01
#   0x815c6e    github.com/pion/transport/v2/packetio.(*Buffer).Read+0x1ae  /go/pkg/mod/github.com/pion/transport/v2@v2.2.4/packetio/buffer.go:267
#   0x9239fb    github.com/pion/webrtc/v3/internal/mux.(*Endpoint).Read+0x1b    /go/pkg/mod/github.com/pion/webrtc/v3@v3.2.32/internal/mux/endpoint.go:40
#   0x88cd0e    github.com/pion/srtp/v2.(*session).start.func1+0xae     /go/pkg/mod/github.com/pion/srtp/v2@v2.0.18/session.go:144

This looks like a PeerConnection leak in your code: the SRTP session stays open.

tyohan commented 6 months ago

@cnderrauber thank you for trying my code. This issue might not be directly related to the data channel; it is more about the single-port muxer. The SRTP session stays open because the buffer read function is also stuck waiting for either a new packet or the connection to close, and with a single-port muxer the connection is never closed. I'll dig further and see whether this is a problem on my end rather than in Pion. I will post updates in this issue.

tyohan commented 4 months ago

I'd like to close this for now. It seems the bug was caused by not closing a failed peer connection. I now always close a failed peer connection, and the leaks appear to be gone. I will keep monitoring on my side to confirm that not closing a failed peer connection was the real cause, and I will reopen this if the goroutine leak happens again.
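For reference, the shape of that fix can be sketched as below. The sketch uses stand-in types (connState, peer, handleState are hypothetical) so that it runs with the standard library alone. In pion/webrtc the equivalent hook would be a callback registered with PeerConnection.OnConnectionStateChange that calls Close when the state is webrtc.PeerConnectionStateFailed.

```go
package main

import (
	"fmt"
	"sync"
)

// connState is a hypothetical stand-in for a connection state enum.
type connState string

const (
	stateConnected connState = "connected"
	stateFailed    connState = "failed"
)

// peer is a hypothetical stand-in for a peer connection that owns one
// background reader goroutine.
type peer struct {
	closeOnce sync.Once
	closed    chan struct{}
	wg        sync.WaitGroup
}

func newPeer() *peer {
	p := &peer{closed: make(chan struct{})}
	p.wg.Add(1)
	go func() { // stands in for the data channel and SRTP readers
		defer p.wg.Done()
		<-p.closed // exits only once Close is called
	}()
	return p
}

// Close stops the reader and waits for it to exit. It is safe to call twice.
func (p *peer) Close() error {
	p.closeOnce.Do(func() { close(p.closed) })
	p.wg.Wait()
	return nil
}

// isClosed reports whether Close has been called.
func (p *peer) isClosed() bool {
	select {
	case <-p.closed:
		return true
	default:
		return false
	}
}

// handleState closes the peer as soon as it reports the failed state, so
// its reader goroutine does not outlive the connection.
func handleState(p *peer, s connState) {
	if s == stateFailed {
		_ = p.Close()
	}
}

func main() {
	p := newPeer()
	handleState(p, stateConnected)
	fmt.Println("closed after connected:", p.isClosed())
	handleState(p, stateFailed)
	fmt.Println("closed after failed:", p.isClosed())
}
```

The important part is that the failed state is treated as a trigger for cleanup rather than waiting for a mux or listener shutdown that never comes on a long-running server.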

Thank you.