pion / webrtc

Pure Go implementation of the WebRTC API
https://pion.ly
MIT License

DataChannel.readLoop goroutine leak #2098

Open lenaky opened 2 years ago

lenaky commented 2 years ago

Your environment.

What did you do?

I found a goroutine leak in our system, which is based on ion-sfu.

(dlv) bt
0  0x000000000043cc65 in runtime.gopark
   at /__w/_tool/go/1.16.4/x64/src/runtime/proc.go:337
1  0x000000000046fe38 in runtime.goparkunlock
   at /__w/_tool/go/1.16.4/x64/src/runtime/proc.go:342
2  0x000000000046fe38 in sync.runtime_notifyListWait
   at /__w/_tool/go/1.16.4/x64/src/runtime/sema.go:513
3  0x000000000047b879 in sync.(*Cond).Wait
   at /__w/_tool/go/1.16.4/x64/src/sync/cond.go:56
4  0x000000000071fd56 in github.com/pion/sctp.(*Stream).ReadSCTP
   at /github/home/go/pkg/mod/github.com/pion/sctp@v1.8.2/stream.go:108
5  0x00000000007271bd in github.com/pion/datachannel.(*DataChannel).ReadDataChannel
   at /github/home/go/pkg/mod/github.com/pion/datachannel@v1.5.2/datachannel.go:186
6  0x00000000007ee3da in git.dev.hpcnt.com/hyperconnect/pion-webrtc/v3.(*DataChannel).readLoop
   at /github/home/go/pkg/mod/git.dev.hpcnt.com/hyperconnect/pion-webrtc/v3@v3.1.2-hpcnt.release/datachannel.go:324

As you can see, Stream.ReadSCTP hangs while waiting for a signal on Stream.readNotifier. At that moment, the association and the data channel have already been closed.

(dlv) p s.association.readLoopCloseCh
chan struct {} {
    qcount: 0,
    dataqsiz: 0,
    buf: *[0]struct struct {} [],
    elemsize: 0,
    closed: 1,
    elemtype: *runtime._type {size: 0, ptrdata: 0, hash: 670477339, tflag: tflagExtraStar|tflagRegularMemory (10), align: 1, fieldAlign: 1, kind: 25, equal: runtime.memequal0, gcdata: *0, str: 56587, ptrToThis: 543648},
    sendx: 0,
    recvx: 0,
    recvq: waitq<struct {}> {
        first: *sudog<struct {}> nil,
        last: *sudog<struct {}> nil,},
    sendq: waitq<struct {}> {
        first: *sudog<struct {}> nil,
        last: *sudog<struct {}> nil,},
    lock: runtime.mutex {
        lockRankStruct: runtime.lockRankStruct {},
        key: 0,},}
(dlv) p s.association.closeWriteLoopCh
chan struct {} {
    qcount: 0,
    dataqsiz: 0,
    buf: *[0]struct struct {} [],
    elemsize: 0,
    closed: 1,
    elemtype: *runtime._type {size: 0, ptrdata: 0, hash: 670477339, tflag: tflagExtraStar|tflagRegularMemory (10), align: 1, fieldAlign: 1, kind: 25, equal: runtime.memequal0, gcdata: *0, str: 56587, ptrToThis: 543648},
    sendx: 0,
    recvx: 0,
    recvq: waitq<struct {}> {
        first: *sudog<struct {}> nil,
        last: *sudog<struct {}> nil,},
    sendq: waitq<struct {}> {
        first: *sudog<struct {}> nil,
        last: *sudog<struct {}> nil,},
    lock: runtime.mutex {
        lockRankStruct: runtime.lockRankStruct {},
        key: 0,},}
(dlv) p s.association.streams
map[uint16]*github.com/pion/sctp.Stream [
    0: *{
        association: *(*"github.com/pion/sctp.Association")(0xc0431f9500),
        lock: (*sync.RWMutex)(0xc01a279e08),
        streamIdentifier: 0,
        defaultPayloadType: PayloadTypeWebRTCBinary (53),
        reassemblyQueue: *(*"github.com/pion/sctp.reassemblyQueue")(0xc072a4a840),
        sequenceNumber: 1,
        readNotifier: *(*sync.Cond)(0xc04c6bea80),
        readErr: error nil,
        writeErr: error nil,
        unordered: false,
        reliabilityType: 0,
        reliabilityValue: 0,
        bufferedAmount: 19,
        bufferedAmountLow: 0,
        onBufferedAmountLow: nil,
        log: github.com/pion/logging.LeveledLogger(*github.com/pion/logging.DefaultLeveledLogger) ...,
        name: "0:0xc0431f9500",},
]

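I presume that while SCTPTransport.Start is still in progress, Association.readLoop exits, and its deferred Association.unregisterStream cleanup has no effect on a stream for which Association.OpenStream has not been called yet. A stream opened afterwards is therefore never woken (note readErr: nil in the dump above).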

I don't know exactly what causes this. I think it may happen when the PeerConnection is closed while the data channel connection is still being established.

What did you expect?

What happened?

mafredri commented 2 years ago

@lenaky Does https://github.com/pion/sctp/pull/236 solve your issue?

lenaky commented 2 years ago

@mafredri Sorry for the late reply. I haven't fixed this issue yet; I'll check whether your PR works for me and let you know. Thanks!

forcom commented 1 year ago

@mafredri On behalf of @lenaky, I applied your PR to check this issue. With some modifications, which I left on your PR, the issue does not seem to recur. Thanks for your work!