pion / ice

A Go implementation of ICE
https://pion.ly/
MIT License
454 stars 163 forks source link

Disconnections not detected with ICE Lite #288

Closed Antonito closed 4 years ago

Antonito commented 4 years ago

Your environment.

What did you do?

When using Lite ICE, unexpected disconnections (client terminating without properly closing the peer connection) are not detected.

To reproduce, you can use the pion-to-pion example code and patch a few lines:

answer (source):

    // ... 

    // Prepare the configuration
    config := webrtc.Configuration{}

    m := webrtc.MediaEngine{}
    m.RegisterDefaultCodecs()

    s := webrtc.SettingEngine{}
    s.SetLite(true)

    api := webrtc.NewAPI(webrtc.WithMediaEngine(m), webrtc.WithSettingEngine((s)))

    // Create a new RTCPeerConnection
    peerConnection, err := api.NewPeerConnection(config)
    if err != nil {
        panic(err)
    }

    // ... 

    peerConnection.OnICEConnectionStateChange(func(connectionState webrtc.ICEConnectionState) {
        fmt.Printf("ICE Connection State has changed: %s\n", connectionState.String())
    })

    peerConnection.OnConnectionStateChange(func(connectionState webrtc.PeerConnectionState) {
        fmt.Printf("Peer Connection State has changed: %s\n", connectionState.String())
    })

    // ... 

offer (source):

    // ... 

    // Prepare the configuration
    config := webrtc.Configuration{}

    m := webrtc.MediaEngine{}
    m.RegisterDefaultCodecs()

    s := webrtc.SettingEngine{}
    s.SetLite(true)

    api := webrtc.NewAPI(webrtc.WithMediaEngine(m), webrtc.WithSettingEngine((s)))

    // Create a new RTCPeerConnection
    peerConnection, err := api.NewPeerConnection(config)
    if err != nil {
        panic(err)
    }

    // ... 

    peerConnection.OnConnectionStateChange(func(connectionState webrtc.PeerConnectionState) {
        fmt.Printf("Peer Connection State has changed: %s\n", connectionState.String())
    })

    // Register channel opening handling
    dataChannel.OnOpen(func() {
        fmt.Printf("Data channel '%s'-'%d' open. Random messages will now be sent to any connected DataChannels every 5 seconds\n", dataChannel.Label(), dataChannel.ID())

        // Force a crash after a few seconds to trigger the bug
        go func() {
            time.Sleep(3 * time.Second)
            panic("crash")
        }()

        for range time.NewTicker(5 * time.Second).C {
            message := signal.RandSeq(15)
            fmt.Printf("Sending '%s'\n", message)

            // Send the message as text
            sendTextErr := dataChannel.SendText(message)
            if sendTextErr != nil {
                panic(sendTextErr)
            }
        }
    })

    // ... 

What did you expect?

PeerConnectionState should go to PeerConnectionStateDisconnected

Sean-Der commented 4 years ago

I was able to reproduce this as well! https://gist.github.com/a03e187de02c7abb83adf352bd84c246

Sean-Der commented 4 years ago

@Antonito I was able to reproduce this, and able to get a way more minimal repro! https://github.com/pion/ice/commit/8caa623931f06d6ea96854bfc66d75368fccc7c7

This bug is only exhibited when the Lite agent is running in controlled mode. When acting as the controlling Agent everything is fine. You can see this by flipping these arguments https://github.com/pion/ice/pull/289/files#diff-f1848d6daf4e415b362a6be14e93973aR1680

I wonder if this bug is because of this TODO? https://github.com/pion/ice/blob/master/selection.go#L279-L282

Antonito commented 4 years ago

I also think it is related to this TODO, by changing ContactCandidates() to:

// A lite selector should not contact candidates
func (s *liteSelector) ContactCandidates() {
    if _, ok := s.pairCandidateSelector.(*controllingSelector); ok {
        // ...
        s.pairCandidateSelector.ContactCandidates()
    } else if _, ok := s.pairCandidateSelector.(*controlledSelector); ok {
        s.pairCandidateSelector.ContactCandidates()
    }
}

It seems to be working (your unit test succeeds, and the pion-to-pion thing too). Maybe we could have this as a temporary fix until RFC 8445 S6.1.1 and S6.2 sections are fully implemented?

Sean-Der commented 4 years ago

@Antonito actually I think we just want validateSelectedPair! Can you drop the typecast and just do that?

A ice-lite agent shouldn't send any outbound traffic. If that passes all the unit tests, I think we can try that out :) I would check pion/webrtc passes as well before we push to master.

Antonito commented 4 years ago

Still need the typecasts to access the agent though!

I tried by only calling validateSelectedPair in both cases, it doesn't work. However, by keeping the current controllingSelector code and adding the validateSelectedPair to the controlledSelector, everything works (pion/ice unit tests, pion/webrtc unit tests and the pion-to-pion example).

    if _, ok := s.pairCandidateSelector.(*controllingSelector); ok {
        // nolint:godox
        // pion/ice#96
        // TODO: implement lite controlling agent. For now falling back to full agent.
        // This only happens if both peers are lite. See RFC 8445 S6.1.1 and S6.2
        s.pairCandidateSelector.ContactCandidates()
    } else if v, ok := s.pairCandidateSelector.(*controlledSelector); ok {
        v.agent.validateSelectedPair()
    }
Sean-Der commented 4 years ago

This LGTM, you want to open a PR or I can do it myself :)

Would love for you to get the credit for the work though!