w3c / ortc

ORTC Community Group specification repository (see W3C WebRTC for official standards track)
http://www.w3.org/community/ortc/
122 stars 42 forks source link

ORTC shim: How to support 1.0 ICE restart without new DtlsTransport? #755

Open aboba opened 7 years ago

aboba commented 7 years ago

At the WebRTC 1.0 September interim, there was consensus that a new DtlsTransport would not be vended after an ICE restart even if the endpoint changes (e.g. RTCRtpSender.transport remains valid but attributes change). Does this lead to issues in shimming WebRTC 1.0 over ORTC? If so, what ORTC functionality is needed?

Related WebRTC 1.0 Issue: https://github.com/w3c/webrtc-pc/issues/1406

WEBRTC WG September Interim slides: https://docs.google.com/presentation/d/1QjSKzQVINl5Cng0wH7oqvaDnkVNIc-JHUlduMZpfyFQ/edit#slide=id.g268f09a8f9_0_19

aboba commented 7 years ago

@taylor-b @fippo Any thoughts?

taylor-b commented 7 years ago

Peter and I were talking about this and think it would be feasible to just implement the WebRTC RTCIceTransport shim in terms of multiple ORTC RTCIceTransports (and RTCIceGatherers), and similar for DTLS. For example:

  1. When ICE restart offer is applied, shim creates a new RTCIceGatherer and starts gathering.
  2. When answer is applied, if it results in a new DTLS association, create a new RTCIceTransport/RTCDtlsTransport pair (underneath the shims), passing in the new RTCIceGatherer.
  3. If it doesn't result in a new DTLS association, call "start" on the existing RTCIceTransport, with the new gatherer and remote params.

Wouldn't this work?

rshpount commented 7 years ago

Only problem is that new DTLS ClientHello can be received before the answer is received and applied. Your options for dealing with this ClientHello is either to queue all the packets received before the answer is received (should not cause too many issues since you cannot send anything back before answer with ugrag is received and STUN request completes in the opposite direction) or to automatically start DTLS association when ClientHello is received after ICE restart (which you might need to discard if DTLS association does not match the fingerprints in answer).

Do you think this will cause any problems?


Roman Shpount

On Fri, Sep 15, 2017 at 7:34 PM, Taylor Brandstetter < notifications@github.com> wrote:

Peter and I were talking about this and think it would be feasible to just implement the WebRTC RTCIceTransport shim in terms of multiple ORTC RTCIceTransports (and RTCIceGatherers), and similar for DTLS. For example:

  1. When ICE restart offer is applied, shim creates a new RTCIceGatherer and starts gathering.
  2. When answer is applied, if it results in a new DTLS association, create a new RTCIceTransport/RTCDtlsTransport pair (underneath the shims), passing in the new RTCIceGatherer.
  3. If it doesn't result in a new DTLS association, call "start" on the existing RTCIceTransport.

Wouldn't this work?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/w3c/ortc/issues/755#issuecomment-329926769, or mute the thread https://github.com/notifications/unsubscribe-auth/AEh5Slt_j9ry2DADtLsk_M0FVvpFIIxlks5siwmNgaJpZM4PZa8h .

aboba commented 7 years ago

@rshpount Currently, ORTC implementations don't respond to incoming Ice connectivity checks or send checks until IceTransport.start() is called. So until IceTransport.start() is called an incoming DTLS client hello can't be received. If IceTransport.start() has been called, and a DtlsTransport has been constructed then it can respond (but not complete until DtlsTransport.start is called to verify the fingerprint).

taylor-b commented 7 years ago

Only problem is that new DTLS ClientHello can be received before the answer is received and applied.

I imagined that the RTCIceTransport could cache the ClientHello, and pass it along to the new RTCDtlsTransport once attached.

Currently, ORTC implementations don't respond to incoming Ice connectivity checks or send checks until IceTransport.start() is called.

I understand why they can't send checks (they don't have the remote ufrag/password), but is there a reason why they can't respond to checks?

lgrahl commented 7 years ago

If it doesn't result in a new DTLS association, call "start" on the existing RTCIceTransport, with the new gatherer and remote params.

@taylor-b So, it's not guaranteed that there will be a new incoming DTLS association when the fingerprint in the answer SDP changes? I'm confused. Why would that be the case? I probably just misinterpreted your comment?

I imagined that the RTCIceTransport could cache the ClientHello, and pass it along to the new RTCDtlsTransport once attached.

The spec does not require buffering any data on ICE transport layer, so the ClientHello may or may not be lost. But IIRC the spec also does not restrict data from passing to a potential DTLS transport prior to calling RTCIceTransport.start.

As an example, RAWRTC buffers all connectivity checks and all other data separately. It processes the connectivity checks once RTCIceTransport.start has been called and processes all DTLS packets once RTCDtlsTransport.start has been called on creation of the RTCDtlsTransport instance, so the connectivity checks and the ClientHello are not lost. But this is implementation specific.

Anyhow, the ClientHello will eventually be repeated, so any stack will receive it at some point. I don't see a problem here as it's an entirely new RTCIceGatherer/RTCIceTransport/RTCDtlsTransport tuple, so I don't see why @taylor-b's proposal should not work. With one exception: A method to replace the DTLS transport needs to be added to RTCSctpTransport if the data channels need to persist (do they?). Consequences for RTCQuicTransport will probably need to be discussed, too.

I understand why they can't send checks (they don't have the remote ufrag/password), but is there a reason why they can't respond to checks?

Because the stack doesn't know whether the remote username fragment is correct (as it lacks knowledge of the remote username fragment). It could blindly respond but it's probably unwise to do so (not to mention that many ICE libraries will not provide a way to do that).

aboba commented 7 years ago

I understand why they can't send checks (they don't have the remote ufrag/password), but is there a reason why they can't respond to checks?

Until IceTransport.start is called, the role is not determined and therefore when an incoming check arives, there is no way to determine whether there is a role conflict.

lgrahl commented 7 years ago

Just an idea: Let set A contain all candidates of the current ICE gatherer and set B contain all candidates of the new ICE gatherer. Assuming A and B are disjoint and the current ICE gatherer is in the completed state, then incoming DTLS packets from set B could be handled as a new DTLS association and a handshake would take place. Once the ICE transport has been restarted with the new ICE gatherer, the DTLS association created by set B could take over.

Fixed idea: Incoming DTLS packets that cannot be associated to a remote candidate and include a ClientHello could be handled as a new DTLS association and a handshake would take place. Once the ICE transport has been restarted with the new ICE gatherer, check if one of the new remote candidates has established another DTLS association. If yes, this DTLS association would take over. That's the only (terrible) alternative I could find so far and I'm pretty sure it will fall apart in some places but maybe it helps someone else to find a better approach.

taylor-b commented 7 years ago

So, it's not guaranteed that there will be a new incoming DTLS association when the fingerprint in the answer SDP changes?

No, it is guaranteed. But the offerer doesn't know if the answerer is going to do that, at offer time.

The spec does not require buffering any data on ICE transport layer, so the ClientHello may or may not be lost.

It's not required, but it's still something implementations could (and would want to) do as an optimization.

With one exception: A method to replace the DTLS transport needs to be added to RTCSctpTransport if the data channels need to persist (do they?)

No, they don't need to. New DTLS association means all your existing data channels go bye-bye. I don't have a reference to the spec handy but I'm 99% confident about this.

It could blindly respond but it's probably unwise to do so

Why is it unwise? It could result in ICE completing quicker. It's pretty common for the offerer to receive STUN pings before it receives the answer, creating a prflx candidate pair that's updated to srflx once the actual candidate is received. This happens in Chrome all the time. And it's covered explicitly by the ICE spec:

It is possible (and in fact very likely) that an offerer will receive a Binding request prior to receiving the answer from its peer. If this happens, the agent MUST immediately generate a response (including computation of the mapped address as described in Section 7.2.1.2). The agent has sufficient information at this point to generate the response; the password from the peer is not required.

Until IceTransport.start is called, the role is not determined and therefore when an incoming check arives, there is no way to determine whether there is a role conflict.

Hmm. So I guess that's one place where my idea wouldn't work. Would it be feasible to allow a role to be passed into RTCIceGatherer, to address this? Otherwise we need to go back to the drawing board.

lgrahl commented 7 years ago

No, they don't need to. New DTLS association means all your existing data channels go bye-bye. I don't have a reference to the spec handy but I'm 99% confident about this.

I can't think of a good reason why this would need to be the case. (I also can't think of a good reason why the answerer would change the fingerprint but I'm sure there's a use case or we wouldn't discuss this.) Once the new DTLS association has been established, continue transferring data there. SCTP will take care of the rest (repeating packets, ...). Why would we do this process for RTCRtpSender and RTCRtpReceiver but not for RTCSctpTransport - that makes no sense to me.

Hmm. So I guess that's one place where my idea wouldn't work. Would it be feasible to allow a role to be passed into RTCIceGatherer, to address this? Otherwise we need to go back to the drawing board.

I don't understand why this is a problem. It's not a problem for completely new RTCIceGatherer/RTCIceTransport/RTCDtlsTransport tuples, why would it be one now? Is ICE restart behaviour substantially different to the normal procedure regarding STUN messages?

taylor-b commented 7 years ago

I can't think of a good reason why this would need to be the case

In short, because RTCDataChannel has the ability to guarantee reliable delivery, and there's no defined way for an SCTP association to transfer its state to a new DTLS association. Closing a DTLS association means closing the SCTP association, which means closing its streams, which means closing the data channels that were tied to those streams.

I don't understand why this is a problem.

Because we need the ability to send STUN responses before an answer is received when doing an ICE restart. That's a "MUST" in RFC5245 (see the thing I quoted above). And my idea described above was to create an RTCIceGatherer at offer time. So either RTCIceGatherer needs to be able to send STUN responses, or my idea needs to change.

rshpount commented 7 years ago

Just for the reference, the reason fingerprints change in the answer is due to third party call control: For instance end points A and B are connected through a third party call control agent 3PCC. 3PCC asks end point A for an offer, and then sends this offer to C. Answer from C is sent back to A. Answer from C has new fingerprints and tls-id. Since this is a completely new end point, new DTLS association is needed and SCTP associations need to be re-established. This is fairly common with SIP, so there should be a way to implement this with ORTC.

lgrahl commented 7 years ago

In short, because RTCDataChannel has the ability to guarantee reliable delivery, and there's no defined way for an SCTP association to transfer its state to a new DTLS association.

So, what you are saying is that the problem is that it's not defined. However, IIRC swapping the DTLS transport (and potentially dropping packets) will just look like packet loss to SCTP which then will ensure reliability (in case of a reliable channel) by repeating those packets as it is a fully blown transport protocol. Nevertheless, if you say this is not defined for WebRTC right now, then we don't need to discuss this any further here and I'll open a new issue for this.

Because we need the ability to send STUN responses before an answer is received when doing an ICE restart.

I see that it is a MUST in the spec (and certainly, we need to talk about spec violation) but what is the technical reason why it will not work on an ICE restart when it works just fine for a normal ICE transport build-up procedure. Where's the difference? Why does the offerer require receiving STUN binding responses prior to the answerer receiving the answer?

taylor-b commented 7 years ago

However, IIRC swapping the DTLS transport (and potentially dropping packets) will just look like packet loss to SCTP which then will ensure reliability (in case of a reliable channel) by repeating those packets as it is a fully blown transport protocol.

That makes sense. And it's how Chrome currently works. But for some reason this behavior is not defined by "mmusic-sctp-sdp":

o NOTE: This specification does not define a mechanism for explicitly closing a DTLS association while maintaining the overlying SCTP association. However, if a DTLS association is closed and replaced with a new DTLS association, as a result of some other action [I-D.ietf-mmusic-dtls-sdp], the state of the SCTP association is not affected.

I'll ask about this on the MMUSIC mailing list. I can't think of why this wouldn't be supported, honestly. Maybe because the only use case of having a new DTLS association is call forking, in which case you need to do a new SCTP handshake? Though I don't know why the answerer couldn't just use a new "sctp-port" in that case, to explicitly cause a new SCTP association.

EDIT: Actually, I guess this is saying "some other spec may define a way of replacing the DTLS association (which dtls-sdp does), in which case SCTP isn't affected". So I guess it's my bad for misinterpreting this, and the "underlying data transport" language in WebRTC 1.0. Though I think the ambiguous spec legalese is partly to blame for my misunderstanding. :)

what is the technical reason why it will not work on an ICE restart when it works just fine for a normal ICE transport build-up procedure. Where's the difference?

In the normal build-up procedure there's no existing DTLS transport, so we don't have this issue of having a new ICE session that could potentially use an existing DTLS association and potentially use a new one.

Why does the offerer require receiving STUN binding responses prior to the answerer receiving the answer?

I think you mean "why is the offerer required to send STUN binding responses prior to the offerer receiving the answer". And the answer is "to make ICE complete faster".

lgrahl commented 7 years ago

It would be more clear to me if the last sentence were

[...] the state of the SCTP association SHALL NOT be affected.

But then I would assume that you agree we need to add a setTransport method to ORTC's RTCSctpTransport. (And I guess RTCQuicTransport will require a setTransport method for changing the underlying RTCIceTransport instance.)

And the answer is "to make ICE complete faster".

Sure, I get that it will be quicker but it will work without STUN binding responses from the answerer prior to it receiving the offer (sorry for the confusion), even if it takes longer, correct? If yes, then your proposal works or am I missing something?

Coming back to a question you raised earlier:

Would it be feasible to allow a role to be passed into RTCIceGatherer, to address this?

Yeah, I think this is a good idea. We could add an optional role to the RTCIceGatherOptions which allows the gatherer to respond to connectivity checks (and resolve role conflicts). When starting the RTCIceTransport instance with that gatherer, we should check that the roles match (not the effective role but the provided roles).

rshpount commented 7 years ago

The intent in draft-ietf-mmusic-sctp-sdp is that DTLS and SCTP associations are independent. Either can be re-initialized independently from each other. Closing and opening DTLS association has no effect SCTP association. Closing the whole m= line (setting port to 0), will close everything, including DTLS and SCTP.

aboba commented 7 years ago

o NOTE: This specification does not define a mechanism for explicitly closing a DTLS association while maintaining the overlying SCTP association. However, if a DTLS association is closed and replaced with a new DTLS association, as a result of some other action [I-D.ietf-mmusic-dtls-sdp], the state of the SCTP association is not affected.

[BA] Since the RTCSctpTransport is constructed from an RTCDtlsTransport and there is no RTCSctpTransport.setTransport method, ORTC currently does not permit closing a DTLS association while maintaining the overlying SCTP association.

taylor-b commented 7 years ago

Sure, I get that it will be quicker but it will work without STUN binding responses from the answerer prior to it receiving the offer (sorry for the confusion), even if it takes longer, correct? If yes, then your proposal works or am I missing something?

Correct. Though I don't want to design something that just "works", I'd want it to also be capable of being optimal and following the standard fully.

The intent in draft-ietf-mmusic-sctp-sdp is that DTLS and SCTP associations are independent.

Thanks for clarifying! I assume that means in the call forking case, the answer would use both a new tls-id and a new sctp-port? But what if the 3PCC entity can't control the SCTP port chosen by the endpoint that ultimately terminates SCTP, and both the new/old endpoint happen to choose the same SCTP port? Is that a potential issue?

lgrahl commented 7 years ago

Closing and opening DTLS association has no effect SCTP association.

Sounds to me as if the SCTP association needs to persist a DTLS transport swap. So, no need to set a new sctp-port? Edit: Ah, call forking, I missed that. Sorry.

ORTC currently does not permit closing a DTLS association while maintaining the overlying SCTP association.

I could provide a PR but I'll test behaviour when swapping the DTLS transport first.

rshpount commented 7 years ago

I assume that means in the call forking case, the answer would use both a new tls-id and a new sctp-port?

Yes, the answer from each end point is expected to use both different tls-id and sctp-port values. Both should be allocated using a high quality random number generator.

But what if the 3PCC entity can't control the SCTP port chosen by the endpoint that ultimately terminates SCTP, and both the new/old endpoint happen to choose the same SCTP port? Is that a potential issue?

Yes, there is potential for sctp-port value collision in cases when 3PCC switches a connection between two different end points. As far as I remember, when this was discussed, it was decided that even in this case, one of the end points will detect sctp-port change and close SCTP connection. Closing SCTP connection without sctp-port value change will cause another offer/answer exchange, which should deal with collision (https://tools.ietf.org/html/draft-ietf-mmusic-sctp-sdp-26#section-9.3).

The probability of both end points accidentally ending up with the same sctp-ports was considered low enough to be acceptable.

lgrahl commented 7 years ago

@rshpount Slightly off-topic but I have a question regarding the sctp-port: Maybe I don't fully understand the scenario but when you talk about collisions, are these just collisions in the SDP (maybe if SDP is using it as some kind of unique identifier) or are these considered as actual port collisions for SCTP? Because as long as only one SCTP association lives on top of a single DTLS association, using the same port is not an issue. And if I understand correctly, forking requires a DTLS association per endpoint...

rshpount commented 7 years ago

The scenario is a setup where end points A and B are connected through a third party call control agent 3PCC. 3PCC requests a new offer from A and sends it to end point C. When creating an answer for offer from A, end point C allocates a new random value for sctp-port, which just happened to be the same value that end point B used before (which is what we mean by sctp-port value collision). End point A gets an answer with the same sctp-port value as before but with new tls-id value. End point A assumes that DTLS association got replaced but that SCTP association is preserved. End point B assumes that these are both new DTLS and SCTP associations. End point B will try to open SCTP association to A, which will fail. End point A should see SCTP association shut down from B, so it will try to reopen SCTP association by initiating a new offer/answer exchange which should clear this up.

taylor-b commented 7 years ago

End point A should see SCTP association shut down from B, so it will try to reopen SCTP association by initiating a new offer/answer exchange which should clear this up.

Ok, that makes sense. But don't be surprised if this ends up as a P3 feature request in the webrtc bug repository for several years. :)

rshpount commented 7 years ago

I think all that is needed here is a call back that SCTP association was closed at the transport level. Application can handle this call back and initiate a new offer/answer exchange. Or implementation can simply set negotiation-needed flag is SCTP association is closed at the transport level.

taylor-b commented 7 years ago

Sure, it's not too hard, but it's also a pretty specific corner case. Also, you've brought up a good question for WebRTC 1.0: how does the application know this happened? An SCTP state change event, or "negotiationneeded"? I'd prefer the former, because there also may be corner cases where the SCTP association is closed for some other reason, and renegotiating won't help. So better to let the application decide, given what it knows about the call topology. Plus, the state change event is already there in ORTC. Anyway, I created a webrtc-pc issue for this (https://github.com/w3c/webrtc-pc/issues/1612).