Revert integration of URL parser for ICE servers from Recommendation text

w3c / webrtc-pc

WebRTC 1.0 API

https://w3c.github.io/webrtc-pc/

Other

439 stars 115 forks source link

Revert integration of URL parser for ICE servers from Recommendation text #2997

Open dontcallmedom opened 2 months ago

dontcallmedom commented 2 months ago

In #2853 (to be completed with #2996) we replaced the algorithm to parse an ICE server url to use the WHATWG URL parser. While that change isn't controversial, it isn't backed by any test. I don't know if there is any implementation, but it doesn't look like libwebrtc implementation does this.

As we're preparing to re-publish the WebRTC Recommendation, and as this particular amendment is meshed within other actually implemented amendments (#2689, #2691), I think we should back out that particular amendment until that republication has happened.

alvestrand commented 2 months ago

It's not clear to me that there is any test possible for the change - I don't think there's an URL that should be considered invalid before and valid after or vice versa. Similarly, it is not clear to me that something that parses according to the URL spec wouldn't parse equally under the libwebrtc code - and "specs specify behavior, not implementation".

Can you think of anything that would validate the change?

dontcallmedom commented 2 months ago

to give an example specific to my recent PR (IIUC): turn:turn.example.org?&transport=udp is accepted by this algorithm, not by the current libwebrtc implementation.

https://mottikumar3.medium.com/url-inter-op-d93689805c21 and https://github.com/w3c/webrtc-pc/issues/2660#issuecomment-1494140703 may be a source of ideas for more generic differences emerging from using the URL parser.

alvestrand commented 2 months ago

is it the ?& ("there's an empty nameless thing here") that doesn't parse?

dontcallmedom commented 2 months ago

If I read the following libwebrtc code correctly, indeed:

  std::vector<absl::string_view> tokens = rtc::split(url, '?');
  absl::string_view uri_without_transport = tokens[0];
  // Let's look into transport= param, if it exists.
  if (tokens.size() == kTurnTransportTokensNum) {  // ?transport= is present.
    std::vector<absl::string_view> transport_tokens =
        rtc::split(tokens[1], '=');
    if (transport_tokens[0] != kTransport) {

(transport_tokens[0] would be "&transport", when kTransport is "transport")

dontcallmedom commented 2 months ago

maybe relevant WPT test data for URL parsing (the stun and turn URLs it contains aren't probably the most relevant since they would not be acceptable in a WebRTC context)

dontcallmedom commented 2 months ago

there is I think a better alternative to reverting the integration of URL parsing: we can leave it as a candidate correction, as long as we factor that particular algorithm out of the set configuration algorithm; I'll prepare a PR in that direction after #2996 is merged.

(this doesn't obviate the need for tests, which I'll take a stab at)

alvestrand commented 2 months ago

seems good to me. The logical place for testing more valid / invalid URIs would be this test:

webrtc/RTCConfiguration-iceServers.html

dontcallmedom commented 2 months ago

Running https://github.com/web-platform-tests/wpt/pull/47959 locally shows quite a number of cases where implementations fail due to not using the expected URL parser (both because they're too lax and too lenient).

It also highlights what I think are likely bugs in the spec:

we don't forbid the username@pasword: construct; this gets rejected in Chromium, which I think is reasonable
nowhere does the algorithm reject something like stun:/example.net or stun:example.net/foo, which I don't see any point in accepting

I'm pretty sure there would be valuable insights in detecting bugs further down the line in how server hostnames are passed to the ICE agent (which we could detect via RTCIceCandidate.url), but since that is currently not explicitly specified and since I'm not sure there is any infrastructure for that kind of ICE test, that's probably best left to later.

dontcallmedom commented 2 months ago

It also highlights what I think are likely bugs in the spec:

* we don't forbid the `username@pasword:` construct; this gets rejected in Chromium, which I think is reasonable

* nowhere does the algorithm reject something like `stun:/example.net` or `stun:example.net/foo`, which I don't see any point in accepting

I've submitted https://github.com/w3c/webrtc-pc/pull/2998 to address these