mozilla / standards-positions

https://mozilla.github.io/standards-positions/
Mozilla Public License 2.0

Network Information API reboot #569

Open · tomayac opened this issue 3 years ago

tomayac commented 3 years ago

Hey Mozilla folks,

I have recently rebooted the Network Information API. This is all at a relatively early stage, but I thought now would be a good time to get your feedback on the proposal:

Here is the short version:

// Is the current network a metered network according to the OS-level setting
// in, e.g., Android or Windows, i.e., _without_ the UA guesstimating it. The UA
// may provide its own (override) setting, though:
navigator.connection.metered;
// false

// What is the sustained connection speed, as measured on the OS-level (à la
// `nettop`) for a sliding window and bucketed in buckets of exponentially
// growing size in bit per second, e.g., 25,000,000 (25 Mbit/s), 50,000,000
// (50 Mbit/s). It's fine to report `Infinity` if the user agent doesn't want to
// reveal more, or if the sustained speed isn't known yet.
navigator.connection.sustainedSpeed;
// 50000000

// Changes to either of the attributes are exposed via an event:
navigator.connection.addEventListener("change", (event) => {
  console.log(event);
});

Each of the attributes is accompanied by a client hint header that reflects the attribute:

Sec-CH-Metered-Connection: 1
Sec-CH-Sustained-Speed: 50000000
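As with other client hints, a server would presumably opt in to receiving these via Accept-CH before the client starts sending them (a minimal sketch, assuming the standard Accept-CH opt-in flow applies to the proposed hints):

Accept-CH: Sec-CH-Metered-Connection, Sec-CH-Sustained-Speed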

Thanks in advance for your thoughts, here, or in the motivational document.

Cheers, Tom

ekr commented 3 years ago

I agree it would be nice to have "sustained speed" but actually measuring this is very difficult in any real network, for several reasons:

  1. The actual available capacity depends on what other traffic is sharing your link (this includes other computers, other programs on the computer and other Web sites).
  2. The fraction of the available channel capacity that is being used varies quite a bit depending on the protocols in use, due to slow start, congestion avoidance, etc. For instance, I just ran a speed test to speed.cloudflare.com and the 100 kB download shows about 80 Mbps, whereas the 100 MB test shows around 700 Mbps.
  3. Measuring this via passive measurements of organic traffic as you suggest (e.g., with nettop) is even more problematic because there's no guarantee your device is even using the entire available channel.

It's for these reasons (and others) that real-world protocols like TCP and QUIC constantly adapt to what is apparently the available capacity by measuring packet loss and latency. That's a superior approach to having the client just claim some poorly-defined value.

Tagging @ianswett and @davidschinazi for awareness.

tomayac commented 3 years ago

Thanks for the reply, @ekr. Note that the objective is not to provide an accurate “speed test” (sites can do that themselves if they absolutely need to), but rather to provide a rough ballpark figure for the recently observed speed.

To make this clearer, one of the use cases is to replace background videos with poster images. If the API tells the developer that the connection recently allowed video (e.g., by being, say, in the 25 Mbit/s bucket), the site could show a background video and not just a poster image. The video codec will then take care of adaptive streaming based on the actually observed speed in real time.
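To sketch what that could look like in code (purely illustrative: the 25 Mbit/s threshold and the element IDs are assumptions on top of the proposed API shape):

// Swap in a background video only if the recently observed sustained
// speed is at or above the (illustrative) 25 Mbit/s bucket and the
// connection isn't metered.
const VIDEO_THRESHOLD_BPS = 25_000_000;

function updateHeroMedia() {
  const { sustainedSpeed, metered } = navigator.connection;
  const useVideo = !metered && sustainedSpeed >= VIDEO_THRESHOLD_BPS;
  document.querySelector("#hero-video").hidden = !useVideo;
  document.querySelector("#hero-poster").hidden = useVideo;
}

updateHeroMedia();
navigator.connection.addEventListener("change", updateHeroMedia);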

ekr commented 3 years ago

Thanks for the reply, @ekr. Note that the objective is not to provide an accurate “speed test” (sites can do that themselves if they absolutely need to), but rather to provide a rough ballpark figure for the recently observed speed.

Yes, but for the reasons I indicate above, this is simply not going to be accurate. If you look at the example I provided, which actually is a speed test, there is nearly an order of magnitude difference depending only on the size of the file (due to slow start, presumably).

To make this clearer, one of the use cases is to replace background videos with poster images. If the API tells the developer that the connection recently allowed video (e.g., by being, say, in the 25 Mbit/s bucket), the site could show a background video and not just a poster image. The video codec will then take care of adaptive streaming based on the actually observed speed in real time.

Yes, I understand why you might want this functionality, but that doesn't make it any more technically feasible. In particular, this is very likely to chronically underestimate (because most clients don't use the entire channel all the time) and therefore will cause the server to provide a less rich experience than it otherwise could. It's quite likely that the server would be better off just directly measuring the download time for its own content (especially if it's using QUIC and can introspect into the connection state).

DavidSchinazi commented 3 years ago

In my experience, systems built on "did the network recently have property X?" never work well. They always underperform compared to a system that tries and measures. Based on this, I'm not sure this new API adds value. When you add in the privacy implications, the value might become negative.

tomayac commented 3 years ago

Thanks for the additional replies, @ekr and @DavidSchinazi!

One way to read my proposal would be that it moves the closed-ended, label-based effectiveType system (minimum slow-2g to maximum 4g) to an open-ended, numbers-based sustainedSpeed system, where, as a developer, you get future-proof, technology-independent speed buckets like 50 Mbit/s, 100 Mbit/s,… as ballpark figures to tailor the user experience to.
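To illustrate the open-ended, exponentially growing buckets, a UA could round a measured rate down along these lines (the concrete 25 Mbit/s base and power-of-two growth are my own illustrative assumptions, not spec text):

// Illustration only: map a measured rate onto exponentially growing
// buckets (25, 50, 100, 200, ... Mbit/s).
function bucketSpeed(bitsPerSecond) {
  const base = 25_000_000; // 25 Mbit/s
  if (bitsPerSecond < base) return 0;
  return base * 2 ** Math.floor(Math.log2(bitsPerSecond / base));
}

console.log(bucketSpeed(180_000_000)); // 100000000, i.e., the 100 Mbit/s bucket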

Judging from the ChromeStatus usage numbers, which show that the current API is encountered on >40%(!) of all page loads, the way effectiveType is currently implemented in Chromium has apparently been meaningful enough for businesses to base their implementations on it: Tinder's adaptive loading, Facebook's Messenger Chat widget, YouTube's embed player, the prefetching plugin WP Browser Caching, and many more, apart from analytics companies that report on these numbers, like Panelbear or the customer analytics of Wix. I am confident the present approach to determining the speed meaningfully can be adapted to the proposed new system.

Regarding the privacy implications, the proposal actually reduces the fingerprintable surface compared to the current API. Note that the connection type attribute, which was specifically highlighted as problematic, has been axed. The proposal aims to unlock the use cases of the current API in a privacy-preserving way that vendors like Mozilla can commit to.

ekr commented 3 years ago

I think we're starting to repeat ourselves here.

  1. I don't think it's at all obvious that just because something is on a lot of pages it's meaningful as a performance metric. We see a lot of fingerprinting.
  2. It's not in fact clear to me that this kind of direct speed measurement is more meaningful than one which measures what kind of network one is actually on, for the reasons we indicated.

In any case, I'm skeptical that this provides accurate data for the reasons I indicated.

If you believe it does, then I think it's your responsibility to demonstrate that via some measurements. Some possibilities here would be:

  1. Compare this value against a measured top speed (e.g., show a high r^2)
  2. Show that this value produces a measurable improvement in some other metric.
tomayac commented 3 years ago

  1. I don't think it's at all obvious that just because something is on a lot of pages it's meaningful as a performance metric. We see a lot of fingerprinting.

There is certainly a fingerprinting vector in the current API, although there are cross-origin tracking mitigations in place.

When it comes to non-tracking, non-analytics use cases (apart from what's listed in the comment above), the popular open-source project Shaka Player uses the API to adjust the initial playback rate.

Developers at the social networking site Facebook have gone on record stating that this API is how they realize their adaptive loading use case.

  1. It's not in fact clear to me that this kind of direct speed measurement is more meaningful than one which measures what kind of network one is actually on, for the reasons we indicated.

In any case, I'm skeptical that this provides accurate data for the reasons I indicated.

If you believe it does, then I think it's your responsibility to demonstrate that via some measurements. Some possibilities here would be:

  1. Compare this value against a measured top speed (e.g., show a high r^2)
  2. Show that this value produces a measurable improvement in some other metric.

@tarunban has offered to look into what data we could provide. We might be able to provide UMA data on the RTT distribution on WiFi networks for example.

tomayac commented 3 years ago

(Just to add: the analytics use case and the adaptive loading use case go hand-in-hand, as outlined in this brilliant post by the search company Algolia. I just verified that the described logic is actually in use for supporting browsers on hn.algolia.com.)

ekr commented 3 years ago

Thanks for agreeing to try to get some data. We'll await that.

tarunban commented 3 years ago

I think the data I wanted to share was mostly around the variability that we observe from Chrome in the quality of different networks, even though all of them have the same last hop of WiFi. We measured the TCP RTT (the time taken to successfully establish a TCP connection to an individual endpoint) across users to different endpoints; here is the percentile distribution in milliseconds:

  25p: 30
  50p: 67
  75p: 177
  95p: 642

This data is fairly intuitive, but it's just meant to show that the type of the network's last hop is not sufficient to determine the quality of the network.

ekr commented 3 years ago

I don't think that this tells us much.

I certainly agree that the last hop is not sufficient to determine the quality of the network, but I don't think that that addresses the point that this API is likely to give highly unreliable information.

Incidentally, RTT doesn't necessarily tell you very much about available bandwidth, especially if you are dealing with people with high bandwidth-delay product networks (as can occur in, for instance, Australia).

The most natural experiment here is to measure:

  1. The result of this API
  2. The result of a direct speed test

And then look at the correlation between them.
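For concreteness, the analysis is just a correlation over paired samples, something like this (a generic Pearson-r² sketch; the numbers are hypothetical, not measurements):

// Pearson correlation squared between paired samples, e.g. API-reported
// sustained speed vs. a direct speed-test result for the same client.
function rSquared(xs, ys) {
  const n = xs.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let cov = 0, vx = 0, vy = 0;
  for (let i = 0; i < n; i++) {
    cov += (xs[i] - mx) * (ys[i] - my);
    vx += (xs[i] - mx) ** 2;
    vy += (ys[i] - my) ** 2;
  }
  const r = cov / Math.sqrt(vx * vy);
  return r * r;
}

// Hypothetical paired values in Mbit/s:
console.log(rSquared([20, 45, 90, 180], [25, 50, 110, 160])); // ≈ 0.96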

tomayac commented 3 years ago

  1. The result of this API
  2. The result of a direct speed test

Thanks, @ekr. @tarunban, do you think we can provide anything along these lines, maybe based on a variant of the network quality estimator that isn't capped at 4g but is open-ended?

ianswett commented 3 years ago

I don't think that this tells us much.

I certainly agree that the last hop is not sufficient to determine the quality of the network, but I don't think that that addresses the point that this API is likely to give highly unreliable information.

Incidentally, RTT doesn't necessarily tell you very much about available bandwidth, especially if you are dealing with people with high bandwidth-delay product networks (as can occur in, for instance, Australia).

In theory, I agree with this. But in practice, they're quite well correlated.

The most natural experiment here is to measure:

  1. The result of this API
  2. The result of a direct speed test

And then look at the correlation between them.

ekr commented 3 years ago

In theory, I agree with this. But in practice, they're quite well correlated.

I should have mentioned one more thing: is this the distribution of individual measurements, or the distribution of means, or something else? Because you'd expect quite a bit of spread in individual measurements even from a single device, due to the paths to different locations, occasional packet loss, etc.

tarunban commented 3 years ago

2. The result of a direct speed test

Thanks, @ekr. @tarunban, do you think we can provide anything along these lines, maybe based on a variant of the network quality estimator that isn't capped at 4g but is open-ended?

I think it's a bit challenging to run a direct speed test in the browser because such tests generally use too much data (problematic on metered connections) and, by definition, they saturate the network. This means running the speed test will likely slow down the tasks the user is trying to accomplish.

Instead of a bandwidth speed test, we could run an RTT test, which uses less data and does not saturate the network. That's feasible, but it's already very close to what the Chromium open-ended network quality estimator implementation does: it observes the RTT to different endpoints, takes a weighted average (with higher weight for more recent samples), and returns that weighted average.
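Roughly along these lines (an illustrative sketch of recency-weighted averaging, not the actual Chromium estimator code; the half-life is an arbitrary example value):

// Recency-weighted RTT average: older samples decay exponentially,
// so recent observations dominate the estimate.
function weightedRtt(samples, halfLifeSeconds = 60) {
  // samples: [{ rttMs, ageSeconds }, ...]
  let weightedSum = 0;
  let weightTotal = 0;
  for (const { rttMs, ageSeconds } of samples) {
    const weight = Math.pow(0.5, ageSeconds / halfLifeSeconds);
    weightedSum += rttMs * weight;
    weightTotal += weight;
  }
  return weightTotal > 0 ? weightedSum / weightTotal : undefined;
}

console.log(weightedRtt([
  { rttMs: 30, ageSeconds: 5 },    // recent sample dominates
  { rttMs: 200, ageSeconds: 300 }, // old sample is mostly discounted
])); // ≈ 35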

ekr commented 3 years ago

I think it's a bit challenging to run a direct speed test in the browser because such tests generally use too much data (problematic on metered connections) and, by definition, they saturate the network. This means running the speed test will likely slow down the tasks the user is trying to accomplish.

I'm not suggesting that you do this generally. I'm suggesting that you run an experiment which measures both maximum attainable speed and the metric you are proposing (maximum recent consumed bandwidth) and demonstrate that they are correlated. This could be run on a relatively small fraction of the users a single time and then you'd have the data.

Instead of a bandwidth speed test, we could run an RTT test, which uses less data and does not saturate the network. That's feasible, but it's already very close to what the Chromium open-ended network quality estimator implementation does: it observes the RTT to different endpoints, takes a weighted average (with higher weight for more recent samples), and returns that weighted average.

As I observed previously, RTT and bandwidth are different quantities (though, as Ian suggests, they are in practice not unrelated). However, to the extent that you think RTT is a proxy for effective path bandwidth, that suggests this API is unnecessary: the server can measure the RTT on the connection directly by measuring connection establishment time (this is a bit of a pain in TCP but can be done by modifying the kernel, and is straightforward in QUIC) and use that to estimate bandwidth, without any information from the client at all. This measurement of course suffers from some initial noise, but has several advantages: (1) it measures the performance of this path, not of random other paths the client may be using, and (2) it measures the current conditions, rather than past conditions.

Taking a step back: the question at hand here is whether the client is in possession of better information about its network environment than the server can measure directly. What I'm asking you to do is to provide a set of measurements that indicate that that's the case.

ianswett commented 3 years ago

I've seen papers which demonstrate this correlation, but I can't seem to find any right now.

For Reno/Cubic/etc.-style TCP, the relationship is a direct result of the congestion controller (a.k.a. the Mathis equation); for example, see: https://netbeez.net/blog/packet-loss-round-trip-time-tcp/
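Concretely, the Mathis et al. model bounds steady-state throughput by roughly MSS/RTT scaled by 1/sqrt(loss rate); with some illustrative numbers (example values, not measurements):

// Mathis bound for Reno-style TCP: throughput <= (MSS / RTT) * (C / sqrt(p)),
// with C ≈ 1.22. All inputs below are illustrative.
const mssBits = 1460 * 8;   // bits per segment
const rttSeconds = 0.05;    // 50 ms RTT
const lossRate = 0.0001;    // 0.01% packet loss
const throughputBps = (mssBits / rttSeconds) * (1.22 / Math.sqrt(lossRate));
console.log((throughputBps / 1e6).toFixed(1), "Mbit/s"); // ≈ 28.5 Mbit/s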

I'm not sure anyone has done a study of this for BBR, but if we're primarily interested in server to client bandwidth, then that should be an analysis I can do using existing server data. Client to server data is also possible, but we don't have nearly as many data points.

One caveat for TCP is that if there's a PEP in the way, the measured RTT could be very small (i.e., a few ms) while the actual RTT is very large (e.g., satellite). I might be able to break down by a few different RTTs (i.e., SynAck RTT, MinRTT, SRTT) if they're available. For QUIC, MinRTT and SRTT should be sufficient.

valenting commented 1 year ago

In the interest of moving forward with this issue I will add a few thoughts:

tomayac commented 1 year ago

  • I like this new proposal a lot more than the old network information API.

That's good to hear!

  • As noted above, the sustainedSpeed attribute would be both tricky to compute and not very useful to the webpage, since this number doesn't really mean much.

It's meant to be useful enough to tell you what speed the browser has sustainably observed over a recent sliding window. Think of it as similar to OS widgets that show the overall network speed. In user research linked in the proposal, this was considered useful.

If the UA were to report this info when the upload and download speeds differ, I would expect it to report the smaller of the two, but the spec doesn't say that.

That's a good point. I was mostly thinking of reporting the downlink speed, since this is what matters more in the majority of cases.

I do like that the proposal states that User agents with a special focus on privacy can report the sustained connection speed as Infinity.

This was added to the spec so browser vendors could still correctly implement it, but at the same time not expose the actual data.

Where I do see this as being useful is if we want a way to signal to an origin/webpage that the user wants the page to not use a lot of data - e.g., for YouTube to serve a low-quality video, or for Zoom to capture a low-quality video stream instead.

These were some of the use cases in mind indeed.

  • I think the metered boolean would actually be useful in practice. It adds only one more bit of entropy, but that bit can be used to determine if the user is at home/office/airport. That seems a bit sensitive indeed. These are concerns that were also brought up on the WebKit thread.

On the other hand, this removes information relative to the old API, so overall there'd be less data.

  • If we were to ship this, that wouldn't happen without a piece of UI allowing the user to configure whether to report the actual metered attribute of the network interface, or always true/false.

This sounds perfectly reasonable to me.

  • I don't think the Client Hints actually bring a lot of benefit. It places a lot of burden on the client to keep track of which origins got an Accept-CH: Sec-CH-Metered-Connection and then send the headers to the next requests on those origins.

This would help with the very first request, so the server can tailor the experience from the start. This is especially useful for high-traffic sites.

It seems like the page could simply check the JS API and set a cookie instead - granted this wouldn't change automatically when the metered attribute changes. In any case, I don't like this part of the proposal.

The main idea is that this bit would change frequently enough. Just as an example: on a plane you'd be metered as you only got the 100MB WiFi pack, when you enter the terminal you'd be unmetered as you're connected to the airport WiFi, and then on the train into town again you'd be metered, as you're roaming. The setting doesn't really correlate with a trackable location.

valenting commented 1 year ago

  • I think the metered boolean would actually be useful in practice. It adds only one more bit of entropy, but that bit can be used to determine if the user is at home/office/airport. That seems a bit sensitive indeed. These are concerns that were also brought up on the WebKit thread.

On the other hand, this removes information relative to the old API, so overall there'd be less data.

Gecko and WebKit don't implement the old API, so this would still be problematic 🙂

  • I don't think the Client Hints actually bring a lot of benefit. It places a lot of burden on the client to keep track of which origins got an Accept-CH: Sec-CH-Metered-Connection and then send the headers to the next requests on those origins.

This would help with the very first request, so the server can tailor the experience from the start. This is especially useful for high-traffic sites.

It is my understanding that the server first has to respond with Accept-CH: Sec-CH-Metered-Connection, Sec-CH-Sustained-Speed for Sec-CH-Metered-Connection to be sent on later requests, right? In any case, if the user has already loaded the main page of example.com, the following requests will be triggered by that page, so theoretically the page could check the .metered attribute in JS and either not load a resource or load a different one. I agree that the value could change often, but I don't see the HTTP header providing much benefit.

tomayac commented 1 year ago

Gecko and WebKit don't implement the old API, so this would still be problematic 🙂

Hah, that's fair 👍.

It is my understanding that the server first has to respond with Accept-CH: Sec-CH-Metered-Connection, Sec-CH-Sustained-Speed for Sec-CH-Metered-Connection to be sent on later requests, right? In any case, if the user has already loaded the main page of example.com, the following requests will be triggered by that page, so theoretically the page could check the .metered attribute in JS and either not load a resource or load a different one. I agree that the value could change often, but I don't see the HTTP header providing much benefit.

The secret is Critical-CH, see How it works for an example.
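Roughly, the exchange would look like this (a sketch assuming the standard Critical-CH retry mechanism applies to the proposed hint):

GET / HTTP/1.1
Host: example.com

HTTP/1.1 200 OK
Accept-CH: Sec-CH-Metered-Connection
Critical-CH: Sec-CH-Metered-Connection

The browser notices that a critical hint it didn't send was requested and retries the request right away, so even the very first page view can be served with the hint:

GET / HTTP/1.1
Host: example.com
Sec-CH-Metered-Connection: 1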