mozilla / standards-positions

https://mozilla.github.io/standards-positions/

Network Information API #117

Closed martinthomson closed 4 years ago

martinthomson commented 5 years ago

Request for Mozilla Position on an Emerging Web Specification

Other information

This API ships in Chrome. Firefox had an implementation of the API, which has subsequently been removed. This dev.platform thread from 2016 covers this. Some of that discussion might be stale, as the spec has changed a little in the meantime.

Of note, since those discussions, the Save-Data feature was moved to this specification. This is a signal to a site that the client prefers a trade-off in favour of network efficiency when there are choices to be made between the number of bytes transferred and other things (like CPU overheads, quality, etc.).
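For illustration, a minimal sketch of how a server might honour that preference, assuming a plain Node-style handler; the file names are placeholders:

```ts
import { createServer } from "node:http";

// Minimal sketch: serve a smaller variant when the client sends `Save-Data: on`.
// The file names are placeholders for illustration only.
createServer((req, res) => {
  const saveData =
    (req.headers["save-data"] ?? "").toString().toLowerCase() === "on";

  // Caches need to know the response depends on the hint.
  res.setHeader("Vary", "Save-Data");
  res.setHeader("Content-Type", "text/plain");
  res.end(saveData ? "serving hero-lowres.jpg" : "serving hero-fullres.jpg");
}).listen(8080);
```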

dbaron commented 5 years ago

So what position are you recommending here? It sounds like the previous discussion was somewhere between defer and harmful... but does the addition of Save-Data change that?

martinthomson commented 5 years ago

Sorry, forgot to enter my recommendation.

I think that this is harmful, though I'm open to arguments to the contrary. The privacy and fingerprinting properties of this aren't superb, but could be justified if the API were really good. It's not very good. Both @bzbarsky and I had concerns about the usefulness of the API and it really hasn't changed. Those concerns extend to Save-Data, which is more of the same, except at a slightly higher cost because it is an HTTP header field.

marcoscaceres commented 5 years ago

Agree that the current design is harmful for the reason that we’ve discussed previously. The current spec deviates wildly from the original use cases I’d researched (which continue to be demonstrably valid, IMO): https://www.w3.org/TR/netinfo-usecases/

marcoscaceres commented 4 years ago

The W3C's DAP WG is looking to adopt the Netinfo API from the WICG. I've left some feedback that we are likely to object unless they reduce the scope:

https://github.com/w3c/dap-charter/issues/78#issuecomment-596300407

We should apply some out-of-the-box thinking to addressing some of the use cases - for example, allowing users to tell the browser not to download resources over X megabytes without asking when on a metered connection.

annevk commented 4 years ago

Wouldn't that require something completely different? (As far as I can tell it would effectively require the user to cooperate to express that information as there's no way of knowing if something is metered or not.)

marcoscaceres commented 4 years ago

Windows lets users set a connection as metered. And I think macOS can distinguish between regular wifi and using a personal hotspot (so in theory...).

[Screenshot: metered connection setting on Windows]

marcoscaceres commented 4 years ago

iOS also allows users to indicate that a particular network is in "low data mode":

[Screenshot: iOS "Low Data Mode" setting]

"Low data mode helps apps on your iPhone reduce their network data usage".

Cellular networks can also be marked as low data.

martinthomson commented 4 years ago

I am going to suggest that we go with harmful again, realizing that this is a hard decision for the one piece that I think we care about, which is "bandwidth costs money".

Measuring the Network

There seems to have been a serious effort put in to create a taxonomy of access network technologies and to quantify each. That isn't really helpful as the characteristics of each are dynamic, both in terms of what any given installation looks like and in terms of the minute-to-minute performance properties they might exhibit. A cellular network with the "4g" label covers a bunch of performance points that include different underlying access technology, such as radios with different frequencies and maximum theoretical throughput. It also has to deal with a range of differences in deployments like different network loading (the number of people using the network), and other characteristics like environment, rain, backhaul capacity, concurrent usage, etc...

For the qualities that are measured, there are some bad assumptions in the spec. Generally this is the result of assuming that performance on all paths through the network is roughly equal. Many of the measurements are based on what performance has been seen from other servers. Those servers will be in different network locations, with sometimes vastly different performance characteristics.

For that reason, none of the measurements have any true correspondence to something that the current server can act on, but they will expose information about what the client has been doing recently. downlinkMax and type also expose information that a site cannot measure.
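For reference, a sketch of what a page can read from the shipped API (Chromium's `navigator.connection` is non-standard, hence the cast and feature check; the attribute shapes below are assumptions about current behaviour):

```ts
// Shape of the attributes discussed above; values are assumptions about what
// Chromium currently exposes, not spec-guaranteed behaviour.
type ConnectionInfo = {
  type?: string;          // static access technology, e.g. "cellular", "wifi"
  downlinkMax?: number;   // theoretical first-hop maximum, in Mbit/s
  effectiveType?: string; // measured bucket: "slow-2g" | "2g" | "3g" | "4g"
  downlink?: number;      // recent throughput estimate, in Mbit/s
  rtt?: number;           // recent round-trip estimate, in ms
};

const connection = (navigator as { connection?: ConnectionInfo }).connection;
if (connection) {
  // These values are derived from the client's recent traffic to other
  // servers, which is the exposure concern raised above.
  console.log(connection.type, connection.downlinkMax, connection.effectiveType);
}
```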

For these measurements, there are also HTTP header fields defined that don't use the established client hints framework, but probably should (that is, if you accept that they are useful). Save-Data does use client hints, so this is a little confusing.
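A sketch of what client-hints-style delivery would look like instead, assuming the hint names Chromium uses (ECT, Downlink, RTT, Save-Data):

```ts
import { createServer } from "node:http";

// The server opts in via Accept-CH; the browser may then attach the network
// hints to subsequent requests. Hint names here are an assumption based on
// Chromium's implementation, not on the current spec text.
createServer((req, res) => {
  res.setHeader("Accept-CH", "ECT, Downlink, RTT, Save-Data");

  const ect = req.headers["ect"];           // e.g. "4g"
  const downlink = req.headers["downlink"]; // e.g. "10"
  res.end(`hints: ect=${ect ?? "none"}, downlink=${downlink ?? "none"}`);
}).listen(8080);
```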

Overall, all of these are better handled adaptively by sites, measuring the behaviour of an active connection instead. For instance, you get a pretty good read on RTT during connection establishment. That might not be as accessible, but measuring the timing for a simple request to your server is not that hard if the measurement is very important to you.
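A sketch of that adaptive alternative, timing a small same-origin request; the `/ping` endpoint is a placeholder:

```ts
// Estimate round-trip latency by timing a tiny request to your own origin,
// rather than trusting an average measured against other servers.
async function estimateRttMs(): Promise<number> {
  const start = performance.now();
  await fetch("/ping", { method: "HEAD", cache: "no-store" });
  return performance.now() - start;
}

estimateRttMs().then((rtt) => {
  // A single sample includes server processing time; repeat and take the
  // minimum for a steadier estimate.
  console.log(`rough round trip: ${rtt.toFixed(0)} ms`);
});
```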

Saving Data

This is a harder proposition to address. It is obvious that there is value in having modes that use less data. Getting lower-quality images or less proactive fetching is a good thing for people who want to save bits (or batteries).

How that manifests on the web is something that I think needs a lot more thought. Generally, we have preferred to use client-based approaches for that sort of thing, which avoids leaking distinguishing information about the client in every HTTP request. So things like the picture element come into play here.
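A sketch of that client-based approach: candidate resources are declared up front and the browser picks one locally, so nothing network-specific has to ride along on every request (file names and widths are placeholders):

```ts
// Client-side selection via srcset: the choice of variant is made by the
// browser, not signalled to the server in request headers.
const img = document.createElement("img");
img.srcset = "hero-480.jpg 480w, hero-1080.jpg 1080w, hero-2160.jpg 2160w";
img.sizes = "100vw";
img.src = "hero-1080.jpg"; // fallback for browsers without srcset support
img.alt = "Hero image";
document.body.append(img);
```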

I think that we might be moved to say that allowing the server to make this call is fine, so maybe that one piece could be not-harmful in the same way that client hints (#79) ended up being, noting that this was borderline harmful also.

(Separately, I think that Microsoft use bad terminology for that feature. I wouldn't classify "metered" connections as being the same as data reduction features on mobile devices, though it is clear that this is their intent.)

yoavweiss commented 4 years ago

Hey Mozillians!! :)

Thanks for your thoughtful position. I'd like to discuss the points raised.

But first, looking at the current state of the spec, I'd like to apologize for its lack of maintenance. https://github.com/WICG/netinfo/pull/83 attempts to fix some of that and bring it up to date when it comes to both the removal of Save-Data from the spec (it has moved to its own specification) and properly referencing Client Hints and Structured Headers when defining the request headers.


There seems to have been a serious effort put in to create a taxonomy of access network technologies and to quantify each. That isn't really helpful as the characteristics of each are dynamic, both in terms of what any given installation looks like and in terms of the minute-to-minute performance properties they might exhibit. A cellular network with the "4g" label covers a bunch of performance points that include different underlying access technology, such as radios with different frequencies and maximum theoretical throughput. It also has to deal with a range of differences in deployments like different network loading (the number of people using the network), and other characteristics like environment, rain, backhaul capacity, concurrent usage, etc...

I don't disagree with that statement. I believe type and downlinkMax were mostly exposed because there was no better alternative signal at the time. I think effectiveConnectionType is a far superior signal, as it's based on actual measurements of the network, rather than static characteristics.
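For concreteness, consuming that signal looks roughly like this (the API is non-standard, hence the feature detection and cast):

```ts
// Read the coarse bucket and react to changes; "slow-2g" | "2g" | "3g" | "4g"
// are the values Chromium reports today.
type NetInfo = EventTarget & { effectiveType?: string };

const conn = (navigator as { connection?: NetInfo }).connection;

function applyNetworkBucket(ect: string | undefined): void {
  // Stash the bucket on the root element so CSS or other code can adapt.
  document.documentElement.dataset.ect = ect ?? "unknown";
}

if (conn) {
  applyNetworkBucket(conn.effectiveType);
  conn.addEventListener("change", () => applyNetworkBucket(conn.effectiveType));
}
```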

For the qualities that are measured, there are some bad assumptions in the spec. Generally this is the result of assuming that performance on all paths through the network is roughly equal. Many of the measurements are based on what performance has been seen from other servers. Those servers will be in different network locations, with sometimes vastly different performance characteristics.

For that reason, none of the measurements have any true correspondence to something that the current server can act on, but they will expose information about what the client has been doing recently.

While true that different servers can be in different locations and therefore have different characteristics, I believe the underlying assumption is that the user's last-mile network is likely to be a bottleneck in their interactions with past servers, which can be used to "predict" their likely network performance when interacting with a new server. The coarse buckets into which ECT puts those network conditions may help to ensure that.

@tarunban can comment more on the ECT implementation and if we tried to limit it to be more "origin based".

downlinkMax and type also expose information that a site cannot measure.

True. Given that I believe their usefulness is low, I'm open to removing them.

For these measurements, there are also HTTP header fields defined that don't use the established client hints framework, but probably should (that is, if you accept that they are useful). Save-Data does use client hints, so this is a little confusing.

That's an omission, thanks for pointing it out! I filed https://github.com/WICG/netinfo/pull/83 to fix that.

Overall, all of these are better handled adaptively by sites, measuring the behaviour of an active connection instead

There are a few problems with that approach:

For instance, you get a pretty good read on RTT during connection establishment.

RTT can indeed be read through other (passive) means.

Saving Data

This is a harder proposition to address. It is obvious that there is value in having modes that use less data. Getting lower-quality images or less proactive fetching is a good thing for people who want to save bits (or batteries).

How that manifests on the web is something that I think needs a lot more thought. Generally, we have preferred to use client-based approaches for that sort of thing, which avoids leaking distinguishing information about the client in every HTTP request. So things like the picture element come into play here.

I'd argue that if we expose only coarse information (e.g. effectiveConnectionType), then network-based srcset decisions expose similar information. Servers can observe if the browser requested one resource or the other, and could deduce that network conditions were what led to that decision.

As far as the use-cases go, srcset can solve cases where you want a lower resolution version of an image, but won't help developers who want to serve entirely different experiences (e.g. a video intro to users who "can afford it" vs. images and text to users who cannot).
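A sketch of that "different experiences" case, gated on the coarse signal; the URLs and the 4g threshold are illustrative assumptions:

```ts
// Serve a video intro only when the effective connection type suggests the
// user can afford it; otherwise fall back to a still image.
const ect = (navigator as { connection?: { effectiveType?: string } })
  .connection?.effectiveType;

if (ect === "4g") {
  const video = document.createElement("video");
  video.src = "/intro.mp4";
  video.autoplay = true;
  document.body.append(video);
} else {
  const img = document.createElement("img");
  img.src = "/intro-still.jpg";
  img.alt = "Intro";
  document.body.append(img);
}
```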


Let me conclude with a concrete proposal:

I still need to run this by other folks, and make sure it's web compatible, but from my perspective, I'm willing to strip down this proposal to its basic core value, which IMO is effectiveConnectionType. That would mean the removal of type, downlinkMax, downlink and rtt. They seem to expose a lot of information without necessarily providing user value that justifies it. I'm also open to exploring the reintroduction of the metered value, if we can figure out a way in which browsers can reliably know that (so, without assuming that "cellular" == "expensive").

Would such a change result in a different outcome from your perspective? Can we work together and iterate over effectiveConnectionType and metered to make sure they provide user value in a safe way?
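For discussion, the stripped-down surface might look roughly like this in TypeScript terms (a sketch of the shape under discussion, not spec text):

```ts
// Only the coarse measured bucket survives; `metered` is included only if
// browsers can learn it reliably from the OS or the user.
interface SlimNetworkInformation extends EventTarget {
  readonly effectiveType: "slow-2g" | "2g" | "3g" | "4g";
  readonly metered?: boolean;
  onchange: ((this: SlimNetworkInformation, ev: Event) => void) | null;
}
```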

marcoscaceres commented 4 years ago

without assuming that "cellular" == "expensive"

Browsers don't need to assume it, users control it: https://github.com/mozilla/standards-positions/issues/117#issuecomment-596392117

(same switch on Windows)

Trying to be smart about it is just going to end in sadness - so let's not waste time even pretending to guess it.

yoavweiss commented 4 years ago

Browsers don't need to assume it, users control it: #117 (comment)

(same switch on Windows)

Android also has a similar switch. Assuming that actually works (e.g. that users are actually setting this appropriately), that solves the "wifi == cheap" assumption. It doesn't necessarily solve the "cellular == expensive" one (but maybe browsers can ask users that question in a meaningful way, use data limits users set, etc).

Let's further explore this at https://github.com/WICG/netinfo/issues/84

annevk commented 4 years ago

It'd probably be better to create a new issue for a Revised Network Information API proposal once there's something concrete to evaluate than to discuss possible changes here.

tarunban commented 4 years ago

While true that different servers can be in different locations and therefore have different characteristics, I believe the underlying assumption is that the user's last-mile network is likely to be a bottleneck in their interactions with past servers, which can be used to "predict" their likely network performance when interacting with a new server. The coarse buckets into which ECT puts those network conditions may help to ensure that.

@tarunban can comment more on the ECT implementation and if we tried to limit it to be more "origin based".

It is true that servers can measure this information from their ends (how fast am I pushing the data). It's also true that a user's connections to different servers may have different characteristics.

However, one of the use cases of the API is to provide servers with information when they do not have an initial estimate. For example, some video players use the API's estimate to seed the initial bitrate for videos and then use their own algorithms to adapt the bitrate. Another case is where web developers use this API to rewrite the content of the HTML page. In that case, relying on server-measured RTTs leads to more inaccuracies (e.g., due to TCP middleboxes).
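A sketch of that seeding pattern, under the assumption of a simple bitrate ladder; the rung values and the 80% headroom factor are illustrative only:

```ts
// Seed an adaptive-bitrate player's starting rung from the API's downlink
// estimate (in Mbit/s), then let the player's own measurements take over.
const downlinkMbps =
  (navigator as { connection?: { downlink?: number } }).connection?.downlink ??
  1.5; // conservative default when the API is unavailable

const ladderKbps = [400, 1200, 3500, 8000];

// Pick the highest rung that fits comfortably under the reported downlink.
const initialBitrateKbps =
  ladderKbps.filter((kbps) => kbps / 1000 < downlinkMbps * 0.8).pop() ??
  ladderKbps[0];

console.log(`starting playback at ${initialBitrateKbps} kbps`);
```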

We've not tried to limit it to being origin-based. Even in the simplest case, a webpage may be hosted on multiple servers. In the example above, the media content might be hosted on a CDN while the website may be hosted somewhere else. Getting more origin-specific data may get us more accurate data in some cases, but it has not been a priority so far.

tomayac commented 3 years ago

I have rebooted the Network Information API based on long-running discussions with @yoavweiss (who is currently still OoO) and would appreciate you all's feedback:

paco-sparta commented 4 months ago

The spec draft link above points to a malicious site.

jesup commented 4 months ago

I think it's here: https://tomayac.github.io/netinfo/

tomayac commented 4 months ago

The spec draft link above points to a malicious site.

https://github.com/mozilla/standards-positions/issues/117#issuecomment-898389422 updated with the correct link. Thanks for flagging!