mozilla / standards-positions

https://mozilla.github.io/standards-positions/

Network Error Logging (NEL) #99

Open digitarald opened 6 years ago

digitarald commented 6 years ago

Request for Mozilla Position on an Emerging Web Specification

dbaron commented 6 years ago

cc @mcmanus for his thoughts

mcmanus commented 6 years ago

I looked at this a couple of years ago. I had a few minor concerns about what was being reported, interactions with CORS, etc. I would have to spend some time looking them up (and can do so). They are likely addressable, and overall the idea is OK: the major value is that it will give folks ways to monitor the rollout of more advanced features and therefore reduce risk and incentivize deployment.

But the bigger concern at the time was that there was very little interest in deploying this server-side other than at its sponsor, Google. Has that changed? Without wide interest it's not going to incentivize deployment but will add complexity to the ecosystem.

dcreager commented 6 years ago

Hiya, I'm the editor of the relevant specs, and I'm happy to address any questions or concerns you might have (here or over in the spec repos).

One important point to clarify is that we factored out the report delivery portion into a separate Reporting spec (repo). Network Error Logging (repo) now only covers defining network errors (and successes) and how they map to report payloads. (Not sure if that happened after you most recently took a look.)

Report uploads should be hooked into CORS correctly (they're subject to preflights if the collector is at a different origin). The spec defines client-side failover and load balancing à la DNS SRV records. There's also now a JavaScript observer API for getting script-side access to reports — although NEL reports are explicitly excluded from being observable, to prevent leaking sensitive network reliability information.
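To make the SRV-style failover and load balancing concrete, here is a minimal Python sketch of how a client might pick a collector. It is an illustration only, not the spec's algorithm: the `Endpoint` class, the `pick_endpoint` helper, the `priority` and `weight` field names (modeled on DNS SRV records) and the example URLs are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    url: str
    priority: int = 1  # lower value is tried first, as with DNS SRV
    weight: int = 1    # relative share of uploads within one priority level


def pick_endpoint(endpoints, failed, rng=random):
    """Return one usable endpoint, or None if every endpoint has failed.

    `failed` is a set of URLs the client currently considers unreachable.
    """
    usable = [e for e in endpoints if e.url not in failed]
    if not usable:
        return None
    # Failover: only the best (lowest) priority level still available is used.
    best = min(e.priority for e in usable)
    candidates = [e for e in usable if e.priority == best]
    # Load balancing: weighted random choice within that priority level.
    return rng.choices(candidates, weights=[e.weight for e in candidates], k=1)[0]


# Hypothetical collectors: two primaries sharing load 3:1, plus one backup.
endpoints = [
    Endpoint("https://a.example/reports", priority=1, weight=3),
    Endpoint("https://b.example/reports", priority=1, weight=1),
    Endpoint("https://backup.example/reports", priority=2, weight=1),
]
```

With both primaries marked as failed, `pick_endpoint` falls back to the backup; with everything failed it returns `None`, and a client would presumably hold the report and retry later.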

On the NEL side, we've done some work in the last couple of months to tighten up the security and privacy constraints — for instance, by preventing DNS rebinding and subdomain policy attacks. (See w3c/network-error-logging#74 for the gory details.)

Reporting is also going to be used to deliver other types of reports than NEL — CSP is adding support for it in its next revision, and there are some predefined browser events (deprecations, interventions, crash reports) that we're defining in Reporting itself.

We have received out-of-band signals from some external developers who are ready to try this out once it goes live in Chrome, though I don't have any hard numbers on that. We're trying to minimize the effort required to adopt Reporting and NEL by having an open-source reference collector implementation available.

Hopefully this addresses some of your questions; let me know if there's anything else you want to dive into!

ScottHelme commented 5 years ago

It'd be awesome to see Firefox support NEL, and by extension the Reporting API.

We've added support at https://report-uri.com so hopefully that will allow site operators to enable this feature more easily without having to build their own reporting endpoint: https://scotthelme.co.uk/introducing-the-reporting-api-nel-other-major-changes-to-report-uri/

If adoption is a concern then perhaps this will give it a bump.

dbaron commented 5 years ago

Also worth noting there's an explainer.

dbaron commented 5 years ago

cc @ddragana @bzbarsky @martinthomson as well, for their thoughts.

The idea seems reasonable to me -- obviously some of the value depends on how widely deployed it ends up being, as @mcmanus noted above.

(And if it does become widely deployed, it's clearly advantageous for a browser to implement it because then their users are likely to get better experiences whenever any of the errors are specific to some browsers but not others.)

My initial reaction is that it seems to fit within the worth prototyping category in https://mozilla.github.io/standards-positions/ -- this presumes that (a) it seems likely to be useful and (b) there doesn't appear to be anything harmful about it.

annevk commented 5 years ago

There are a few things here that need more consideration, I think:

  1. It relies on the Reporting API, discussed in #104, which provides cookie-like tracking capabilities. https://w3c.github.io/network-error-logging/#privacy-considerations does go into this, but the mitigation of requiring secure contexts does not seem effective.
  2. It exposes network errors we've historically been uncomfortable exposing to JS. It's not entirely clear to me how this changes those tradeoffs.

dcreager commented 5 years ago

For (2), NEL doesn't expose any new errors to JavaScript. The spec calls out that NEL reports are not visible to ReportingObservers.

Instead, NEL reports are only uploaded to the collectors defined by the owner of the recipient of a request. If the originator is different, they don't get to see any NEL reports about the success or failure of the outbound request.
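A toy Python model of that rule, assuming a simple in-memory policy cache keyed by origin. The names `policies`, `register_policy` and `collector_for`, and the example origins, are hypothetical and not taken from the spec:

```python
from typing import Dict, Optional

# Hypothetical policy cache: origin that served the policy -> its collector group.
policies: Dict[str, str] = {}


def register_policy(response_origin: str, report_to: str) -> None:
    """Record a NEL policy delivered on a response from `response_origin`."""
    policies[response_origin] = report_to


def collector_for(request_origin: str, initiator_origin: str) -> Optional[str]:
    """Pick the collector for a report about a request to `request_origin`.

    `initiator_origin` is accepted only to make the point that it is ignored:
    the page that triggered the request has no say in where the report goes.
    """
    return policies.get(request_origin)


register_policy("https://cdn.example", "cdn-collector")
# A page on shop.example loads a resource from cdn.example: the report about
# that request goes to the CDN's collector, not to anything shop.example set up.
print(collector_for("https://cdn.example", "https://shop.example"))
```

In this model the initiating origin's own policy, if it has one, only ever applies to requests made to that origin itself.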

annevk commented 4 years ago

To be clear, the concern is not that it exposes new errors to JavaScript, it's that it exposes new errors.

I think our stance for this should be harmful. While I think we should be supportive of reporting things that are already otherwise exposed to improve developer ergonomics, using reports for information that is not otherwise known is a lot harder to justify. Additionally, while reporting in general is now per-document, NEL is not and still has the cache problem.

(There's also the problem that none of the network errors are specified in terms of the low-level primitives defined in Fetch.)

dcreager commented 4 years ago

> To be clear, the concern is not that it exposes new errors to JavaScript, it's that it exposes new errors.

Exposed to whom? It's not just that the errors aren't exposed to JavaScript — they're not exposed to the originator of a request through any means. The errors are only exposed to the recipient of the request, who would see the same information in their server logs for successful requests, and even for failed requests that make it past a certain point in the connection establishment process.

> using reports for information that is not otherwise known is a lot harder to justify

I completely agree with this. We've tried to be very careful to not expose new information, and not expose anything to unauthorized parties. These are the principles we followed when designing NEL (from a paper we presented at NSDI back in February):

  1. We cannot collect any information about end users, their device/user agent, or their network configuration, that the server does not already have visibility into. That is, we should not collect new information relative to existing server logs; only existing information in a different place.
  2. We can only collect information about requests that user agents issue when users voluntarily access services on the Web. We cannot issue requests in the background (i.e., outside of normal user activity), even though this prevents us from proactively ascertaining service reachability.
  3. End users can opt out of collection at any time, either globally or on a per-site basis. Support for respecting opt-outs must be implemented by NEL-compliant user agents, so that users do not need to trust service providers for opt-outs to take effect.
  4. Modulo that end-user opt-out, it is only the site owner who gets to decide whether reports are collected about a particular site, and if so, where they are sent. Third parties (including browser vendors) must not be able to use NEL to monitor sites that they do not control.
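Principle 3 (opt-out enforced by the user agent, globally or per site) could look roughly like the following check on the client. This is a sketch under assumptions: `NelPreferences` and `may_queue_report` are hypothetical names, and a real user agent would tie this into its own settings storage.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class NelPreferences:
    """User-controlled switches, stored and enforced by the user agent."""
    enabled_globally: bool = True
    disabled_origins: Set[str] = field(default_factory=set)


def may_queue_report(origin: str, prefs: NelPreferences) -> bool:
    """Return True only if the user has not opted out globally or for this origin."""
    return prefs.enabled_globally and origin not in prefs.disabled_origins


prefs = NelPreferences(disabled_origins={"https://tracker.example"})
print(may_queue_report("https://news.example", prefs))     # allowed
print(may_queue_report("https://tracker.example", prefs))  # per-site opt-out
```

Because the check runs before a report is queued, the opt-out takes effect without relying on the service provider's cooperation.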

Is your concern that something like NEL would be harmful even if it followed these principles? Or that NEL as currently designed doesn't follow them?

annevk commented 4 years ago

To be clear, I understand it's all "same origin".

It's not clear to me how NEL follows those principles. E.g., how would example.com know I cannot get to their DNS records? How could example.com identify a specific user from errors in their server logs? How would they know an IP address is invalid?

martinthomson commented 3 years ago

I'd like to put this one to bed, but there seems to be a thicket of issues to resolve first.

As far as @annevk's concern about exposing new information goes, I'd like to resolve that. Two things might help here:

The potential abuse as a supercookie seems to have been resolved with the reporting API, so it would be good to confirm that the same applies here. Of course, the spec hasn't tracked reporting API changes, so it is unclear.

Those are the important items, based on the conversation.

Understanding adoption status (by sites other than Google properties as noted) would be good.

I also have a bunch of concerns about the specification itself. This hasn't tracked changes in the Reporting API and there has been no real activity on the spec in almost 2 years. So it seems like it might have been neglected a little. For example, the NEL header field is defined using defunct syntax (see RFC 8941) and it hasn't been registered in the appropriate place.

@tantek was looking to resolve this as 'harmful', which I think is fair given the conversation so far and the current state of the specification. However, good answers to the above might change that disposition.

joseba4242 commented 2 years ago

login.microsoftonline.com uses NEL.

paulmillar commented 2 years ago

Just to mention it, Cloudflare appears to be using NEL.

cdanis commented 2 years ago

Reporting from the Wikimedia Foundation, the non-profit that maintains Wikipedia and other related projects: we use NEL, and it has been really important for detecting outages that otherwise would have either been missed or only caught due to manual user reports.

ScottHelme commented 2 years ago

> Understanding adoption status (by sites other than Google properties as noted) would be good.

On 5th Jun 2022 there were 177,229 sites [1] serving a NEL header in the Top 1 Million Sites (list provided by Tranco [2]), which indicates that almost 18% of sites are using NEL.

At Report URI [3], we process a little over 5,000,000 NEL reports per day, with none of them coming from Google owned properties or from Cloudflare managed properties. There are also other reporting platforms out there capable of ingesting NEL reports for websites for which I don't have any data to reference publicly.
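For a sense of scale, the figures above can be checked with a few lines of Python. The inputs come straight from this comment; the per-second rate is a derived average that treats "a little over 5,000,000" as exactly 5,000,000, so it is a lower bound.

```python
sites_with_nel = 177_229      # sites serving a NEL header on 5th Jun 2022
sites_crawled = 1_000_000     # Tranco Top 1 Million
share_percent = 100 * sites_with_nel / sites_crawled

reports_per_day = 5_000_000   # lower bound quoted for Report URI
seconds_per_day = 24 * 60 * 60
reports_per_second = reports_per_day / seconds_per_day

print(f"{share_percent:.1f}% of crawled sites serve a NEL header")
print(f"about {reports_per_second:.1f} NEL reports per second on average")
```

This gives 17.7%, in line with the "almost 18%" quoted above, and an average of roughly 58 reports per second.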

All in all, I think there's a reasonably large collection of sites out there that use NEL already and my data shows that the number of sites using it is steadily increasing.

[1] https://crawler.ninja/files/nel-sites.txt
[2] https://tranco-list.eu
[3] https://report-uri.com

polcak commented 1 year ago

We are doing research on NEL.

First of all, we have analyzed HTTP Archive data on NEL deployment. Deployment has risen from 0 to 11.73% (almost 2,250,000 unique domains) since 2019. Current deployment is dominated by Cloudflare. The paper is a work in progress and has not yet been submitted.

Second of all, we have focused on data protection and security issues with Network Error Logging and have a paper accepted at SECRYPT'23. Our conclusions are:

We recommend:

Please read the paper for more details. Do not hesitate to contact us for more information.

Edit: removed duplicate sentence fragment.

mozfreddyb commented 1 year ago

This seems to indicate that some of the issues we saw earlier are still unresolved. It's unfortunate that there was apparently not enough spec work to address these concerns.

@polcak Thank you for sharing this paper and your research with us.

If we want to get this triaged, I suggest we label this negative. Seems overdue.

simon-friedberger commented 11 months ago

Many of the previously concerning issues have been addressed in the spec. I expect that there will be further spec changes related to the discussions at TPAC as mentioned e.g. in 105.

Assuming that privacy issues are sufficiently addressed this can be reconsidered. Some conditions would be:

  1. Sufficient flexibility to omit parts of reports or entire reports. It must be possible for clients to offer appropriate privacy controls. This is already in the spec.
  2. Good indication about which data is useful for what. "request_headers" and success reports seem to have a bad cost/benefit ratio and there is not enough information available. See, e.g. 133. This is especially important for opt-out data collection.
  3. Usage of privacy-preserving data collection mechanisms like OHTTP or PPM where necessary.

mozfreddyb commented 11 months ago

Agreed. Happy for us to revisit, if our original concerns are going to be resolved.

SulemanAhmadd commented 10 months ago

Following our (Cloudflare) discussions with Mozilla on the topic of client-side error reporting, we have compiled the following document. It aims to provide insights into the use-cases of NEL and privacy delta for each error report field consumed by Cloudflare (while keeping operational usability in mind): [SHARED] Cloudflare: NEL Usage Analysis.

Important things to note:

We hope the above document will be useful to Mozilla in reevaluating the deployment of client-side connection error logging. The hope is that it will help relate data exposure to what is required for operational usability, based on real-world deployment. I personally believe the takeaways align really well with @simon-friedberger's points above (to which the document attempts to provide further guidance).