w3c / network-error-logging

Network Error Logging
https://w3c.github.io/network-error-logging/

Verify "ownership" when generating include-subdomain reports #74

Closed dcreager closed 6 years ago

dcreager commented 6 years ago

First reported on blink-dev. We should add language to the spec to prevent NEL reporters from sending reports about any connections to a private address. Without that, someone could use an include-subdomains configuration to have clients probe their internal RFC1918-addressed networks, and report those results back to an external collector.
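A minimal sketch of the kind of address check proposed here, assuming the user agent knows the server IP for the request. The function name `is_reportable_address` is hypothetical, and which ranges count as "private" is an assumption of this sketch rather than anything the spec defines.

```python
import ipaddress


def is_reportable_address(server_ip: str) -> bool:
    """Return False for addresses a NEL reporter might refuse to report on.

    Assumption: "private" means the ranges Python's ipaddress module flags
    as private (which include RFC 1918), plus loopback and link-local.
    """
    addr = ipaddress.ip_address(server_ip)
    # Loopback and link-local are checked explicitly so the intent is visible.
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```

For example, `is_reportable_address("192.168.1.1")` is `False`, while `is_reportable_address("8.8.8.8")` is `True`.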

sleevi commented 6 years ago

RFC 1918 may be one scope, but more broadly, this is related to bypassing the origin-based model (via includeSubDomains). If this was scoped to an origin, this would not be an issue either.

If we're going to expand past an origin, is there an opportunity for a CORS-like consent model? Are there other places where not following the Same Origin Policy is going to bite us?

domenic commented 6 years ago

Is this different from how you can use the load and error events on image/iframe/script for the same purpose?

(Edit: to be clear, origin-scoping for new features is a Good Thing, even if old features are improperly secured. I'm just a bit curious about the actual attack, as the blink-dev post seems very similar to what's already possible. If the answer is, "they're the same", then we should probably still restrict!)

flano-yuki commented 6 years ago

I suppose NEL reports carry more information than load and error events. tcp.refused, tcp.closed, tcp.reset and tcp.address_unreachable have different meanings.

dcreager commented 6 years ago

Transcribing the example from the blink-dev thread:

  1. Alice owns example.com, and configures her DNS servers to return private IP addresses for one or more subdomains (e.g., 127-0-0-1.example.com → 127.0.0.1).

  2. Alice hosts a web page at https://example.com/ (using a perfectly valid signed certificate) that contains img links to this subdomain:

    <img src="https://127-0-0-1.example.com:2001" />
    <img src="https://127-0-0-1.example.com:2002" />
    <img src="https://127-0-0-1.example.com:2003" />

    This page also includes Reporting/NEL response headers with an include-subdomains configuration:

    NEL: {"report-to": "nel", "include-subdomains": true,
          "success-fraction": 1.0, "failure-fraction": 1.0}

  3. Alice convinces Bob to access this web page. Bob's user agent will follow the img links, and its NEL stack will report back to Alice's NEL collector about whether or not there's anything listening on those 127.0.0.1 ports.

End result: Alice has been able to perform a port scan of Bob's internal network by convincing him to access a web page.
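The reason the example works is that a policy received from example.com is matched against requests to its subdomains by host name alone. A rough sketch of that matching, where `policy_covers` is a hypothetical helper and not text from the spec:

```python
def policy_covers(policy_host: str, request_host: str, include_subdomains: bool) -> bool:
    """Decide whether a policy received from policy_host applies to request_host.

    Hypothetical helper: matching is done purely on the host name, so nothing
    here checks who actually operates the server behind request_host.
    """
    policy_host = policy_host.lower().rstrip(".")
    request_host = request_host.lower().rstrip(".")
    if request_host == policy_host:
        return True
    # A superdomain match requires a label boundary, hence the leading dot.
    return include_subdomains and request_host.endswith("." + policy_host)
```

With `include_subdomains=True`, `policy_covers("example.com", "127-0-0-1.example.com", True)` is `True` even though that name resolves to 127.0.0.1.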

dcreager commented 6 years ago

> Is this different from how you can use the load and error events on image/iframe/script for the same purpose?

As @flano-yuki points out, if you don't care about the differences between e.g. "connection refused" and "connection reset", then it's equivalent to what you can do with load/error events — with the caveat that the NEL stack in the browser would take care of collecting and uploading the reports for you.
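To illustrate the difference in granularity, here is a small sketch. It assumes that page script only learns "load" or "error" for a subresource, and that "ok" is the NEL type for a successful request. Both helper names are hypothetical.

```python
# Failure types named earlier in this thread.
NEL_FAILURE_TYPES = ["tcp.refused", "tcp.closed", "tcp.reset", "tcp.address_unreachable"]


def seen_by_page_script(nel_type: str) -> str:
    """What an onload/onerror handler can distinguish (assumed: only two outcomes)."""
    return "load" if nel_type == "ok" else "error"


def seen_by_collector(nel_type: str) -> str:
    """What the NEL collector receives: the specific type, unchanged."""
    return nel_type


# All four failures collapse to a single outcome for page script,
# while the collector can still tell them apart.
script_outcomes = {seen_by_page_script(t) for t in NEL_FAILURE_TYPES}
collector_outcomes = {seen_by_collector(t) for t in NEL_FAILURE_TYPES}
```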

dcreager commented 6 years ago

I agree with @sleevi that blocking reports about private addresses wouldn't be enough (and in the IPv6 case, might be hard to even define), and that the real issue is the cross-origin NEL configuration.

That is, in the example above, Alice can install a NEL configuration for *.example.com without proving that she's the legitimate owner of the services running on those subdomains. We need to ensure that before generating any reports about alpha.example.com, we verify that its owner agrees to be covered by any NEL configuration that was received for *.example.com.

Complicating the issue is that we want that permission to extend to future requests to alpha.example.com, so that if requests to that subdomain start failing at some point in the future, we have valid instructions for how to report those failures.

dcreager commented 6 years ago

> we verify that its owner agrees to be covered

A couple of options for this that I can think of:

sleevi commented 6 years ago

> • Verify that alpha.example.com is served using the same certificate as the example.com response that provided the NEL configuration. In practice, that will typically be a wildcard certificate, which makes sense, since include-subdomains is only needed in those same situations where you need a wildcard cert.

So, there's several edge cases that would have to be thought through: 1) Not every NEL detail could be reported, since you'd have to have got far enough in establishment (e.g. after DNS and TCP and TLS handshake to know the same cert). 2) What happens if the NEL policy was served with cert-A (*.example.com or two SANs alpha.example.com+example.com) and the subdomain is served with cert-B, which is compatible-but-different - which is not uncommon?

> • Add a new kind of NEL header that explicitly opts into a particular include-subdomains configuration. Something like:

1) Isn't that effective CORS?

For both 1 and 2, it seems like it's going to have a TOCTOU issue if we're allowed to report on connection establishment, and because of that, at risk for things like DNS rebinding?

annevk commented 6 years ago

I'm somewhat surprised we're exposing these network errors at all given that so far we have resisted exposing them in what is supposed to be the primitive, fetch. Should we not first figure out if we can reasonably expose them there?

dcreager commented 6 years ago

> I'm somewhat surprised we're exposing these network errors at all given that so far we have resisted exposing them in what is supposed to be the primitive, fetch. Should we not first figure out if we can reasonably expose them there?

Exposing them in fetch would expose the errors to the requestor, while exposing them via NEL should only expose them to the requestee. (Or at least, that's the intent!)

dcreager commented 6 years ago

> Isn't that effective CORS?

Ha, blerp blorp, yes it is! Not enough :coffee: today

dcreager commented 6 years ago

> Not every NEL detail could be reported, since you'd have to have got far enough in establishment (e.g. after DNS and TCP and TLS handshake to know the same cert).

You'd need something like CORS's Access-Control-Max-Age, which would let you detect pre-TLS errors in future requests. But that still leaves open the TOCTOU/rebinding attack that you describe below.
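As a rough sketch of that idea, a user agent could remember that a subdomain consented, for a limited time, so that later pre-TLS failures can still be reported. The class `ConsentCache` and its methods are hypothetical. This does nothing about the TOCTOU/rebinding concern.

```python
import time
from typing import Callable, Dict


class ConsentCache:
    """Remembers per-host consent with an expiry, loosely like Access-Control-Max-Age."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so the expiry logic can be tested
        self._expires: Dict[str, float] = {}

    def record(self, host: str, max_age_seconds: float) -> None:
        """Store consent observed on a successful request to host."""
        self._expires[host] = self._clock() + max_age_seconds

    def has_consent(self, host: str) -> bool:
        """True only while a previously recorded consent is still fresh."""
        expiry = self._expires.get(host)
        return expiry is not None and self._clock() < expiry
```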

> What happens if the NEL policy was served with cert-A (*.example.com or two SANs alpha.example.com+example.com) and the subdomain is served with cert-B, which is compatible-but-different - which is not uncommon?

I would suggest that we'd err on the side of being conservative, and not generate reports for those kinds of requests. There's always the fallback of serving the full NEL policy from both domain + subdomain.
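The conservative rule could look something like the sketch below. It assumes the user agent kept the DER-encoded certificate that accompanied the policy, and it compares certificates by SHA-256 fingerprint. Both function names are hypothetical.

```python
import hashlib


def cert_fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def may_report_for_subdomain(policy_cert_der: bytes, subdomain_cert_der: bytes) -> bool:
    """Conservative check: report only if the subdomain presented the exact same cert.

    A compatible-but-different certificate (the cert-A vs. cert-B case) fails
    this check, so no report would be generated for that request.
    """
    return cert_fingerprint(policy_cert_der) == cert_fingerprint(subdomain_cert_der)
```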

> a TOCTOU issue if we're allowed to report on connection establishment, and because of that, at risk for things like DNS rebinding

This part isn't specific to the cross-origin example, I think. You could do a DNS rebinding attack like this just using a regular NEL policy on example.com.

annevk commented 6 years ago

> Exposing them in fetch would expose the errors to the requestor, while exposing them via NEL should only expose them to the requestee.

How does this difference matter for same-origin requests?

dcreager commented 6 years ago

> How does this difference matter for same-origin requests?

It wouldn't. Is the pushback on the fetch side just for cross-origin requests or for both?

(Also, is whatwg/fetch#526 where the discussion is happening on the fetch side? I agree with you that whatever we decide should be consistent. And ideally, if it's resolved by fetch defining specific error conditions that are returned via the API, NEL would use those directly instead of defining its own set of error codes.)

annevk commented 6 years ago

It's for both, as thus far we don't expose any errors (I realize the Performance WG has nevertheless exposed some errors (or at least timing, per https://w3c.github.io/resource-timing/), but none of that is grounded in formalized primitives). E.g., you don't get to know whether there is a strict CSP policy, you're offline, or the server is offline via fetch.

dcreager commented 6 years ago

> as thus far we don't expose any errors

Note that NEL errors wouldn't be visible to JavaScript, regardless of whether the request is same-origin or cross-origin. The Reporting API has a JavaScript observer mechanism now, but the plan is that NEL reports will be explicitly marked as not observable. The NEL reports would only be visible to the owner of the origin being requested (or to the third-party collector that they designate in their NEL policy).

dcreager commented 6 years ago

It's also worth calling out an important monitoring use case that we'd lose if we require something CORS-like for subdomain reports: DNS misconfiguration. You own the DNS tree rooted at example.com, and you want to be informed when users try to follow links to subdomains that don't exist. Since there isn't any server running on the nonexistent subdomain, there's no way to provide a same-origin policy for the subdomain, and there's nothing that could respond to any CORS-like check for the subdomain.

That use case was a primary motivator behind include-subdomains policies. I'd even argue that if we decide we have to plug this security hole, the cleanest approach would be to remove include-subdomains policies completely. (That is, if we're going to require a successful request to the subdomain to verify consent, we might as well require that response to carry the full NEL policy that the subdomain consents to.) The only benefit that include-subdomains+CORS would have would be a small reduction in the number of cached policies that the user agent has to hang onto.

Another option (which I'm not sure yet if I like) is that an include-subdomains policy can only cover the error types that occur before connection establishment (i.e., only dns.*). There's not even a hypothetical server yet at that point, and so there's no need to get the server's permission to report on the error. That would keep the DNS misconfiguration use case in play. Once the user agent has a DNS response, and starts trying to open a connection, there's a (possibly hypothetical, nonexistent) server that's participating in the request, and which must actively consent. So any error that occurs from that point onward could only be covered by a non-include-subdomains policy.
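A minimal sketch of that last option, assuming report types are dotted strings whose DNS-phase failures start with `dns.` (for example `dns.name_not_resolved`). The function `policy_may_report` is hypothetical.

```python
def policy_may_report(
    report_type: str,
    policy_host: str,
    request_host: str,
    include_subdomains: bool,
) -> bool:
    """Decide whether a policy received from policy_host may produce this report."""
    if request_host == policy_host:
        # Same host: the policy applies to every error type.
        return True
    is_subdomain = request_host.endswith("." + policy_host)
    if not (include_subdomains and is_subdomain):
        return False
    # Superdomain policy: only errors that occur before connection
    # establishment, i.e. DNS-phase errors, are reportable.
    return report_type.startswith("dns.")
```

Under this rule a missing subdomain still produces a `dns.*` report, while `tcp.refused` against a subdomain is dropped unless the subdomain served its own policy.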

Thoughts?

annevk commented 6 years ago

Do you also get these error reports if it's not the user, but if it's example.com itself that's poking around at its subdomains?

You could imagine an attacker finding an XSS at intranet.example and then using that as a basis for further exploration. Now this might not work if these reports only go over the network to a same-origin destination, but if like CSP they grow some kind of event and we forgot the reason why that's bad here...

We also tend to regret it or find unexpected attacks when things cross the origin boundary.

dcreager commented 6 years ago

I don't think XSS applies here since NEL reports are not observable, and are never visible in JavaScript. Only the server (or its designated collector) gets to see the report, regardless of whether the request is cross-origin or same-origin, and regardless of whether it's a user-driven or script-driven request. Just like how only the server would get to see an entry in access.log for the request, regardless of how the request originated.

annevk commented 6 years ago

Yeah, but are we sure we'll never change our mind on that and not forget about the invariant? (The remaining arguments still apply, too.)

dcreager commented 6 years ago

I need to add the text that says that NEL isn’t observable (observers are a recent addition to Reporting). I could include text explaining the rationale, would that work?

annevk commented 6 years ago

That would potentially address some of the concerns (such an invariant is hard to maintain over long periods of time); it doesn't address bypassing the same-origin policy.

(The same-origin policy itself is a good example of an invariant folks keep trying to poke holes through and we tend to regret that later whenever it's done.)

alvestrand commented 6 years ago

The problem here is that an attacker can request reports for names that are under his control. Since the attacker controls the names in the DNS, he can point them at whatever he wants - DDOS targets, internal addresses, insides of firewalls ... anything. Certificates guarantee that the attacker is authorized to use the name - but when the target of the attack is the address, this doesn't mitigate anything at all. Two possible resolutions I see off the bat:

dcreager commented 6 years ago

#77 adds the text that explicitly states that NEL reports aren't observable, and explains why. So the requester will never get to see any NEL reports about the requests they make; only the server that receives the requests will.

That leaves open the question about what to do with include_subdomains policies. There are four options on the table:

  1. Do nothing: This isn't a new attack, since you can already use onload and onerror events to detect whether the request failed or not.

  2. CORS: Before generating a report because of an include_subdomains policy, the browser would have to make a CORS preflight to the subdomain to verify that it consents to being monitored by the superdomain's policy.

  3. Remove include_subdomains policies altogether: This removes this entire class of problem. The browser would only generate NEL reports based on policies received for that specific origin. No CORS needed, because the policy's origin and the request's origin would be guaranteed to always match.

  4. Remove include_subdomains policies for non-DNS errors: The browser would only use an include_subdomains policy to report on DNS errors (this would be safe because you've verified ownership of this part of the DNS tree). The browser could only use a same-origin policy to generate a report for any errors that occur once connection establishment has started.

Of these, I think we can definitely rule out option (2) — option (3) accomplishes the same result with a much simpler solution.

Straw poll?

flano-yuki commented 6 years ago

I have a question about verifying ownership of domains that resolve to private IP addresses.

I think that even if we remove include_subdomains, example.com can apply a NEL policy to 127-0-0-1.example.com (127.0.0.1) by using WebPackage (Signed HTTP Exchanges). (I know Chrome is implementing WebPackage.)

Signed HTTP Exchanges enable example.com to send a NEL policy as if it were sent from 127-0-0-1.example.com.

Steps