Closed dcreager closed 6 years ago
RFC 1918 may be one scope, but more broadly, this is related to bypassing the origin-based model (via includeSubDomains). If this was scoped to an origin, this would not be an issue either.
If we're going to expand past an origin, is there an opportunity for a CORS-like consent model? Are there other places where the lack of following the Same Origin Policy going to bite us?
Is this different from how you can use the load and error events on image/iframe/script for the same purpose?
(Edit: to be clear, origin-scoping for new features is a Good Thing, even if old features are improperly secure. I'm just a bit curious about the actual attack, as the blink-dev post seems very similar to what's already possible. If the answer is, "they're the same", then we should probably still restrict!)
I suppose NEL reports have more information than load and error events: tcp.refused, tcp.closed, tcp.reset, and tcp.address_unreachable have different meanings.
Transcribing the example from the blink-dev thread:
Alice owns example.com, and configures her DNS servers to return private IP addresses for one or more subdomains (e.g., 127-0-0-1.example.com → 127.0.0.1).
Alice hosts a web page at https://example.com/ (using a perfectly valid signed certificate) that contains img links to this subdomain:
<img src="https://127-0-0-1.example.com:2001" />
<img src="https://127-0-0-1.example.com:2002" />
<img src="https://127-0-0-1.example.com:2003" />
This page also includes Reporting/NEL response headers with an include-subdomains configuration:
NEL: {"report-to": "nel", "include-subdomains": true,
"success-fraction": 1.0, "failure-fraction": 1.0}
Alice convinces Bob to access this web page. Bob's user agent will follow the img links, and its NEL stack will report back to Alice's NEL collector about whether or not there's anything listening on those 127.0.0.1 ports.
End result: Alice has been able to perform a port scan of Bob's internal network by convincing him to access a web page.
Is this different from how you can use the load and error events on image/iframe/script for the same purpose?
As @flano-yuki points out, if you don't care about the differences between e.g. "connection refused" and "connection reset", then it's equivalent to what you can do with load/error events — with the caveat that the NEL stack in the browser would take care of collecting and uploading the reports for you.
I agree with @sleevi that blocking reports about private addresses wouldn't be enough (and in the IPv6 case, might be hard to even define), and that the real issue is the cross-origin NEL configuration.
That is, in the example above, Alice can install a NEL configuration for *.example.com without proving that she's the legitimate owner of the services running on those subdomains. We need to ensure that before generating any reports about alpha.example.com, we verify that its owner agrees to be covered by any NEL configuration that was received for *.example.com.
Complicating the issue is that we want that permission to extend to future requests to alpha.example.com, so that if requests to that subdomain start failing at some point in the future, we have valid instructions for how to report those failures.
we verify that its owner agrees to be covered
A couple of options for this that I can think of:
- Verify that alpha.example.com is served using the same certificate as the example.com response that provided the NEL configuration. In practice, that will typically be a wildcard certificate, which makes sense, since include-subdomains is only needed in those same situations where you need a wildcard cert.
- Add a new kind of NEL header that explicitly opts into a particular include-subdomains configuration. Something like:
NEL: {"use-superdomain-config": "example.com"}
- Remove include-subdomains completely. alpha.example.com would have to serve its own copy of the NEL configuration to have any reports generated about it.
- Verify that alpha.example.com is served using the same certificate as the example.com response that provided the NEL configuration. In practice, that will typically be a wildcard certificate — which makes sense, since include-subdomains is only needed in those same situations where you need a wildcard cert.
So, there are several edge cases that would have to be thought through:
1) Not every NEL detail could be reported, since you'd have to have got far enough in establishment (e.g. after DNS, TCP, and the TLS handshake) to know it's the same cert.
2) What happens if the NEL policy was served with cert-A (*.example.com or two SANs alpha.example.com + example.com) and the subdomain is served with cert-B, which is compatible-but-different - which is not uncommon?
- Add a new kind of NEL header that explicitly opts into a particular include-subdomains configuration. Something like:
1) Isn't that effective CORS?
For both 1 and 2, it seems like it's going to have a TOCTOU issue if we're allowed to report on connection establishment, and because of that, at risk for things like DNS rebinding?
I'm somewhat surprised we're exposing these network errors at all given that so far we have resisted exposing them in what is supposed to be the primitive, fetch. Should we not first figure out if we can reasonably expose them there?
I'm somewhat surprised we're exposing these network errors at all given that so far we have resisted exposing them in what is supposed to be the primitive, fetch. Should we not first figure out if we can reasonably expose them there?
Exposing them in fetch would expose the errors to the requestor, while exposing them via NEL should only expose them to the requestee. (Or at least, that's the intent!)
Isn't that effective CORS?
Ha, blerp blorp, yes it is! Not enough :coffee: today
Not every NEL detail could be reported, since you'd have to have got far enough in establishment (e.g. after DNS, TCP, and the TLS handshake) to know it's the same cert.
You'd need something like CORS's Access-Control-Max-Age, which would let you detect pre-TLS errors in future requests. But that still leaves open the TOCTOU/rebinding attack that you describe below.
What happens if the NEL policy was served with cert-A (*.example.com or two SANs alpha.example.com + example.com) and the subdomain is served with cert-B, which is compatible-but-different - which is not uncommon?
I would suggest that we'd err on the side of being conservative, and not generate reports for those kinds of requests. There's always the fallback of serving the full NEL policy from both domain + subdomain.
a TOCTOU issue if we're allowed to report on connection establishment, and because of that, at risk for things like DNS rebinding
This part isn't specific to the cross-origin example, I think. You could do a DNS rebinding attack like this just using a regular NEL policy on example.com.
Exposing them in fetch would expose the errors to the requestor, while exposing them via NEL should only expose them to the requestee.
How does this difference matter for same-origin requests?
How does this difference matter for same-origin requests?
It wouldn't. Is the pushback on the fetch side just for cross-origin requests or for both?
(Also, is whatwg/fetch#526 where the discussion is happening on the fetch side? I agree with you that whatever we decide should be consistent. And ideally, if it's resolved by fetch defining specific error conditions that are returned via the API, NEL would use those directly instead of defining its own set of error codes.)
It's for both, as thus far we don't expose any errors (I realize the Performance WG has nevertheless exposed some errors (or at least timing, per https://w3c.github.io/resource-timing/), but none of that is grounded in formalized primitives). E.g., you don't get to know whether there is a strict CSP policy, you're offline, or the server is offline via fetch.
as thus far we don't expose any errors
Note that NEL errors wouldn't be visible to JavaScript, regardless of whether the request is same-origin or cross-origin. The Reporting API has a JavaScript observer mechanism now, but the plan is that NEL reports will be explicitly marked as not observable. The NEL reports would only be visible to the owner of the origin being requested (or to the third-party collector that they designate in their NEL policy).
It's also worth calling out an important monitoring use case that we'd lose if we require something CORS-like for subdomain reports: DNS misconfiguration. You own the DNS tree rooted at example.com, and you want to be informed when users try to follow links to subdomains that don't exist. Since there isn't any server running on the nonexistent subdomain, there's no way to provide a same-origin policy for the subdomain, and there's nothing that could respond to any CORS-like check for the subdomain.
That use case was a primary motivator behind include-subdomains policies. I'd even argue that if we decide we have to plug this security hole, the cleanest approach would be to remove include-subdomains policies completely. (That is, if we're going to require a successful request to the subdomain to verify consent, we might as well require that response to carry the full NEL policy that the subdomain consents to.) The only benefit that include-subdomains + CORS would have would be a small reduction in the number of cached policies that the user agent has to hang onto.
Another option (which I'm not sure yet if I like) is that an include-subdomains policy can only cover the error types that occur before connection establishment (i.e., only dns.*). There's not even a hypothetical server yet at that point, and so there's no need to get the server's permission to report on the error. That would keep the DNS misconfiguration use case in play. Once the user agent has a DNS response, and starts trying to open a connection, there's a (possibly hypothetical, nonexistent) server that's participating in the request, and which must actively consent. So any error that occurs from that point onward could only be covered by a non-include-subdomains policy.
Thoughts?
Do you also get these error reports if it's not the user, but if it's example.com itself that's poking around at its subdomains?
You could imagine an attacker finding an XSS at intranet.example and then using that as a basis for further exploration. Now this might not work if these reports only go over the network to a same-origin destination, but if like CSP they grow some kind of event and we forgot the reason why that's bad here...
We also tend to regret it or find unexpected attacks when things cross the origin boundary.
I don't think XSS applies here since NEL reports are not observable, and are never visible in JavaScript. Only the server (or its designated collector) gets to see the report, regardless of whether the request is cross-origin or same-origin, and regardless of whether it's a user-driven or script-driven request. Just as only the server would get to see an entry in access.log for the request, regardless of how the request originated.
Yeah, but are we sure we'll never change our mind on that and not forget about the invariant? (The remaining arguments still apply, too.)
I need to add the text that says that NEL isn’t observable (observers are a recent addition to Reporting). I could include text explaining the rationale, would that work?
That would potentially address some of the concerns (such an invariant is hard to maintain over long periods of time); it doesn't address bypassing the same-origin policy.
(The same-origin policy itself is a good example of an invariant folks keep trying to poke holes through and we tend to regret that later whenever it's done.)
The problem here is that an attacker can request reports for names that are under his control. Since the attacker controls the names in the DNS, he can point them at whatever he wants - DDOS targets, internal addresses, insides of firewalls ... anything. Certificates guarantee that the attacker is authorized to use the name - but when the target of the attack is the address, this doesn't mitigate anything at all. Two possible resolutions I see off the bat:
That leaves open the question about what to do with include_subdomains policies. There are four options on the table:
1. Do nothing: This isn't a new attack, since you can already use onload and onerror events to detect whether the request failed or not.
2. CORS: Before generating a report because of an include_subdomains policy, the browser would have to make a CORS preflight to the subdomain to verify that it consents to being monitored by the superdomain's policy.
3. Remove include_subdomains policies altogether: This removes this entire class of problem. The browser would only generate NEL reports based on policies received for that specific origin. No CORS needed, because the policy's origin and the request's origin would be guaranteed to always match.
4. Remove include_subdomains policies for non-DNS errors: The browser would only use an include_subdomains policy to report on DNS errors (this would be safe because you've verified ownership of this part of the DNS tree). The browser could only use a same-origin policy to generate a report for any errors that occur once connection establishment has started.
Of these, I think we can definitely rule out option (2) — option (3) accomplishes the same result with a much simpler solution.
Straw poll?
I have a question about verifying ownership of domains that resolve to private IP addresses.
I think that, even if we remove include_subdomains, example.com can apply a NEL policy to 127-0-0-1.example.com (127.0.0.1) by using WebPackage (Signed HTTP Exchanges).
(I know Chrome is implementing WebPackage.)
Signed HTTP Exchanges enable example.com to send a NEL policy as if it were sent from 127-0-0-1.example.com.
Steps
I'm not sure whether this method is effective or not. If it is, we should resolve this issue (or this is WebPackage's issue).
@jyasskin ^
I'm not sure that such a scenario is significantly different from a DNS-rebinding-like attack. In the rebinding model, an attacker would have 127-0-0-1.example.com resolve to a server under the attacker's control (e.g. same IP as example.com) and set the policy, having it noted in the client (due to max_age).
The attacker would then change the resolution to point 127-0-0-1.example.com to 127.0.0.1 and examine the results. The previously noted policy will apply, but report information to the attacker.
It seems NEL fundamentally enables this attack through its policy storage, allowing for a distinction between time of check and time of use (TOCTOU). To some extent, this makes the option (2) in https://github.com/WICG/network-error-logging/issues/74#issuecomment-399087271 the most appealing, in that it forces a 'fresh' check. Of course, that would have to be extended to all policies (for all domains) whose IP resolves to something different than when the policy was noted.
That TOCTOU behavior is intentional, to help detect DNS hijacking attacks. E.g., I own example.com, which resolves to a server I own (1.2.3.4), and I want to be notified if any of my users start getting IP addresses other than the ones I expect (e.g. 5.6.7.8). I think a CORS check for same-origin requests would prevent the browser from reporting those errors, right?
Yes, it would prevent that use case of site operators from being addressed. However, it would better preserve user privacy.
What about something similar to option (4)? We treat "the IP address that we're about to use for the request is different from the one where we received the policy" as a new kind of DNS "error" (dns.changed_address or something like that). And when this happens, DNS errors are the only things that NEL could report back on. That would still let us detect DNS hijacking, but in a rebinding attack, the only thing the attacker could learn is that the rebinding took place.
(My main concern with option (4) is whether it would make the spec too convoluted. I'm going to take a stab at a PR with some draft text to see whether that's the case)
I agree that (4) is pretty convoluted, and between enabling the rebinding attack (that is, introducing yet-another-storage-layer with privacy properties) and the include_subdomains aspect, it gets increasingly difficult to reason about, but I'd be willing to hold off judgement until seeing the spec text. :)
That naturally implies that Web Packaging would not be able to set or note these policies at all (they don't have a policy to note), which itself is... difficult to reason about (and to implement).
Will there be similar concerns with how Web Packaging interacts with CSP? NEL prohibits a policy being set in a meta tag, for instance, citing how CSP does the same.
@dcreager Yes, there's a set of concerns with Signed Exchanges that are still being worked out with respect to mutability/persistence, particularly around domain-related policies.
Alrighty, #83 is a stab at this. The diff is actually not as bad as I thought:
- We have to keep track of which phase each kind of network error can occur in, but because we were already grouping errors (dns.*, tcp.*, etc), that was easy enough.
- If the IP address of a policy and a request don't match, the only thing NEL will report is that the resolved IP address changed. For a DNS rebinding attack, that means that NEL will tell the attacker that the client saw the newly rebound IP address, but won't tell them anything about whether connections to that new address succeeded or failed. For legit uses (like DNS-based load balancing), the new address will still point at the "real" server for the origin, and so they'll get a new policy with the right (new) IP address; that means we'll generate reports about any errors that occur during any phase.
- include_subdomains policies can only be used to report DNS resolution errors. For the original attack from the blink-dev thread, NEL won't report anything at all about requests to 127-0-0-1.example.com.
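As a rough illustration of the two rules described for #83, here is a minimal TypeScript sketch. It is not the spec's algorithm text: the names NelPolicy, RequestOutcome, hostOf, isDnsPhase, and reportTypeFor are all hypothetical, and origins are compared as plain strings for brevity.

```typescript
// Hypothetical sketch of the #83 rules as described in this thread.
// All names here are illustrative assumptions, not taken from the spec.

interface NelPolicy {
  origin: string;             // origin that delivered the policy
  includeSubdomains: boolean; // whether the policy claims subdomains
  receivedIp: string;         // IP address of the server that delivered it
}

interface RequestOutcome {
  origin: string;             // origin being requested
  serverIp: string | null;    // null when DNS resolution itself failed
  type: string;               // e.g. "ok", "dns.name_not_resolved", "tcp.refused"
}

// Strip the scheme and any port to get the host part of an origin string.
function hostOf(origin: string): string {
  return origin.replace(/^[a-z]+:\/\//, "").replace(/:\d+$/, "");
}

// Error types are already grouped by prefix, so the DNS phase is "dns.*".
function isDnsPhase(type: string): boolean {
  return type.indexOf("dns.") === 0;
}

// Returns the report type to send, or null when nothing may be reported.
function reportTypeFor(policy: NelPolicy, req: RequestOutcome): string | null {
  if (policy.origin !== req.origin) {
    // Cross-origin: only an include_subdomains policy on a superdomain applies,
    // and then only for errors in the DNS resolution phase.
    const suffix = "." + hostOf(policy.origin);
    const isSubdomain = hostOf(req.origin).slice(-suffix.length) === suffix;
    if (!policy.includeSubdomains || !isSubdomain) return null;
    return isDnsPhase(req.type) ? req.type : null;
  }
  // Same origin: DNS-phase errors are always reportable.
  if (isDnsPhase(req.type)) return req.type;
  // Past DNS, a mismatched address is downgraded so that a rebinding attacker
  // learns only that the address changed, not what happened afterwards.
  if (req.serverIp !== policy.receivedIp) return "dns.changed_address";
  return req.type;
}
```

Under these assumptions, a request to https://127-0-0-1.example.com:2001 that fails with tcp.refused yields no report from a policy set by https://example.com, while a lookup failure for a nonexistent subdomain is still reported.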
I think the WebPackage examples are no longer problems — either the IP address that delivered the WebPackage would be different from the origin's actual IP address (which will produce the same result as the DNS rebinding attack); or the IP addresses are the same, in which case you're using WebPackage as a legit optimization, and your NEL reports will show up exactly the same as if you didn't use WebPackage.
Thoughts?
I sadly don't think I'll have a chance to do an in-depth review of this for the next two months (yeah, I know...), but some quick feedback:
For legit uses (like DNS-based load balancing), the new address will still point at the "real" server for the origin, and so they'll get a new policy with the right (new) IP address; that means we'll generate reports about any errors that occur during any phase.
I'm confused by this. #83 seems like it treats the IP as a singular thing (received IP address, server_ip, resolved IP address) - except you can of course have many returned by DNS. I'm not sure how that is compatible with your description here, though - DNS-based load balancing that returns distinct IPs would naturally mean that the user's server_ip is rotating among its received IP addresses, right?
That is, in a real world scenario, where a user in Geo 1 resolves example.com to [192.168.0.1, 192.168.0.2, 192.168.0.3] and when in Geo 2 resolves to [192.168.0.2, 192.168.0.3, 192.168.0.4], what are the expected behaviours when a user connects to 192.168.0.1 and receives the policy? And for 192.168.0.2?
in which case you're using WebPackage as a legit optimization, and your NEL reports will show up exactly the same as if you didn't use WebPackage.
I suspect that's misunderstanding Signed Exchanges / Bundled Exchanges (or perhaps referring to something different?). In general, delivering a signed exchange to the user from the origin server would be a performance-negative (since you already negotiated the TLS handshake and verified the certificate, why do a second layer of verification and framing)?
except you can of course have many returned by DNS
Even if DNS returns several addresses, the client is only going to use one of them (ignoring happy eyeballs) for a request. And I don't think we can assume that the other addresses in the DNS response point at the same server, right? So if the only way to confirm ownership of the server is to have a successful HTTPS connection that delivers a NEL policy, then we can't assume that the ownership should also apply to the other addresses in the response. Which means we have to track the individual IP addresses that were used for each request, and not the sets of IP addresses that might have come back from DNS at the same time.
[192.168.0.1, 192.168.0.2, 192.168.0.3] from DNS
ok NEL report
ok report
tcp.timed_out report
report, but since the address of the last policy (192.168.0.2) and this request (192.168.0.3) don't match, the only thing we can report is dns.changed_address. Not ideal, but it's the best we can do right now.
I suspect that's misunderstanding Signed Exchanges / Bundled Exchanges
Oh that's almost certainly true. I was just trying to say that the attack in this comment should be covered by this proposal too.
Thanks! I was definitely misunderstanding your comment about it addressing the DNS load balancing case - I thought you were saying the intent was to allow reports to be sent fully for 192.168.0.3 if 192.168.0.2 was in the report (which didn't match #83, hence trying to unpack the goal and implementation a bit more)
From a storage perspective, does this mean a user potentially will have as many NEL policies as there are IP addresses for that domain, and potentially have them diverge over time? Apologies if we should be chatting on #83 about that, but still trying to unpack the design space, as mentioned in https://github.com/WICG/network-error-logging/issues/74#issuecomment-399984666
From a storage perspective, does this mean a user potentially will have as many NEL policies as there are IP addresses for that domain, and potentially have them diverge over time?
As I've written it, no — there would still be at most one policy for an origin, and the user agent would keep track of the most recent IP address that it was received from. So as a slight variation of the above example:
[192.168.0.1, 192.168.0.2, 192.168.0.3] from DNS
ok NEL report
ok report
dns.changed_address, since the policy was most recently received on 192.168.0.2.
That's also not ideal, but I wanted to keep things simple and conservative to start with. We can always examine ways to expand when we can send back full reports as a future edit to the spec. (Ideally, after having shipped this simpler version and collected data about how often we're getting downgraded dns.changed_address reports when we think we shouldn't have to.)
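The "at most one policy per origin, remembering the most recent IP address" behaviour can be sketched roughly as follows. This is one reading of the description above, not the text of #83: the PolicyStore class, its observe method, and the carriesPolicy flag are hypothetical, and the sketch assumes that a successful response carrying a NEL header refreshes the stored address before the report for that request is generated.

```typescript
// Hypothetical sketch: one stored policy per origin, tracking only the
// IP address that most recently delivered it. Names are illustrative.

interface StoredPolicy {
  receivedIp: string; // address of the server that last delivered the policy
}

class PolicyStore {
  private policies: { [origin: string]: StoredPolicy } = {};

  // Records one request and returns the report type, or null if no policy applies.
  observe(origin: string, serverIp: string, outcome: string, carriesPolicy: boolean): string | null {
    // Assumption: a successful response with a NEL header replaces the
    // stored policy, including its received IP address.
    if (outcome === "ok" && carriesPolicy) {
      this.policies[origin] = { receivedIp: serverIp };
    }
    const policy = this.policies[origin];
    if (!policy) return null;
    // DNS-phase errors are reportable regardless of address.
    if (outcome.indexOf("dns.") === 0) return outcome;
    // Otherwise the full report is allowed only for the remembered address.
    return policy.receivedIp === serverIp ? outcome : "dns.changed_address";
  }
}

const store = new PolicyStore();
const origin = "https://example.com";
const first = store.observe(origin, "192.168.0.1", "ok", true);
const second = store.observe(origin, "192.168.0.2", "ok", true);
// The policy was most recently received on 192.168.0.2, so a later failure
// on 192.168.0.1 is downgraded rather than reported in full.
const third = store.observe(origin, "192.168.0.1", "tcp.timed_out", false);
```

Keeping a single address per origin keeps storage bounded, at the cost of the downgraded reports discussed above when a load-balanced origin rotates between addresses.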
Assuming there's no other feedback, I'm going to add some of the examples we've discussed here to the text in #83 and send it out for review.
I think I agree that, if #83 is specified in terms of a Fetch response's connection's server IP address (as I just suggested there), then the Web Packaging specs will set the IP address such that cross-origin Signed Exchanges won't match the IP address(es) for direct connections, which will avoid the rebinding attacks here.
That might be a little unfortunate, since servers may want to know about errors in their web packages, but it does seem reasonable to me.
since servers may want to know about errors in their web packages, but it does seem reasonable to me.
Yeah, that's another example where the current proposal is more conservative than it needs to be. I think we can figure out how to make the "received IP address" check cover this case, too; I just wanted to consider that a follow-on, and not a blocker to getting this fixed.
include_subdomains attack by only allowing include_subdomains policies to be used to report on DNS errors.
Thanks everyone for the thorough discussion and review! I'm going to close this issue; if there are any follow-on concerns, we can open new issues for them.
Thank you for your work!
First reported on blink-dev. We should add language to the spec to prevent NEL reporters from sending reports about any connections to a private address. Without that, someone could use an include-subdomains configuration to have clients probe their internal RFC1918-addressed networks, and report those results back to an external collector.