servo / servo

Servo, the embeddable, independent, memory-safe, modular, parallel web rendering engine
https://servo.org
Mozilla Public License 2.0
28.52k stars 3.04k forks source link

Implement referrer policy #10311

Open paulrouget opened 8 years ago

paulrouget commented 8 years ago

See https://w3c.github.io/webappsec-referrer-policy/

Necessary for https://github.com/servo/servo/issues/10309#issuecomment-203929226

jdm commented 8 years ago

Tests: tests/wpt/web-platform-tests/referrer-policy/

jdm commented 8 years ago

The goal of this specification is to allow webpages to control the behaviour of the Referer header (sic) when network requests are initiated.

The basic pieces here are the following:

rebstar6 commented 8 years ago

I'm going to work on this issue!

jdm commented 8 years ago

I've added some file pointers to my previous comment. Let us know if anything's unclear!

rebstar6 commented 8 years ago

I'm a little confused on scope - the spec refers to the "settings object" as having a referrer policy - is this specific to the Document? What do you mean above by "global environment"?

Also, I've worked with the document file a little, but not workerglobalscope - can you give a quick explanation of what that file is and how that and document interact (if they interact).

jdm commented 8 years ago

A "global environment" refers to the root object for executing JavaScript code. You can think of this as a per-tab global object. The "settings object" is a concept that we don't have an equivalent for in Servo yet, so we'll treat it as "data that resides in the global object". WorkerGlobalScope is the global object for a web worker; you should feel free to ignore workers at first and focus on how the specification applies to ordinary documents.

rebstar6 commented 8 years ago

On the policy propagation piece - I'm not totally clear what has access to what.

Looking at the files, what seems to be happening is that when you click on a link that should load a new page, the document_loader 'prepare_async_load' method gets called, and that, in theory, is where things can be passed between the document and the http_loader methods (most notably, load). If I have a document with some referrer policy, the load method would need both the current document's referrer policy and the current document's URL (and maybe some other stuff) in order to properly set the referer header. Right now, load doesn't have any access to the current doc(?). Is this right?

jdm commented 8 years ago

That is correct; the desired referrer policy would need to be passed as part of the LoadData. Passing it to the PendingAsyncLoad constructor via the methods in Document that invoke the DocumentLoader code would make sense to me.

rebstar6 commented 8 years ago

But beyond that, if I just have the referrer policy, I still cant set the header because I also need the previous document's URL to populate it - the loaddata currently just has the url to be loaded, right?

jdm commented 8 years ago

Yep. We should probably add that to LoadData as well!

rebstar6 commented 8 years ago

Alright, so just wanna confirm I'm on the right track here: https://github.com/rebstar6/servo/commit/8c91f8abad39debdf77228669c94969865739fc0

I added Option for referrer policy and referrer URL to both PendingAsyncLoad and LoadData. From document, they get passed 1st to PendingAsyncLoad which passes them forward to LoadData (where they will be picked up by the http_loader when setting headers). With the exception of the prepare_async_load/load_async methods, these are None.

Just to confirm - document.rs has both prepare_async_load and load_async, each of which call their respective methods in document_loader...but the document_loader's load_async calls prepare_async_load. Is the prepare getting called twice? What is the point of prepare_async_load within the document class?

jdm commented 8 years ago

Yes, that looks like it's on the right track. The LoadData constructors in XMLHttpRequest and ScriptThread::start_page_load should have non-None values, but those can be dealt with later as long as there's a TODO marking them.

As for the second question, prepare doesn't get called twice for the same value. The difference between prepare_async_load and async_load is that the former returns a structure that encapsulates a network request that can be started at some point in the future, while the second initiates a network request immediately.

rebstar6 commented 8 years ago

Alright cool. Also, where does fetch come into play (and what is it, exactly?)?

The next thing I was going to try was to set the referer header within modify_request_headers based on the policy and urls. However, unsure if this should actually go into the fetch code instead (or both if theyre separate things)?

jdm commented 8 years ago

The code in components/net/fetch/ implements the Fetch specification and is designed to replace the ad-hoc network implementation that currently exists in stuff like http_loader.rs. It's only used in unit tests so far; we're going to switch over to using it after it's more complete. So yes, doing it in both would be most correct (and you'll need to add similar referrer data to Request as you did LoadData, since they don't overlap).

rebstar6 commented 8 years ago

@SimonSapin - I mentioned this in IRC but I don't think you were on then. At some point for this issue, I will need to strip the username/password, fragment, and in some cases path and query from a URL (see https://w3c.github.io/webappsec-referrer-policy/#strip-url). I was told that you wrote (and are working on) rust-url. Do you have any suggestions about the best way to approach this? Are there changes to the crate that might make some of this easier? Any help appreciated!

rebstar6 commented 8 years ago

@jdm - I've noticed that there are a bunch of requests on a page load where http_loader is capturing the referrer_policy/url. However, some are excluded (and tests fail as a result). For example, I added prints in http_loader for referrer_url and the url being loaded. if I do ./mach run https://www.wikipedia.org/ I get some assets receiving the referrer, and some not. Right now, I have no policy logic in place - every request should have the referrer added if there is one:

URL "https://www.wikipedia.org/" REFERRER UNKNOWN URL "https://www.wikipedia.org/portal/wikipedia.org/assets/js/abtesting-c8078be6c2.js" REFERRER: "https://www.wikipedia.org/" URL "https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia_wordmark.png" REFERRER UNKNOWN URL "https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia-logo-v2.png" REFERRER UNKNOWN URL "https://www.wikipedia.org/portal/wikipedia.org/assets/img/sprite-bookshelf_icons.png?18d8b7f58860758a154b3f1b0d329784d0f4235a" REFERRER UNKNOWN URL "https://www.wikipedia.org/portal/wikipedia.org/assets/js/index-1d96a05608.js" REFERRER: "https://www.wikipedia.org/" (... and some more, mixed with/without referrer)

Then note, if I click the 'English' link, the next URL to load has no referrer, which I think may be the cause of failing tests for me:

URL "https://en.wikipedia.org/" REFERRER UNKNOWN URL "https://en.wikipedia.org/wiki/Main_Page" REFERRER UNKNOWN URL "https://en.wikipedia.org/w/load.php?debug=false&lang=en&modules=ext.gadget.DRN-wizard%2CReferenceTooltips%2Ccharinsert%2Cfeatured-articles-links%2CrefToolbar%2Cswitcher%2Cteahouse%7Cext.tmh.thumbnail.styles%7Cext.uls.nojs%7Cext.visualEditor.desktopArticleTarget.noscript%7Cext.wikimediaBadges%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.raggett%2CsectionAnchor%7Cmediawiki.skinning.interface%7Cskins.vector.styles&only=styles&skin=vector" REFERRER: "https://en.wikipedia.org/wiki/Main_Page" URL "https://en.wikipedia.org/w/load.php?debug=false&lang=en&modules=site&only=styles&skin=vector" REFERRER: "https://en.wikipedia.org/wiki/Main_Page" (and more again, assets within the english wiki page)

The point of this is, the referrer_url is definitely not making the transition into http_loader in all cases after this logic is in place: https://github.com/rebstar6/servo/commit/d8c06b40faa576726b405d9c2036f08b13238df6 . Any idea what's failing here? Is this maybe related to the ScriptThread::start_page_load you mentioned above?

jdm commented 8 years ago

The image loader code isn't providing a referrer because the current design of the image cache makes it impossible to do so correctly. We can ignore any failures to do with images, accordingly. The failure for the top-level Document is start_page_load, as you correctly intuited.

SimonSapin commented 8 years ago

@rebstar6 Yes I’m the main author and maintainer of rust-url. You can find the API documentation for the Url struct in the version of rust-url used by Servo at https://servo.github.io/rust-url/url/struct.Url.html

As you can see there, each component is stored separately. You can reset them with code like:

if let Some(relative) = url.relative_scheme_data_mut() {
    // String::clear sets the length of a string to zero, effectively removing its content.
    relative.username.clear();
    relative.password = None;
}
url.fragment = None;

I also have PR #9840 that updates Servo to a version of rust-url where the Url struct is completely different. But a number of thing still need to happen before it can be merged, so maybe your PR will be merged first and you won’t have to worry about it. If #9840 is merged first you’ll have to adapt your PR to the new API. In that API the Url struct will then have a single String internally but as a private field. To manipulate it, go through the setter methods:

url.set_fragment(None);
let _ = url.set_password(None);
let _ = url.set_username("");

Since strings are contiguous in memory, touching something in the middle that changes the length mean the rest of the string afterwards needs to be moved. For that reason it’s slightly more efficient to remove components at the end of the URL first.

let _ = is used to ignore the returned Result<(), ()>. This result can signal an error since some URLs like data:text/html,<p>Example can’t have a username or password, but since you were trying to remove them anyway that’s OK. (This is by the way when the previous code would not go into the if let block.)

Feel free to ask more questions here or on IRC. I’m in the central Europe time zone, but if you mention my nickname on IRC while I’m away I’ll still see it later and can answer then (or maybe someone else can answer), so leave your IRC client online if you can.

rebstar6 commented 8 years ago

@jdm a few more questions...

  1. Everything in tests/wpt/web-platform-tests/referrer-policy/ seems to be testing img, iframe, script tags, xhr-requests (not, or fetch. I'm not seeing anything that tests the basic case (I'm on a page and click a link). Is this group of tests just geared towards these cases for some reason? Is there anything elsewhere I can use (she asks hopefully)?
  2. I followed the script methods backwards as best I could, and settled on load_url in window.rs as the place that the referrer info can be passed along (load_url function - see https://github.com/rebstar6/servo/commit/f48fff9245f24c338bcfa772d7fb0515591b9c41 if curious). This seems to work for my wikipedia case above. Is it too broad though (read: does window.load_url() get called in cases where we may not want that referrer info sent, like if you type a url into the browser? With my change, it will always send the referrer info for whatever that window's current document is)
jdm commented 8 years ago

1) Unfortunately, since the test harness must be loaded in the top level document, any code that tests navigation must rely on an iframe so the tested document doesn't replace the test harness. That being said, it may be possible to test some cases by having the framed document call a method (eg. parent.navigation_complete(document.referrer)) instead of using postMessage, which we don't support. 2) That's an interesting question. I'll have to get back to you on that.

rebstar6 commented 8 years ago

Gotcha. I'm definitely to the point where having some tests to code from would be helpful. If you could give me a little more guidance on how to edit what we have (or create new?), that would be great.

I think I have the Referer header setting piece mostly in place (read: http-loader will set the referer header given the some policy and url provided by the calling document). This assumes some policy given though, I'm not yet pulling the policy from anywhere. I could, at a minimum, write some unit tests for this to verify it, but those will be moot if we can get the full html tests working.

Also, thinking in terms of eventual review, is there any way you want me split this up for PRs? I don't want it to get too big and stuck in review. (code at https://github.com/rebstar6/servo/tree/referrerPolicy)

jdm commented 8 years ago

Whoops, didn't get back to you earlier. Window.load_URL looks like a fine choke point.

jdm commented 8 years ago

In terms of tests, we can't write unit tests for network code that interacts with code in script, so the best we can do there is write tests for the network code given LoadData with certain characteristics. Those would be tests in unit/net/http_loader.rs that verify that the expected Referer header is present in the request that would be sent to the server. For those iframe WPT tests, I'll take a look and see if there's a way to copy and rewrite some of them to not timeout.

jdm commented 8 years ago

As for reviewing, if you'd like to split it up like:

That should make each part straightforward to review.

rebstar6 commented 8 years ago

So I guess the question is, is there any point in writing some unit tests and PR-ing what I have now so I can get an early review on it?

That would include:

The unit tests, as you suggested, would actually validate what I have so far - given a policy, does http-loader set the right header?. The iframe tests that we want to edit become necessary only in the next piece, which is setting the policy as delivered by: https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-delivery

jdm commented 8 years ago

Sounds like a PR including unit tests would make sense, given the current policy settings :)

jdm commented 8 years ago

@rebstar6 Just to be sure, are you continuing to work on this, or is your project period complete?

rebstar6 commented 8 years ago

Technically the project is over, but I'm gonna still work on it - the next week is exam time, so likely no progress, but after May 4 I plan to jump back into it. Let me know if that's ok!

jdm commented 8 years ago

Totally fine 💃

rebstar6 commented 8 years ago

Alright I'm back! The next thing to do here is referrer policy delivery - https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-delivery. I'll probably start with one, so each will be its own PR (maybe meta tag? seems most important for https://github.com/servo/servo/issues/10309#issuecomment-203929226). At this point, it becomes important that I have some way to test this. I assume the tests still can't be run due to other non-implemented items. Is there a good way to validate what I'm doing here?

jdm commented 8 years ago

I suspect it's only the iframe tests that are missing fundamental pieces right now, so I think it should be possible to rely on the attr-referrer/ and meta-referrer/ tests for the referrerpolicy, noreferrer and meta delivery mechanisms respectively (excluding iframes, obviously). We don't have any support for CSP yet, so you can ignore those.

jdm commented 8 years ago

Oh, foo, looks like the tests for <link> and <a> also rely on postMessage, as does <script>. That means that <img>, XMLHttpRequest, and Worker tests should work (although I'm having trouble finding any instances of the Worker ones so far). Anyways, if you focus on supporting the first two, we can figure out what to do about postMessage.

rebstar6 commented 8 years ago

img is still giving me issues. try: ./mach test-wpt tests/wpt/web-platform-tests/referrer-policy/no-referrer/meta-referrer/same-origin/http-https/img-tag/ I'm getting timeouts with InvalidStateError. This is no-referrer, which should in theory be 'working' (as in, nothing is sent).

jdm commented 8 years ago

Ok, I figured out the img issue. Any tests that involve https will fail in Servo right now, so ignore any of the directories with http-https or https-http. The http-http ones should work, and appear to do so in my local testing. Additionally, I investigated the postMessage thing over the weekend and came up with https://github.com/jdm/servo/commit/afc7f7b18942bd6921135848291a11a11b2028a7 . This commit allows the iframe tests to pass, but is highly unlikely to be allowed to merge as-written. It should enable you to at least test the code you're writing while we come up with a more maintainable replacement, though!

rebstar6 commented 8 years ago

Cool, started my branch off of that one. Should be helpful. 2 more testing things - I started trying to include/exclude the appropriate tests. The include.ini gets nuts when I use that approach (see gist - right now this is just 2 policies, img only. Adding the other 3 policies will be even more). Is there a better way to do this? (see https://gist.github.com/rebstar6/4026c0574e976179c8046a9042556e40)

Also, any idea why these tests only seem to have http-http and http-https? Specifically, we almost definitely want a test for https to http for the 'no referrer when downgrade' policy, since the point of that is to not send the referrer when you 'downgrade' (ie, go https to http).

I suppose if https wont work now anyway, that doesnt matter in the short term, but calling it out for later.

jdm commented 8 years ago

I recommend using skip: false on the toplevel directory, then adding __dir__.ini files to disable subdirectories as necessary. That's a good question about the tests; I'll raise it upstream.

jdm commented 8 years ago

Filed https://github.com/w3c/web-platform-tests/issues/2968.

rebstar6 commented 8 years ago

Thanks! And sure, went the directory disabling route. Because of all the file nesting, were still talking a bunch of files, but oh well - https://github.com/rebstar6/servo/commit/21104c433a77d33a8340f5607c48090b4c73dc40

Now I guess I should actually get things to work.

rebstar6 commented 8 years ago

Hrm, started, but I kinda hate this direction, so polling for suggestions. I'm just going to go ahead and point out all the issues with my approach so far (see https://github.com/rebstar6/servo/commit/e82efa0fc407220811d8770715d8297ad9fad86d)

This doesn't compile, and even if it did, things are super inefficient. I'm trying to implement this logic., ideally using methods we have already. So I start off pulling all the meta referrer elements, though that's silly (and maybe slow?) because we really only want the ones that are children of the head element. Then from there, I want to iterate though them, but I keep getting stuck in rooted/js elements and im not sure how to get out of that. As it stands, it gives the error: "error: Expression of type std::vec::Vecdom::bindings::js::JS must be rooted, #[deny(unrooted_must_root)] on by default" ...though if not mut I can't call pop(). Sigh.

Also, ideally the policy is set the 1st time get_referrer_policy is called and never again (is that even possible?). If not, it's probably best on document creation so we aren't re-running the method on each link out. Until the document is created though, I dont have use of the document methods I'm using here, which is why it lives where it lives for now.

Thoughts? Suggestions?

jdm commented 8 years ago

I'd probably recommend implementing the bind_to_tree method for HTMLMetaElement and making it call a method on its owning document with the associated referrer policy that it contains, if applicable. This turns it into a push notification at creation, rather than making the document poll. I'm assuming that we don't need to care about updates to the attribute later, but if we did we could use after_set_attr for that in a similar fashion.

rebstar6 commented 8 years ago

Striking out today: see https://pastebin.mozilla.org/8870552 Basically, this is what I'm printing when running tests/wpt/web-platform-tests/referrer-policy/origin-only/meta-referrer/same-origin/http-http/iframe-tag/generic.no-redirect.http.html

Looks like there are a few requests happening in this test. Most of them do have the referer getting set. But for the one we care about (lines 27-31), http_loader::load seems to be getting called with None/None for referrer url/policy. I have to assume there’s another load_data somewhere that's doing this. This is an iframe test - any ideas?

rebstar6 commented 8 years ago

Actually...does an iframe call not go through window::load_url? Would make sense. That may be the issue as that was the pass-through point

jdm commented 8 years ago

Oh yeah, http://mxr.mozilla.org/servo/source/components/compositing/constellation.rs#1006 inside handle_script_loaded_url_in_iframe_msg is something I explicitly mentioned in the original PR :)

jdm commented 8 years ago

Sorry, I'm confusing myself - http://mxr.mozilla.org/servo/source/components/script/dom/htmliframeelement.rs#151 is the proper one.

rebstar6 commented 8 years ago

Yep, there's that TODO. Alright, welp.

jdm commented 8 years ago

Since that now lives in code that has direct access to a document, it should be straightforward to fix, at least!

rebstar6 commented 8 years ago

Yeah much better, got myself a referer header!

rebstar6 commented 8 years ago

One more test issue, this one should be easier:

According to https://w3c.github.io/webappsec-referrer-policy/#determine-policy-for-token, the origin when cross origin string is "origin-when-cross-origin". The tests use "origin-when-crossorigin", so this policy fails.

Can I change the tests, or is this another issue for w3c?

jdm commented 8 years ago

According to the mailing list, this was a typo in the old specification that shipped in some versions of Chrome. Let's modify the tests locally and file an issue upsteam about it as well. Would you like to do that?

jdm commented 8 years ago

Actually it looks like the Firefox developers are making the same modifications, so I guess we can race them and not bother with the upstream issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1270050