Open paulrouget opened 8 years ago
Tests: tests/wpt/web-platform-tests/referrer-policy/
The goal of this specification is to allow webpages to control the behaviour of the Referer
header (sic) when network requests are initiated.
The basic pieces here are the following:
net_traits/lib.rs
Document
and WorkerGlobalScope
, in document.rs
and workerglobalscope.rs
respectively)http_loader.rs
and modify the Referer
header accordingly (via document_loader.rs
, fetch the policy from the global environment)htmlscriptelement.rs
, htmliframeelement.rs
, etc.)components/net/fetch/methods.rs
) and write unit tests (tests/unit/net/fetch.rs
)Referrer-Policy
HTTP header in the initial network response for a worker/document (in http_loader.rs
, add a new field to the Metadata
type and extract that when creating the new Document
in script_thread.rs
)I'm going to work on this issue!
I've added some file pointers to my previous comment. Let us know if anything's unclear!
I'm a little confused on scope - the spec refers to the "settings object" as having a referrer policy - is this specific to the Document? What do you mean above by "global environment"?
Also, I've worked with the document file a little, but not workerglobalscope - can you give a quick explanation of what that file is and how that and document interact (if they interact).
A "global environment" refers to the root object for executing JavaScript code. You can think of this as a per-tab global object. The "settings object" is a concept that we don't have an equivalent for in Servo yet, so we'll treat it as "data that resides in the global object". WorkerGlobalScope
is the global object for a web worker; you should feel free to ignore workers at first and focus on how the specification applies to ordinary documents.
On the policy propagation piece - I'm not totally clear what has access to what.
Looking at the files, what seems to be happening is that when you click on a link that should load a new page, the document_loader 'prepare_async_load' method gets called, and that, in theory, is where things can be passed between the document and the http_loader methods (most notably, load). If I have a document with some referrer policy, the load method would need both the current document's referrer policy and the current document's URL (and maybe some other stuff) in order to properly set the referer header. Right now, load doesn't have any access to the current doc(?). Is this right?
That is correct; the desired referrer policy would need to be passed as part of the LoadData. Passing it to the PendingAsyncLoad constructor via the methods in Document that invoke the DocumentLoader code would make sense to me.
But beyond that, if I just have the referrer policy, I still cant set the header because I also need the previous document's URL to populate it - the loaddata currently just has the url to be loaded, right?
Yep. We should probably add that to LoadData as well!
Alright, so just wanna confirm I'm on the right track here: https://github.com/rebstar6/servo/commit/8c91f8abad39debdf77228669c94969865739fc0
I added Option for referrer policy and referrer URL to both PendingAsyncLoad and LoadData. From document, they get passed 1st to PendingAsyncLoad which passes them forward to LoadData (where they will be picked up by the http_loader when setting headers). With the exception of the prepare_async_load/load_async methods, these are None.
Just to confirm - document.rs has both prepare_async_load and load_async, each of which call their respective methods in document_loader...but the document_loader's load_async calls prepare_async_load. Is the prepare getting called twice? What is the point of prepare_async_load within the document class?
Yes, that looks like it's on the right track. The LoadData constructors in XMLHttpRequest and ScriptThread::start_page_load should have non-None values, but those can be dealt with later as long as there's a TODO marking them.
As for the second question, prepare doesn't get called twice for the same value. The difference between prepare_async_load and async_load is that the former returns a structure that encapsulates a network request that can be started at some point in the future, while the second initiates a network request immediately.
Alright cool. Also, where does fetch come into play (and what is it, exactly?)?
The next thing I was going to try was to set the referer header within modify_request_headers based on the policy and urls. However, unsure if this should actually go into the fetch code instead (or both if theyre separate things)?
The code in components/net/fetch/
implements the Fetch specification and is designed to replace the ad-hoc network implementation that currently exists in stuff like http_loader.rs
. It's only used in unit tests so far; we're going to switch over to using it after it's more complete. So yes, doing it in both would be most correct (and you'll need to add similar referrer data to Request
as you did LoadData
, since they don't overlap).
@SimonSapin - I mentioned this in IRC but I don't think you were on then. At some point for this issue, I will need to strip the username/password, fragment, and in some cases path and query from a URL (see https://w3c.github.io/webappsec-referrer-policy/#strip-url). I was told that you wrote (and are working on) rust-url. Do you have any suggestions about the best way to approach this? Are there changes to the crate that might make some of this easier? Any help appreciated!
@jdm - I've noticed that there are a bunch of requests on a page load where http_loader is capturing the referrer_policy/url. However, some are excluded (and tests fail as a result). For example, I added prints in http_loader for referrer_url and the url being loaded. if I do ./mach run https://www.wikipedia.org/ I get some assets receiving the referrer, and some not. Right now, I have no policy logic in place - every request should have the referrer added if there is one:
URL "https://www.wikipedia.org/" REFERRER UNKNOWN URL "https://www.wikipedia.org/portal/wikipedia.org/assets/js/abtesting-c8078be6c2.js" REFERRER: "https://www.wikipedia.org/" URL "https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia_wordmark.png" REFERRER UNKNOWN URL "https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia-logo-v2.png" REFERRER UNKNOWN URL "https://www.wikipedia.org/portal/wikipedia.org/assets/img/sprite-bookshelf_icons.png?18d8b7f58860758a154b3f1b0d329784d0f4235a" REFERRER UNKNOWN URL "https://www.wikipedia.org/portal/wikipedia.org/assets/js/index-1d96a05608.js" REFERRER: "https://www.wikipedia.org/" (... and some more, mixed with/without referrer)
Then note, if I click the 'English' link, the next URL to load has no referrer, which I think may be the cause of failing tests for me:
URL "https://en.wikipedia.org/" REFERRER UNKNOWN URL "https://en.wikipedia.org/wiki/Main_Page" REFERRER UNKNOWN URL "https://en.wikipedia.org/w/load.php?debug=false&lang=en&modules=ext.gadget.DRN-wizard%2CReferenceTooltips%2Ccharinsert%2Cfeatured-articles-links%2CrefToolbar%2Cswitcher%2Cteahouse%7Cext.tmh.thumbnail.styles%7Cext.uls.nojs%7Cext.visualEditor.desktopArticleTarget.noscript%7Cext.wikimediaBadges%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmediawiki.raggett%2CsectionAnchor%7Cmediawiki.skinning.interface%7Cskins.vector.styles&only=styles&skin=vector" REFERRER: "https://en.wikipedia.org/wiki/Main_Page" URL "https://en.wikipedia.org/w/load.php?debug=false&lang=en&modules=site&only=styles&skin=vector" REFERRER: "https://en.wikipedia.org/wiki/Main_Page" (and more again, assets within the english wiki page)
The point of this is, the referrer_url is definitely not making the transition into http_loader in all cases after this logic is in place: https://github.com/rebstar6/servo/commit/d8c06b40faa576726b405d9c2036f08b13238df6 . Any idea what's failing here? Is this maybe related to the ScriptThread::start_page_load you mentioned above?
The image loader code isn't providing a referrer because the current design of the image cache makes it impossible to do so correctly. We can ignore any failures to do with images, accordingly. The failure for the top-level Document is start_page_load, as you correctly intuited.
@rebstar6 Yes I’m the main author and maintainer of rust-url. You can find the API documentation for the Url
struct in the version of rust-url used by Servo at https://servo.github.io/rust-url/url/struct.Url.html
As you can see there, each component is stored separately. You can reset them with code like:
if let Some(relative) = url.relative_scheme_data_mut() {
// String::clear sets the length of a string to zero, effectively removing its content.
relative.username.clear();
relative.password = None;
}
url.fragment = None;
I also have PR #9840 that updates Servo to a version of rust-url where the Url
struct is completely different. But a number of thing still need to happen before it can be merged, so maybe your PR will be merged first and you won’t have to worry about it. If #9840 is merged first you’ll have to adapt your PR to the new API. In that API the Url
struct will then have a single String
internally but as a private field. To manipulate it, go through the setter methods:
url.set_fragment(None);
let _ = url.set_password(None);
let _ = url.set_username("");
Since strings are contiguous in memory, touching something in the middle that changes the length mean the rest of the string afterwards needs to be moved. For that reason it’s slightly more efficient to remove components at the end of the URL first.
let _ =
is used to ignore the returned Result<(), ()>
. This result can signal an error since some URLs like data:text/html,<p>Example
can’t have a username or password, but since you were trying to remove them anyway that’s OK. (This is by the way when the previous code would not go into the if let
block.)
Feel free to ask more questions here or on IRC. I’m in the central Europe time zone, but if you mention my nickname on IRC while I’m away I’ll still see it later and can answer then (or maybe someone else can answer), so leave your IRC client online if you can.
@jdm a few more questions...
1) Unfortunately, since the test harness must be loaded in the top level document, any code that tests navigation must rely on an iframe so the tested document doesn't replace the test harness. That being said, it may be possible to test some cases by having the framed document call a method (eg. parent.navigation_complete(document.referrer)
) instead of using postMessage
, which we don't support.
2) That's an interesting question. I'll have to get back to you on that.
Gotcha. I'm definitely to the point where having some tests to code from would be helpful. If you could give me a little more guidance on how to edit what we have (or create new?), that would be great.
I think I have the Referer header setting piece mostly in place (read: http-loader will set the referer header given the some policy and url provided by the calling document). This assumes some policy given though, I'm not yet pulling the policy from anywhere. I could, at a minimum, write some unit tests for this to verify it, but those will be moot if we can get the full html tests working.
Also, thinking in terms of eventual review, is there any way you want me split this up for PRs? I don't want it to get too big and stuck in review. (code at https://github.com/rebstar6/servo/tree/referrerPolicy)
Whoops, didn't get back to you earlier. Window.load_URL looks like a fine choke point.
In terms of tests, we can't write unit tests for network code that interacts with code in script, so the best we can do there is write tests for the network code given LoadData with certain characteristics. Those would be tests in unit/net/http_loader.rs that verify that the expected Referer header is present in the request that would be sent to the server. For those iframe WPT tests, I'll take a look and see if there's a way to copy and rewrite some of them to not timeout.
As for reviewing, if you'd like to split it up like:
That should make each part straightforward to review.
So I guess the question is, is there any point in writing some unit tests and PR-ing what I have now so I can get an early review on it?
That would include:
The unit tests, as you suggested, would actually validate what I have so far - given a policy, does http-loader set the right header?. The iframe tests that we want to edit become necessary only in the next piece, which is setting the policy as delivered by: https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-delivery
Sounds like a PR including unit tests would make sense, given the current policy settings :)
@rebstar6 Just to be sure, are you continuing to work on this, or is your project period complete?
Technically the project is over, but I'm gonna still work on it - the next week is exam time, so likely no progress, but after May 4 I plan to jump back into it. Let me know if that's ok!
Totally fine 💃
Alright I'm back! The next thing to do here is referrer policy delivery - https://w3c.github.io/webappsec-referrer-policy/#referrer-policy-delivery. I'll probably start with one, so each will be its own PR (maybe meta tag? seems most important for https://github.com/servo/servo/issues/10309#issuecomment-203929226). At this point, it becomes important that I have some way to test this. I assume the tests still can't be run due to other non-implemented items. Is there a good way to validate what I'm doing here?
I suspect it's only the iframe tests that are missing fundamental pieces right now, so I think it should be possible to rely on the attr-referrer/
and meta-referrer/
tests for the referrerpolicy
, noreferrer
and meta
delivery mechanisms respectively (excluding iframes, obviously). We don't have any support for CSP yet, so you can ignore those.
Oh, foo, looks like the tests for <link>
and <a>
also rely on postMessage
, as does <script>
. That means that <img>
, XMLHttpRequest
, and Worker
tests should work (although I'm having trouble finding any instances of the Worker ones so far). Anyways, if you focus on supporting the first two, we can figure out what to do about postMessage.
img is still giving me issues. try: ./mach test-wpt tests/wpt/web-platform-tests/referrer-policy/no-referrer/meta-referrer/same-origin/http-https/img-tag/ I'm getting timeouts with InvalidStateError. This is no-referrer, which should in theory be 'working' (as in, nothing is sent).
Ok, I figured out the img issue. Any tests that involve https will fail in Servo right now, so ignore any of the directories with http-https or https-http. The http-http ones should work, and appear to do so in my local testing. Additionally, I investigated the postMessage thing over the weekend and came up with https://github.com/jdm/servo/commit/afc7f7b18942bd6921135848291a11a11b2028a7 . This commit allows the iframe tests to pass, but is highly unlikely to be allowed to merge as-written. It should enable you to at least test the code you're writing while we come up with a more maintainable replacement, though!
Cool, started my branch off of that one. Should be helpful. 2 more testing things - I started trying to include/exclude the appropriate tests. The include.ini gets nuts when I use that approach (see gist - right now this is just 2 policies, img only. Adding the other 3 policies will be even more). Is there a better way to do this? (see https://gist.github.com/rebstar6/4026c0574e976179c8046a9042556e40)
Also, any idea why these tests only seem to have http-http and http-https? Specifically, we almost definitely want a test for https to http for the 'no referrer when downgrade' policy, since the point of that is to not send the referrer when you 'downgrade' (ie, go https to http).
I suppose if https wont work now anyway, that doesnt matter in the short term, but calling it out for later.
I recommend using skip: false
on the toplevel directory, then adding __dir__.ini
files to disable subdirectories as necessary. That's a good question about the tests; I'll raise it upstream.
Thanks! And sure, went the directory disabling route. Because of all the file nesting, were still talking a bunch of files, but oh well - https://github.com/rebstar6/servo/commit/21104c433a77d33a8340f5607c48090b4c73dc40
Now I guess I should actually get things to work.
Hrm, started, but I kinda hate this direction, so polling for suggestions. I'm just going to go ahead and point out all the issues with my approach so far (see https://github.com/rebstar6/servo/commit/e82efa0fc407220811d8770715d8297ad9fad86d)
This doesn't compile, and even if it did, things are super inefficient. I'm trying to implement this logic., ideally using methods we have already. So I start off pulling all the meta referrer elements, though that's silly (and maybe slow?) because we really only want the ones that are children of the head element. Then from there, I want to iterate though them, but I keep getting stuck in rooted/js elements and im not sure how to get out of that. As it stands, it gives the error:
"error: Expression of type std::vec::Vecdom::bindings::js::JS
Also, ideally the policy is set the 1st time get_referrer_policy is called and never again (is that even possible?). If not, it's probably best on document creation so we aren't re-running the method on each link out. Until the document is created though, I dont have use of the document methods I'm using here, which is why it lives where it lives for now.
Thoughts? Suggestions?
I'd probably recommend implementing the bind_to_tree
method for HTMLMetaElement
and making it call a method on its owning document with the associated referrer policy that it contains, if applicable. This turns it into a push notification at creation, rather than making the document poll. I'm assuming that we don't need to care about updates to the attribute later, but if we did we could use after_set_attr
for that in a similar fashion.
Striking out today: see https://pastebin.mozilla.org/8870552 Basically, this is what I'm printing when running tests/wpt/web-platform-tests/referrer-policy/origin-only/meta-referrer/same-origin/http-http/iframe-tag/generic.no-redirect.http.html
Looks like there are a few requests happening in this test. Most of them do have the referer getting set. But for the one we care about (lines 27-31), http_loader::load seems to be getting called with None/None for referrer url/policy. I have to assume there’s another load_data somewhere that's doing this. This is an iframe test - any ideas?
Actually...does an iframe call not go through window::load_url? Would make sense. That may be the issue as that was the pass-through point
Oh yeah, http://mxr.mozilla.org/servo/source/components/compositing/constellation.rs#1006 inside handle_script_loaded_url_in_iframe_msg
is something I explicitly mentioned in the original PR :)
Sorry, I'm confusing myself - http://mxr.mozilla.org/servo/source/components/script/dom/htmliframeelement.rs#151 is the proper one.
Yep, there's that TODO. Alright, welp.
Since that now lives in code that has direct access to a document, it should be straightforward to fix, at least!
Yeah much better, got myself a referer header!
One more test issue, this one should be easier:
According to https://w3c.github.io/webappsec-referrer-policy/#determine-policy-for-token, the origin when cross origin string is "origin-when-cross-origin". The tests use "origin-when-crossorigin", so this policy fails.
Can I change the tests, or is this another issue for w3c?
According to the mailing list, this was a typo in the old specification that shipped in some versions of Chrome. Let's modify the tests locally and file an issue upsteam about it as well. Would you like to do that?
Actually it looks like the Firefox developers are making the same modifications, so I guess we can race them and not bother with the upstream issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1270050
See https://w3c.github.io/webappsec-referrer-policy/
Necessary for https://github.com/servo/servo/issues/10309#issuecomment-203929226