mozilla / lightbeam

Original, unmaintained version of the Lightbeam extension. See lightbeam-we for the new one, which works in modern versions of Firefox.
https://github.com/mozilla/lightbeam-we
Mozilla Public License 2.0

Following a link to another site always indicates a behavioral tracking relationship #6

Closed toolness closed 10 years ago

toolness commented 13 years ago

Clicking on the "privacychoice.org" link from collusion.toolness.org, for instance, connects the two domains on the collusion diagram and claims that privacychoice.org tracks your behavior across toolness.org.

All this means is that privacychoice.org sets a cookie, and that it knows I came to it from collusion.toolness.org. While they could use this for "tracking", it's far less insidious than what we consider behavioral tracking, particularly since the user has to actually click on a link on a specific page for the linked-to site to know where the user came from. Including e.g. an image or script to the tracking site on every page of one's site is much more akin to what we normally consider tracking to be.

gerv commented 13 years ago

Yes; I think it should only include third-party resource-inclusion relationships in the diagram, not navigation.

Gerv

sidstamm commented 13 years ago

Agreed, and perhaps we can expand it from third-party-resource-with-cookie to all-third-party-resources. Sites don't necessarily need to use cookies to track you; they can use data from the referer, their querystring, or their URL to track you too. Maybe that's a separate bug...
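To make the "all third-party resources" idea concrete, here is a minimal sketch of a check that ignores cookies entirely and only compares the page's site with the resource's site. The function names are hypothetical and not taken from the Collusion code, and the "last two hostname labels" rule is a simplifying assumption; a real implementation would presumably consult the Public Suffix List.

```javascript
// Naive "site" key: the last two labels of the hostname.
// Assumption for illustration only: this is wrong for suffixes such as
// co.uk, where a Public Suffix List lookup would be needed.
function naiveSite(url) {
  return new URL(url).hostname.split('.').slice(-2).join('.');
}

// True when the resource is served from a different site than the page,
// regardless of whether any cookie is set.
function isThirdPartyResource(pageUrl, resourceUrl) {
  return naiveSite(pageUrl) !== naiveSite(resourceUrl);
}
```

With this rule, a script from toolness.org included on collusion.toolness.org counts as first-party, while an image from privacychoice.org on the same page counts as third-party.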

dethe commented 11 years ago

If we don't add a link when a user navigates to a site, it will substantially change how we display the graph. I think it's the right thing to do, but I'm not sure what the implications are for showing Collusion data.

toolness commented 11 years ago

Regarding being able to regression test this in an automated way: when we wrote Jetpack, we had a number of helper classes/libraries that created a new tab/browser window and did things with it. Here's one example, from the test suite for the tab-browser module: https://github.com/mozilla/addon-sdk/blob/master/packages/api-utils/tests/test-tab-browser.js

We might be able to copy over some of those utilities and actually automate opening a browser tab, visiting a website, and clicking a link in it. The last time I worked on Jetpack, we didn't have a built-in localhost webserver as part of the test suite, but I bet there's one in there now--and if there's not, it isn't that hard (uh, relatively speaking) to set one up... We could then have multiple localhost webservers come up on different ports, which could represent different first-party and third-party websites, and the tests could open tabs in a browser that visit those sites.

This is, of course, a very integration-level test rather than a unit test, but ultimately it's probably pretty useful, since Firefox's implementation changes often enough that the integration tests will warn us about breakages as early as possible.

dethe commented 11 years ago

Yes, Shane says there is a built-in webserver, and that tests are best if they use that instead of hitting actual sites so the response is predictable and repeatable. I'll build tests using the built-in server.

jonoxia commented 11 years ago

I'm going to try comparing the domain and referrer to the active tab's history stack -- if you clicked a link, both the referrer and the referee should show up in history. That may help us distinguish link-clicks from tracking relationships.

There will be some timing issues to solve with this method, and it won't help for right click -> "open link in new tab", but it might be a start...
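As a rough illustration of the history comparison described above, the following sketch treats a tab's history as a plain array of domains, oldest first. That data shape and the function name are assumptions made for the example; they are not the add-on's actual API.

```javascript
// history: array of domains visited in one tab, oldest first
// (hypothetical shape, chosen for illustration).
// Returns true when targetDomain was visited directly after
// referrerDomain in that tab, which suggests a link click.
function looksLikeLinkClick(history, referrerDomain, targetDomain) {
  for (let i = 1; i < history.length; i++) {
    if (history[i - 1] === referrerDomain && history[i] === targetDomain) {
      return true;
    }
  }
  return false;
}
```

A third-party resource never becomes a history entry, so a referrer/domain pair for an included script or image would fall through to `false` under this check.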

jonoxia commented 11 years ago

Yeah, timing problems. At the time when we decide whether to record a tracking link or not, the new site I'm navigating to has not yet been added to history.

This is the same problem I ran into when trying to check whether a cookie had really been set or not -- we record the tracking link on http-on-examine-response, which is too early to see the "results" (cookies set, pages loaded) of the http request, so there's a lot of data that's not available.

One workaround would be that in our http-on-examine-response handler, we add referrers/domains to a provisional queue, and then examine the queue later to decide if they represent tracking or not. That adds a lot of complexity to the code, though.

Another approach would be to see if there's an event that happens later in the loading process that we could listen for instead of http-on-examine-response.
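The provisional-queue workaround mentioned above could look roughly like the sketch below. The class name, method names, and the `isNavigation` callback are all hypothetical; the point is only that the decision is deferred until more information about the load is available.

```javascript
// Holds referrer/domain pairs seen at response time until we know
// whether each one was a user navigation or a tracking relationship.
class ProvisionalLinks {
  constructor() {
    this.pending = [];
  }

  // Called from the response handler, before history is updated.
  add(referrer, domain) {
    this.pending.push({ referrer, domain });
  }

  // Called later (for example after the page load completes).
  // isNavigation(referrer, domain) is supplied by the caller.
  // Returns the pairs that should be recorded as tracking links
  // and empties the queue.
  flush(isNavigation) {
    const tracking = this.pending.filter(
      (link) => !isNavigation(link.referrer, link.domain)
    );
    this.pending = [];
    return tracking;
  }
}
```

This does add the extra state and the second pass that the comment warns about; the sketch just shows that the bookkeeping itself can stay small.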

dethe commented 11 years ago

Talking with Shane, he suggested either nsIWebProgress or nsIContentPolicy could help.

Another option: we could just add the link, but then when we add the node as visited, remove any links showing it as a tracker. That way, spurious links should at least be short-lived.
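A minimal sketch of that add-then-remove idea, assuming links are stored as simple "from->to" strings. The class and its methods are illustrative only and do not mirror Collusion's real data model.

```javascript
class TrackingGraph {
  constructor() {
    this.visited = new Set(); // domains the user navigated to
    this.links = new Set();   // entries like "a.example->b.example"
  }

  // Record a link eagerly, as soon as the response is seen.
  addLink(from, to) {
    this.links.add(`${from}->${to}`);
  }

  // When a domain turns out to be a visited page, drop every link
  // that showed it as a tracker (i.e. every link pointing at it).
  markVisited(domain) {
    this.visited.add(domain);
    for (const link of [...this.links]) {
      if (link.endsWith(`->${domain}`)) {
        this.links.delete(link);
      }
    }
  }
}
```

Note that this also removes a link to a site that genuinely tracks you elsewhere and that you later happen to visit, which is the trade-off raised in the replies that follow.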

--Dethe

jonoxia commented 11 years ago

> we could just add the link, but then when we add the node as visited, remove any links showing it as a tracker.

I was just thinking about that. Would be pretty easy to do, if that's the behavior we want. I guess the question is: Is there any situation where a connection should be shown as a tracking link even though you also happened to navigate there?

dethe commented 11 years ago

> I was just thinking about that. Would be pretty easy to do, if that's the behavior we want. I guess the question is: Is there any situation where a connection should be shown as a tracking link even though you also happened to navigate there?

If so, it would be a much more minor bug than the current behaviour.

jonoxia commented 11 years ago

I did a first pass at this. I sent a pull request. See https://github.com/mozilla/collusion/pull/108 for details.

dethe commented 11 years ago

Jono: This is exciting, but after pulling your request and following the suggested test (navigate to Mozilla.org, then click on the Twitter link) I still see the link between Mozilla and Twitter when running with "cfx run". I've checked, and my local instance is up to date with the master branch, so I'm not sure what's going on. Did you sync with the master branch when testing?

jonoxia commented 11 years ago

Hm, it works for me after pulling mozilla master.

Just to check that I'm not doing something wrong with git, I did a fresh checkout of the mozilla/collusion repo. It still works for me. I did a diff between my original working directory and the code I just checked out. They're the same.

This has to be down to some difference between our environments. Is the code giving you any new error messages? If you put: console.log(previousDomain + "->" + domain); on line 508 of main.js, does it print out something that looks correct when you click a link to another domain?

--Jono

dethe commented 11 years ago

Hi, sorry to be so slow to respond, it's been Thanksgiving here.

Also it may be important to note which platforms we're testing on. I'm running Firefox 15.0.1 on OS X 10.8.2

I pulled to have the latest addons-sdk, then did a clean git clone of the collusion code. What I'm seeing is interesting:

When I visit mozilla.org as the first site, nothing shows up in Collusion at all. When I click on Twitter, only twitter and its trackers show up.

I tried livingcode.org (my site), which only has one tracker in collusion (google analytics). When I click through a link to goodreads.com, I see a tracker link between them where there shouldn't be one.

When I visit a site with a lot of trackers, like slashdot or boingboing, the links start showing up disconnected from anything, then join up with the site I navigated to (which creates a nice springing action).

Links created by link navigation show up intermittently: not every time, but more often than not.

That's what I was able to observe on a clean build, hopefully it's not the turkey talking.

--Dethe


jonoxia commented 11 years ago

OK, I figured out what the problem is. Load livingcode.org, click the link to goodreads.com, wait for it to load, then open Collusion: no link appears. Load livingcode.org, click the link to goodreads.com, then go to the Collusion tab before the page load is complete: the link appears. It's only detecting a user navigation event if the page load is happening in the currently focused tab. That's because the getHistoryForActiveTab() function I wrote only looks at the active tab; it should look at the history of the tab where the load event occurred.

jonoxia commented 11 years ago

After the last pull request it should be removing the link even if the page load happened in a background tab. However, if the link was already in the graph and then you navigate from site A to site B, that link won't go away immediately - it will go away only when you reload the collusion page. This is because the code in graphrunner.js that translates referrer json to nodes and links is only capable of adding links, not removing them. I am working on an edit that will allow it to dynamically remove links when the properties in the json change.
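One way to let the graph remove links as well as add them is to diff the previous set of links against the new one each time the JSON changes. The sketch below assumes links are comparable strings and uses a hypothetical `diffLinks` helper; it is not the actual graphrunner.js code.

```javascript
// previous, next: arrays of link keys such as "a.example->b.example".
// Returns which links must be drawn and which must be taken out.
function diffLinks(previous, next) {
  const previousSet = new Set(previous);
  const nextSet = new Set(next);
  return {
    added: next.filter((link) => !previousSet.has(link)),
    removed: previous.filter((link) => !nextSet.has(link)),
  };
}
```

The renderer would then apply `removed` as well as `added` on every update, rather than only appending new links until the page is reloaded.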

jonoxia commented 11 years ago

Dethe: the latest changesets in my pull request should fix the problem you were seeing. I've been testing it on livingcode.org and the link to goodreads.com. If you load livingcode.org, then open the collusion tab and watch the graph, then click the link to goodreads.com, you will see a line appear between the nodes temporarily; but once the load finishes, the graph will realize that was a user navigation, and will remove the line.

dethe commented 11 years ago

Awesome! I'm just getting caught up from my whirlwind trip to California, will try to pull and test the patches later today.

dethe commented 11 years ago

This issue is still occurring sometimes, see Issue #113.

dethe commented 11 years ago

Looking into issue #113 I see what's going on. Links are always opened in a new tab, but we are only checking the last two links in an existing tab to see if they are connected. We really need a better way to detect user interaction.

I think a better test is whether the URL loaded was ever the URL of a tab, as opposed to something loaded in the page that never became the window.location. This won't help with iframes, but will distinguish between user loaded pages and other assets loaded by the page.
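A sketch of that test, assuming the add-on can observe every URL that becomes a tab's location. The class and method names are hypothetical, and as noted above this does not cover iframes.

```javascript
// Remembers every URL that has ever been the location of a tab and
// classifies later loads against that record.
class TabLocationLog {
  constructor() {
    this.tabUrls = new Set();
  }

  // Call whenever a tab's location changes to `url`.
  noteTabLocation(url) {
    this.tabUrls.add(url);
  }

  // 'page' for something the user loaded in a tab,
  // 'asset' for anything that was only requested by a page.
  classifyLoad(url) {
    return this.tabUrls.has(url) ? 'page' : 'asset';
  }
}
```

Because the check does not depend on which tab the URL appeared in, a link opened in a new tab is still classified as a page load.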

We can also distinguish between domain assets bearing cookies and page loads with separate icons (diamond vs. circle?). That way arrows will only point away from loaded pages, rather than having some arrows point towards a page and some pointing away.

dethe commented 11 years ago

This is fixed in branch c2_fresh_start, which will soon become the new master branch.

monicachew commented 10 years ago

Closing based on dethe's comment.