privacy-tech-lab / privacy-pioneer

Privacy browser extension for analyzing web traffic of visited websites
https://www.privacytechlab.org/
Other

Browse other privacy-related extensions #42

Closed rgoldstein01 closed 3 years ago

rgoldstein01 commented 3 years ago

Just documenting the top privacy extensions and their functionalities:

Privacy Badger. Main function: Stops advertisers and other third-party trackers from secretly tracking where you go and what pages you look at on the web. If an advertiser seems to be tracking you across multiple websites without your permission, Privacy Badger automatically blocks that advertiser from loading any more content in your browser. To the advertiser, it's like you suddenly disappeared. Technical: Privacy Badger keeps note of the “third party” domains that embed images, scripts, and advertising in the pages you visit. It looks for tracking techniques like uniquely identifying cookies, local storage “supercookies,” first-to-third-party cookie sharing via image pixels, and canvas fingerprinting. If it observes a single third-party host tracking you on three separate sites, Privacy Badger automatically disallows content from that third-party tracker.

Bottom line: This is a great tool that seems to focus mostly on cookies and local storage, all at the browser level. Most likely, it does not analyze the contents of HTTP requests. So, while this is a great tool, our analysis of HTTP requests would likely be different from what it does.
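To make the "three separate sites" rule concrete, here is a minimal sketch of that kind of threshold heuristic. It is written for illustration only and is not Privacy Badger's code: the class name `TrackerHeuristic`, its methods, and the assumption that each call to `observe` is an already-confirmed tracking event are all hypothetical.

```javascript
// Hypothetical sketch of a "block after N distinct sites" heuristic, in the
// spirit of Privacy Badger's rule. Not Privacy Badger's implementation.
class TrackerHeuristic {
  constructor(threshold = 3) {
    this.threshold = threshold;
    // third-party host -> Set of first-party sites where it was seen tracking
    this.sightings = new Map();
  }

  // Record that `thirdPartyHost` used a tracking technique (e.g., a uniquely
  // identifying cookie) while the user was on `firstPartySite`.
  observe(thirdPartyHost, firstPartySite) {
    if (!this.sightings.has(thirdPartyHost)) {
      this.sightings.set(thirdPartyHost, new Set());
    }
    this.sightings.get(thirdPartyHost).add(firstPartySite);
  }

  // Blocked once the host has been seen on `threshold` distinct sites.
  isBlocked(thirdPartyHost) {
    const sites = this.sightings.get(thirdPartyHost);
    return sites !== undefined && sites.size >= this.threshold;
  }
}

const heuristic = new TrackerHeuristic();
heuristic.observe('tracker.example', 'news.example');
heuristic.observe('tracker.example', 'news.example'); // same site, counted once
heuristic.observe('tracker.example', 'shop.example');
console.log(heuristic.isBlocked('tracker.example')); // false (2 sites)
heuristic.observe('tracker.example', 'blog.example');
console.log(heuristic.isBlocked('tracker.example')); // true (3 sites)
```

Using a `Set` per tracker is what makes repeated visits to the same site count only once.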

uBlock Origin. Main function: "General purpose" blocker. Technical: Uses Adblock Plus filters (below) and allows users to create their own custom filters and block lists.

Bottom line: This is another good tool, but once again it is just blocking things, not really giving the user much other information about the site. Since a lot of it is custom, it seems you really have to do your own research on what to block. Probably not too useful to the casual internet user.

Adblock Plus. Main function: Blocks ads. Technical: Basically just filter lists (pre-made, or made by its own users for public use) that block specific links and ads.

Bottom line: Similar to uBlock Origin. Helpful, but once again it does nothing with HTTP requests and gives the user no information about which specific data is being taken from them and where it goes. Also, of course, there is the issue of Acceptable Ads.
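For a feel of what "basically just filter lists" means in practice, here is a heavily simplified matcher for two Adblock Plus-style filter forms: `||domain^` (block a domain and its subdomains) and `@@||domain^` (exception). It is a sketch under that assumption only; real filter engines support much more syntax (wildcards, `$` options, element hiding), and the function names are hypothetical.

```javascript
// Hypothetical, heavily simplified matcher for "||domain^" (block) and
// "@@||domain^" (exception) filters. Other syntax is ignored.
function parseFilter(line) {
  const text = line.trim();
  const isException = text.startsWith('@@');
  const body = isException ? text.slice(2) : text;
  const match = /^\|\|([a-z0-9.-]+)\^$/i.exec(body);
  if (!match) return null; // unsupported syntax (or a "!" comment line)
  return { domain: match[1].toLowerCase(), isException };
}

// True if `hostname` is `domain` itself or one of its subdomains.
function hostMatches(hostname, domain) {
  return hostname === domain || hostname.endsWith('.' + domain);
}

function shouldBlock(requestUrl, filterLines) {
  const hostname = new URL(requestUrl).hostname.toLowerCase();
  const filters = filterLines.map(parseFilter).filter((f) => f !== null);
  const matching = filters.filter((f) => hostMatches(hostname, f.domain));
  // Exception filters take precedence over blocking filters.
  if (matching.some((f) => f.isException)) return false;
  return matching.length > 0;
}

const filterList = ['||ads.example^', '@@||ok.ads.example^', '! a comment line'];
console.log(shouldBlock('https://cdn.ads.example/banner.js', filterList)); // true
console.log(shouldBlock('https://ok.ads.example/pixel.gif', filterList)); // false
console.log(shouldBlock('https://news.example/', filterList)); // false
```

Note that the decision depends only on where a request goes, never on what it carries, which is the gap described in the bottom line above.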

Blur. Main function: Allows you to hide sensitive info on web pages. Technical: The website doesn't really document anything.

Bottom line: This is a pretty cool extension I never knew about. Unfortunately, there is not much technical info about how it's made, but I assume it is mostly just statically analyzing web pages, looking for keywords and then blurring the matching areas, e.g., in data tables. In general, it is pretty different from what we are interested in working on, as it's really only concerned with the local experience of an individual user, nothing related to third parties.

HTTPS Everywhere. Main function: Encrypts your communications with many major websites, making your browsing more secure. Technical: Uses regex-based rules, publicly maintained in their codebase (https://github.com/EFForg/https-everywhere/blob/master/CONTRIBUTING.md), to rewrite HTTP requests into HTTPS.

Bottom line: This should be of some interest to us, as it deals directly with requests. It does not seem to care much about what is actually inside the requests, but it nevertheless acts like a proxy in the user's browser: it intercepts the requests, rewrites them to HTTPS, and sends them on. If we go this route, their repo could be a very useful resource.
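A minimal sketch of regex-based HTTP-to-HTTPS rewriting, loosely modeled on the `from`/`to` rules HTTPS Everywhere keeps in its rulesets. The ruleset below is made up, and `rewriteToHttps` is a hypothetical name; the first matching rule wins, which is an assumption of this sketch.

```javascript
// Hypothetical sketch of regex-based URL rewriting. The ruleset is invented.
const rulesets = [
  {
    targets: ['example.com', 'www.example.com'],
    rules: [
      // Rewrite the bare domain to www over HTTPS.
      { from: /^http:\/\/example\.com\//, to: 'https://www.example.com/' },
      // Generic rule: just swap the scheme.
      { from: /^http:/, to: 'https:' },
    ],
  },
];

// Returns the rewritten URL, or null if no ruleset/rule applies.
function rewriteToHttps(url) {
  const host = new URL(url).hostname;
  for (const ruleset of rulesets) {
    if (!ruleset.targets.includes(host)) continue;
    for (const rule of ruleset.rules) {
      if (rule.from.test(url)) {
        return url.replace(rule.from, rule.to); // first matching rule wins
      }
    }
  }
  return null;
}

console.log(rewriteToHttps('http://example.com/about')); // https://www.example.com/about
console.log(rewriteToHttps('http://www.example.com/a?b=1')); // https://www.example.com/a?b=1
console.log(rewriteToHttps('http://other.example/')); // null
```

In an extension, a function like this would typically be called from a `webRequest.onBeforeRequest` listener that returns `{ redirectUrl }`, which is how the request gets "resent" over HTTPS.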

DuckDuckGo Privacy Essentials. Main function: Blocks hidden trackers, switches to encrypted versions of sites, and gives each site a privacy grade. Technical: Unsure, most likely just static analysis of web pages.

Bottom line: A lot going on here with different functionalities, but again, it doesn't seem to be as interested in HTTP requests, although I am curious whether they factor into the privacy grade at all (I assume they do in some way). For the most part, this just seems to block third-party trackers. Good tool, but nothing novel.

Click&Clean. Main function: Deletes typed URLs, cache, cookies, and your download and browsing history. Technical: Connects to browser settings.

Bottom line: This mostly just streamlines what a user can already do with several clicks through the browser settings. Should be helpful to inexperienced users. Cool tool, not really relevant to us.

Behave! Main function: Monitors the pages you visit for privacy concerns like browser-based port scans, access to private IPs, and DNS rebinding attacks against private IPs. Technical: No clue. Above my pay grade.

Bottom line: Another cool tool, and this one actually deals with HTTP requests. However, it is all about the security aspects of the requests and the websites, something I think we're less interested in.
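Since Behave!'s internals aren't documented, here is only a guess at the simplest piece of that kind of check: flag a request from a public page to a private or loopback IPv4 address. This sketch handles dotted-quad IPv4 literals only; it does not resolve hostnames (so `localhost` or a DNS rebinding attack would not be caught by it alone), and the function names are hypothetical.

```javascript
// Hypothetical sketch: flag requests whose host is a private or loopback
// IPv4 literal. Hostnames are not resolved.
function parseIPv4(host) {
  const parts = host.split('.');
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

function isPrivateIPv4(host) {
  const ip = parseIPv4(host);
  if (!ip) return false;
  const [a, b] = ip;
  return (
    a === 10 || // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    a === 127 || // 127.0.0.0/8 loopback
    (a === 169 && b === 254) // 169.254.0.0/16 link-local
  );
}

// A public page reaching into the private network is suspicious.
function isSuspiciousRequest(pageUrl, requestUrl) {
  const pageHost = new URL(pageUrl).hostname;
  const requestHost = new URL(requestUrl).hostname;
  return !isPrivateIPv4(pageHost) && isPrivateIPv4(requestHost);
}

console.log(isSuspiciousRequest('https://site.example/', 'http://192.168.1.1:8080/')); // true
console.log(isSuspiciousRequest('https://site.example/', 'https://cdn.example/x.js')); // false
```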

Last, I came across Privacy Manager, which does not appear to be popular (only 169 ratings) but is similar to something we are doing. Here's the link: https://chrome.google.com/webstore/detail/privacy-manager/giccehglhacakcfemddmfhdkahamfcmd?hl=en-US It allows you to manage a ton of different privacy settings for a web page, along with maintaining your browser history and cookies, deleting them, etc. In addition, it has the following functionality: Network monitoring - the user can also manage network traffic in a popup window, collect HTTP headers, and block user-agent data from being sent with requests. To use that feature, you need to allow optional host permissions.

So, while nothing is automated for the request analysis, it does give the user the ability to look at the headers of the requests, something similar to what we were thinking about. So I am not sure this is exactly what we were looking for, but it is pretty similar.
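Privacy Manager's "block user agent data" option suggests header filtering along these lines. This is a minimal sketch under assumptions: `stripHeaders` is a hypothetical name, and the headers use the `{ name, value }` array shape that the `webRequest` API passes as `requestHeaders`. The wiring in the comment assumes a Manifest V2 extension with blocking `webRequest`.

```javascript
// Hypothetical sketch of stripping request headers before they are sent.
// `requestHeaders` mirrors the webRequest shape: [{ name, value }, ...].
function stripHeaders(requestHeaders, blocked = ['user-agent']) {
  const blockedLower = blocked.map((h) => h.toLowerCase());
  // Header names are case-insensitive, so compare in lowercase.
  return requestHeaders.filter((h) => !blockedLower.includes(h.name.toLowerCase()));
}

// Possible wiring in a Manifest V2 extension (not run here):
// chrome.webRequest.onBeforeSendHeaders.addListener(
//   (details) => ({ requestHeaders: stripHeaders(details.requestHeaders) }),
//   { urls: ['<all_urls>'] },
//   ['blocking', 'requestHeaders']
// );

const exampleHeaders = [
  { name: 'User-Agent', value: 'Mozilla/5.0 (placeholder)' },
  { name: 'Accept', value: 'text/html' },
];
console.log(stripHeaders(exampleHeaders)); // [ { name: 'Accept', value: 'text/html' } ]
```

The same listener could also just record `details.requestHeaders`, which covers the "collect HTTP headers" part.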

End notes: All in all, there are a lot of cool extensions out there, but nothing that is actually looking at the contents of the HTTP requests. Perhaps this is because it's not useful (which would make it a waste of time for us) OR perhaps it is because it just has yet to be done. If the latter is true, it is certainly something we could focus on, as it is clearly not happening with these other popular extensions. However, it is certainly possible that it is just not useful to users of web pages, and that is why it is not happening. For example, when I use Firefox, the browser itself already tells me if a website is using my location or some other privacy-related permission. I don't think an extension saying that my location is then also being sent to the website, or to another site, after I told Firefox it is OK to use my location is very beneficial. Just something more to think about.

SebastianZimmeck commented 3 years ago

Excellent survey, @rgoldstein01. Maybe you should make that into a blog post or something similar.

All in all, there are a lot of cool extensions out there, but nothing that is actually looking at the contents of the HTTP requests. Perhaps this is because it's not useful (which would make it a waste of time for us) OR perhaps it is because it just has yet to be done.

I tend to think that it would be less useful for form data or other data items the user is aware of. If I am entering my email address on a website, I know that the site will have my email address. On the other hand, I would not necessarily expect that my email address would be shared with a third party, which could be observed via HTTP requests. Similarly, if I allow my browser to give a site access to my location, it would not be a surprise that the site is from now on collecting my location data. But third parties getting my location are a different story. So, I think there are use cases for analyzing the content of HTTP requests.
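The email example above can be sketched as a check: does a value the user typed show up in requests to hosts other than the site itself? Everything here is an assumption for illustration: the `{ url, body }` request shape, the hypothetical `findLeaks` helper, and the naive "last two hostname labels" notion of a site (a real tool would use the Public Suffix List). Some trackers send a SHA-256 hash of an email rather than the raw value, so the sketch checks that form too.

```javascript
const crypto = require('crypto');

// Naive site key: the last two hostname labels. Wrong for suffixes like
// co.uk; a real tool would consult the Public Suffix List.
function siteOf(hostname) {
  return hostname.split('.').slice(-2).join('.');
}

// Returns hosts of third-party requests that contain `secret` in raw,
// URL-encoded, or SHA-256-hashed (of the trimmed, lowercased value) form.
function findLeaks(firstPartyUrl, requests, secret) {
  const firstPartySite = siteOf(new URL(firstPartyUrl).hostname);
  const normalized = secret.trim().toLowerCase();
  const needles = [
    secret,
    encodeURIComponent(secret),
    crypto.createHash('sha256').update(normalized).digest('hex'),
  ];
  const leaks = [];
  for (const req of requests) {
    const host = new URL(req.url).hostname;
    if (siteOf(host) === firstPartySite) continue; // first party: expected
    const haystack = req.url + '\n' + (req.body || '');
    if (needles.some((n) => haystack.includes(n))) leaks.push(host);
  }
  return leaks;
}

const sampleRequests = [
  { url: 'https://www.shop.example/account', body: 'email=jane%40mail.example' },
  { url: 'https://pixel.tracker.example/t?e=jane%40mail.example' },
  { url: 'https://cdn.other.example/app.js' },
];
console.log(findLeaks('https://shop.example/', sampleRequests, 'jane@mail.example'));
// [ 'pixel.tracker.example' ]
```

The first request is ignored because the site expects to get the email; only the third-party copy counts as a finding.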

Beyond finding the use case, the privacy analysis of HTTP requests is also harder than just, say, identifying third parties on a site, identifying cookies, or blocking ads or trackers. We would need to figure out (and are already in the process of figuring out) how to make sense of the requests, which techniques to use to analyze them, what to look for, ... I think that is one of the reasons why it is not being done yet. We would need to understand the content level: what is included in an HTTP request (cookies, etc.) and what a third party is doing, as opposed to just saying that there is an HTTP request going to a site, a cookie, and so on.
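A first step toward that content level could be breaking a single request into the data items it carries: query parameters, cookies, and form-encoded body fields. This is a minimal sketch; the request shape `{ url, headers, body }` with lowercase header names and the helper name `extractDataItems` are assumptions.

```javascript
// Hypothetical sketch: list the data items one HTTP request carries.
// Assumes { url, headers, body } with lowercase header names.
function extractDataItems(request) {
  const items = [];
  const headers = request.headers || {};

  // 1. URL query parameters
  for (const [key, value] of new URL(request.url).searchParams) {
    items.push({ source: 'query', key, value });
  }

  // 2. Cookies from the Cookie header ("a=1; b=2")
  for (const pair of (headers['cookie'] || '').split(';')) {
    const idx = pair.indexOf('=');
    if (idx === -1) continue;
    items.push({ source: 'cookie', key: pair.slice(0, idx).trim(), value: pair.slice(idx + 1).trim() });
  }

  // 3. Form-encoded body fields (other body formats are skipped here)
  const contentType = headers['content-type'] || '';
  if (contentType.startsWith('application/x-www-form-urlencoded') && request.body) {
    for (const [key, value] of new URLSearchParams(request.body)) {
      items.push({ source: 'body', key, value });
    }
  }
  return items;
}

const sampleRequest = {
  url: 'https://ads.example/collect?uid=123&lat=41.55',
  headers: {
    cookie: 'id=abc; theme=dark',
    'content-type': 'application/x-www-form-urlencoded',
  },
  body: 'zip=12345',
};
console.log(extractDataItems(sampleRequest).length); // 5
```

Each item keeps its source, so a later step can decide, e.g., that `lat` in a third-party query string matters more than a `theme` cookie.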

For example, when I use Firefox, the browser itself already tells me if a website is using my location or some other privacy-related permission.

A lot of functionality that used to be in extensions is now moving into the browser itself, particularly in Brave and Firefox. They already include blocking, fingerprinting protection, and other privacy-related functionality. So, we would need to provide something that is not already in browsers and will not be coming to browsers soon (e.g., the complete phasing out of third-party cookies).

Also, what about Puppeteer and other similar "expert" tools? They also have some of the functionality we are exploring, right?

notowen333 commented 3 years ago

I looked into Puppeteer. It is a Node.js library for automated browsing, built specifically for Chromium browsers, so it would be an alternate architecture for us. It can also be used to script non-headless (visible) browsing. Some interesting functionalities include communicating with DevTools and emulating mobile devices. It's a robust tool with many use cases. (There's also a library called Playwright, which is not limited to Chromium. It is backed by Microsoft instead of Google but is very similar and was built by the same engineers.)

So for example, running this script:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://twitter.com');
  await page.screenshot({path: 'twitter.png'});

  await browser.close();
})();

saves the following screenshot in the folder where I ran the script:

[Screenshot: twitter.png]

If we wanted to emulate our current architecture using puppeteer, we could do so like this:

const browser = await puppeteer.launch({headless: false});

This launches a full, visible browser that can be scripted. HTTP request interception is definitely possible with Puppeteer, but I was having trouble getting it to work together with other functionality of the API.
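Here is a sketch of what request observation could look like with Puppeteer's `page.setRequestInterception(true)` and `page.on('request', ...)`. One common pitfall, possibly (not certainly) related to the trouble above: once interception is on, every request must be continued, aborted, or responded to, or the page hangs. If we only need to observe traffic, `page.on('request')` fires without interception at all. `summarizeRequest` and `collectRequests` are hypothetical helper names; `puppeteer` is loaded lazily so the helper can be used without it.

```javascript
// Turn a Puppeteer HTTPRequest (or any object with the same methods)
// into a plain record we can analyze later.
function summarizeRequest(request) {
  return {
    url: request.url(),
    method: request.method(),
    type: request.resourceType(),
    headers: request.headers(),
    postData: request.postData() || null, // undefined when there is no body
  };
}

async function collectRequests(targetUrl) {
  const puppeteer = require('puppeteer'); // loaded only when actually used
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  const records = [];
  // With interception enabled, each request must be resolved exactly once.
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    records.push(summarizeRequest(request));
    request.continue(); // let the request through unchanged
  });
  await page.goto(targetUrl);
  await browser.close();
  return records;
}

// Usage (requires puppeteer and network access):
// collectRequests('https://gap.com').then((r) => console.log(r.length));
```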

Here's another snippet of what you can do with Puppeteer:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // page.goto() resolves to the main document's HTTP response
  const response = await page.goto('https://gap.com');
  console.log(response.headers());
  // Cookies visible to the current page URL
  console.log(await page.cookies());
  await browser.close();
})();

Which produces:

owenkaplan@new-host-3 puppet % node testing.js
{
  'content-type': 'text/html; charset=utf-8',
  'cache-control': 'max-age=0, no-cache',
  server: 'nginx/1.17.10',
  'x-cache-status': 'MISS\nMISS',
  'x-frame-options': 'DENY',
  'x-page-speed': '1.13.35.2-0',
  'x-powered-by': 'Express',
  'x-request-host': 'www.gap.com',
  'x-vcap-request-id': '2d06777d-bb92-4e09-401f-9ca29da90568',
  'x-e-dc': 'azeus',
  'x-akamai-transformed': '9 - 0 pmb=mTOE,1mRUM,2',
  vary: 'Accept-Encoding',
  'content-encoding': 'gzip',
  date: 'Tue, 05 Jan 2021 00:34:28 GMT',
  'set-cookie': 'prlb=e; path=/; domain=.gap.com\n' +
    'akacd_p-gol=3787259667~rv=25~id=80f50ea06e7f1bf5d7ad9d1881f1055b; path=/;; Secure; SameSite=None\n' +
    'bm_sz=CAF59871413FE714A029B18F4D998302~YAAQTaomF4F/s7t2AQAAW3D3zwoyAvRPjjp5O9psOGGrQuVlUc+7j9xNwcnEc1hd6Dk3/YeKOdm5EM5a4fD9imsIidxXC8Z9TmwINshzd3OKGyNgt/1xpa7EnfTJAyiJDqiMKAFxWa7Ge47yb96T+78JF8JO3j3Hng0XoytRuyJDNjYhZ0ln2WZWmwyP; Domain=.gap.com; Path=/; Expires=Tue, 05 Jan 2021 04:34:28 GMT; Max-Age=14400; HttpOnly\n' +
    '_abck=653CF52878F813A002E5FFE0E1AC837F~-1~YAAQTaomF4J/s7t2AQAAW3D3zwX7W2U2SrxQCyekrsv/cPJu+zrxopqTjiAMfRkv2VvNDlDG+mZQxo2QsFNwZBNhkeKaLtBCOfP4iznttc8gSG+CcmEJ43PA8tVZ8Z/E+nJvvfJnGIzGc0QWk3nUrAbbZcCTUaMzQkv+5K0hxjo3Jdtq7WKdLLkeZcngiZML0fVieUyNIQ3tqMkg9+T6WYxOzH4bTOqFrbXH8mqFpOMBtDQp4ErxR0EicFUDFxDk2V5B3+qmIA3yJqkfZpVC1qkWBH8ddlfqj5IpXcjTLs5lczkASOAv~-1~-1~-1; Domain=.gap.com; Path=/; Expires=Wed, 05 Jan 2022 00:34:28 GMT; Max-Age=31536000; Secure',
  'server-timing': 'cdn-cache; desc=HIT\nedge; dur=1',
  'x-akam-sw-version': '0.5.0',
  'strict-transport-security': 'max-age=604800 ; includeSubDomains'
}
[
  {
    name: 's-prlb',
    value: 'e',
    domain: '.gap.com',
    path: '/',
    expires: -1,
    size: 7,
    httpOnly: false,
    secure: false,
    session: true
  },
  {
    name: 'gid.h',
    value: '200400001|||',
    domain: '.gap.com',
    path: '/',
    expires: -1,
    size: 17,
    httpOnly: true,
    secure: true,
    session: true
  },
  {
    name: 'gidAccessToken',
    value: '|||',
    domain: '.www.gap.com',
    path: '/',
    expires: -1,
    size: 17,
    httpOnly: true,
    secure: true,
    session: true
  },
  {
    name: 'JSESSIONID',
    value: 'c4b705ad-c18a-4b46-9770-68b422c8d67b',
    domain: '.gap.com',
    path: '/',
    expires: -1,
    size: 46,
    httpOnly: true,
    secure: true,
    session: true
  },
  {
    name: 'RT',
    value: '"z=1&dm=gap.com&si=9aqnh647c7m&ss=kjj9g1kd&sl=0&tt=0"',
    domain: '.gap.com',
    path: '/',
    expires: 1610411668,
    size: 55,
    httpOnly: false,
    secure: false,
    session: false
  },
  {
    name: 'unknownShopperId',
    value: '4DCCA305F9C3D24FA3D15C544E75980D|||',
    domain: '.gap.com',
    path: '/',
    expires: 1672878869.179669,
    size: 51,
    httpOnly: false,
    secure: false,
    session: false
  },
  {
    name: 'bm_sz',
    value: 'CAF59871413FE714A029B18F4D998302~YAAQTaomF4F/s7t2AQAAW3D3zwoyAvRPjjp5O9psOGGrQuVlUc+7j9xNwcnEc1hd6Dk3/YeKOdm5EM5a4fD9imsIidxXC8Z9TmwINshzd3OKGyNgt/1xpa7EnfTJAyiJDqiMKAFxWa7Ge47yb96T+78JF8JO3j3Hng0XoytRuyJDNjYhZ0ln2WZWmwyP',
    domain: '.gap.com',
    path: '/',
    expires: 1609821268.57897,
    size: 226,
    httpOnly: true,
    secure: false,
    session: false
  },
  {
    name: 'gidGuestSecureToken',
    value: '|||',
    domain: '.www.gap.com',
    path: '/',
    expires: -1,
    size: 22,
    httpOnly: true,
    secure: true,
    session: true
  },
  {
    name: 'locale',
    value: 'en_US|||',
    domain: '.gap.com',
    path: '/',
    expires: 1616027669.179805,
    size: 14,
    httpOnly: false,
    secure: false,
    session: false
  },
  {
    name: 'gidSecureToken',
    value: '|||',
    domain: '.www.gap.com',
    path: '/',
    expires: -1,
    size: 17,
    httpOnly: true,
    secure: true,
    session: true
  },
  {
    name: 'ABSeg',
    value: '"{}"',
    domain: '.gap.com',
    path: '/',
    expires: 1609835669.179792,
    size: 9,
    httpOnly: false,
    secure: false,
    session: false
  },
  {
    name: '_abck',
    value: '653CF52878F813A002E5FFE0E1AC837F~-1~YAAQTaomF4h/s7t2AQAAT3L3zwVh0CmJbvwnUDYHHyuxHmEVc2YLsHMt8saBTMqlUrSWzVB6/cjRABVzJhoEBd0T29e4IQNrVYCZEWm3f8Qag6xy4B50p/Jem7CFf1rLV4HadYluKp8Zu1Lsp362VfJKEzaEsCJwn8p6+ozSdrEESz1wui0aSv6BSM2uBNsh9Ms4Y6LAik6zG49l1zqiW2vExmJ6V+NbXmbirsubiD1awZiZQg4Znp97aaNx6dXbxI3G2AIWU3l3OJ1Q6PquV3fI2wjSzTrCnlf/hsRpj0cnVRhL5JeY51S1Jps/s9qSTtadavU=~-1~-1~-1',
    domain: '.gap.com',
    path: '/',
    expires: 1641342869.076536,
    size: 378,
    httpOnly: false,
    secure: true,
    session: false
  },
  {
    name: 'akacd_p-gol',
    value: '3787259667~rv=25~id=80f50ea06e7f1bf5d7ad9d1881f1055b',
    domain: 'www.gap.com',
    path: '/',
    expires: -1,
    size: 63,
    httpOnly: false,
    secure: true,
    session: true,
    sameSite: 'None'
  },
  {
    name: 'prlb',
    value: 'e',
    domain: '.gap.com',
    path: '/',
    expires: -1,
    size: 5,
    httpOnly: false,
    secure: false,
    session: true
  }
]
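In the output above, `response.headers()` joins repeated headers with newlines (see `set-cookie` and `x-cache-status`). `page.cookies()` already returns parsed cookies, but parsing the header ourselves would tell us which response set which cookie. A minimal sketch, with simplified attribute handling (no date parsing or validation) and a hypothetical `parseSetCookie` name:

```javascript
// Hypothetical sketch: split a newline-joined 'set-cookie' value into one
// object per cookie, with lowercased attribute names.
function parseSetCookie(headerValue) {
  const cookies = [];
  for (const line of headerValue.split('\n')) {
    const [pair, ...attributeParts] = line.split(';');
    const eq = pair.indexOf('=');
    if (eq === -1) continue; // skip lines without name=value
    const cookie = {
      name: pair.slice(0, eq).trim(),
      value: pair.slice(eq + 1).trim(),
      attributes: {},
    };
    for (const part of attributeParts) {
      const trimmed = part.trim();
      if (trimmed === '') continue; // e.g. the ";;" in the akacd_p-gol cookie
      const i = trimmed.indexOf('=');
      // Flags like Secure and HttpOnly have no value.
      const key = (i === -1 ? trimmed : trimmed.slice(0, i)).toLowerCase();
      cookie.attributes[key] = i === -1 ? true : trimmed.slice(i + 1).trim();
    }
    cookies.push(cookie);
  }
  return cookies;
}

// A shortened version of the header shown above.
const sample =
  'prlb=e; path=/; domain=.gap.com\n' +
  'akacd_p-gol=3787259667~rv=25; path=/;; Secure; SameSite=None';
const parsed = parseSetCookie(sample);
console.log(parsed.map((c) => c.name)); // [ 'prlb', 'akacd_p-gol' ]
```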

In summary, Puppeteer could be a framework for an alternate/supplementary approach. One big advantage is its pretty neat setup/deployment. A diverse set of event listeners can be configured to achieve various goals we have. For example, responses expose TLS details through the SecurityDetails class (https://pptr.dev/#?product=Puppeteer&version=v5.5.0&show=api-class-securitydetails).
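As an example of those listeners, a `response` handler could summarize TLS details via `response.securityDetails()`, which returns `null` for non-secure responses, and the `SecurityDetails` methods `protocol()`, `issuer()`, `subjectName()`, and `validTo()`. This is a sketch; `summarizeSecurity` is a hypothetical helper, and it only reads values, so it runs against any object with the same methods.

```javascript
// Sketch: summarize TLS details for a Puppeteer HTTPResponse.
function summarizeSecurity(response) {
  const details = response.securityDetails();
  if (!details) {
    // Not received over a secure connection.
    return { url: response.url(), secure: false };
  }
  return {
    url: response.url(),
    secure: true,
    protocol: details.protocol(),
    issuer: details.issuer(),
    subject: details.subjectName(),
    // validTo() is a Unix timestamp in seconds.
    validTo: new Date(details.validTo() * 1000).toISOString(),
  };
}

// Possible wiring (requires puppeteer):
// page.on('response', (response) => console.log(summarizeSecurity(response)));
```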

SebastianZimmeck commented 3 years ago

In summary, puppeteer could be a framework for an alternate/supplementary approach

We've been there before, haha. So, I am not sure about using the Puppeteer architecture. Rather, the question is whether Puppeteer is already doing everything we have in mind (probably not, but what is the delta?).

SebastianZimmeck commented 3 years ago

My understanding at this point is that Puppeteer as such does not provide the analysis functionality out of the box that we may want but that we could use it as an alternative starting point instead of Selenium.

notowen333 commented 3 years ago

Right, exactly. Puppeteer is an api to "talk to and automate actions in a chromium browser." It is not an analysis tool out of the box.

davebaraka commented 3 years ago

Just to add a couple of points

All in all, there are a lot of cool extensions out there, but nothing that is actually looking at the contents of the HTTP requests.

Browser extensions are limited in terms of reading the response data. We can read the POST data, but when it comes to the HTML, CSS, JS, or other data, there are two clear ways of getting it, outlined here. One way requires the dev tools to be open the whole time, which would be a poor experience for a developer or user, and the other uses JavaScript injection, which I'm not too sure can reliably capture all the requests.

All in all, browser extension technology "doesn't really" support this type of analysis.
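On the part that does work, reading POST data: `webRequest.onBeforeRequest` with the `'requestBody'` option provides `details.requestBody`, which holds either `formData` (each key mapped to an array of string values) or `raw` (a list of parts carrying `bytes` or a `file` reference). A minimal sketch that turns either form into text; `decodeRequestBody` is a hypothetical name.

```javascript
// Hypothetical sketch: turn webRequest's `requestBody` into text.
function decodeRequestBody(requestBody) {
  if (!requestBody) return null;
  if (requestBody.formData) {
    // { key: [v1, v2] } -> "key=v1&key=v2"
    const pairs = Object.entries(requestBody.formData).flatMap(([key, values]) =>
      values.map((v) => [key, v])
    );
    return new URLSearchParams(pairs).toString();
  }
  if (requestBody.raw) {
    const decoder = new TextDecoder('utf-8');
    let text = '';
    for (const part of requestBody.raw) {
      // Parts may reference files instead of carrying bytes; skip those.
      // stream: true keeps multi-byte characters split across parts intact.
      if (part.bytes) text += decoder.decode(part.bytes, { stream: true });
    }
    return text + decoder.decode(); // flush any buffered bytes
  }
  return null;
}

// Possible wiring in an extension (not run here):
// chrome.webRequest.onBeforeRequest.addListener(
//   (details) => console.log(details.url, decodeRequestBody(details.requestBody)),
//   { urls: ['<all_urls>'] },
//   ['requestBody']
// );
```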

My understanding at this point is that Puppeteer as such does not provide the analysis functionality out of the box that we may want but that we could use it as an alternative starting point instead of Selenium.

I forget the reason why we chose Puppeteer over Selenium. I think there was a certain functionality for analyzing the requests that we couldn't get working in Puppeteer. One benefit of Puppeteer was that we could add listeners and analyze requests and other things on the fly. I think currently with Selenium, we capture the data every second or so, which isn't really a problem if we are sure we are getting all the data. One worry I have with selenium is how will it look/behave when packaged. I can foresee ways that this could be difficult or result in a poor user experience, but I could be wrong.

SebastianZimmeck commented 3 years ago

Browser extensions are limited in terms of reading the response data.

Especially if we decide to go for a user product, analyzing the response data is perhaps not necessary. If we need it, though, whether for a user or developer product, the extension route may not be the best. Not sure about alternatives.

I forget the reason why we chose Puppeteer over Selenium

Selenium over Puppeteer, right?

One worry I have with selenium is how will it look/behave when packaged. I can foresee ways that this could be difficult or result in a poor user experience, but I could be wrong.

Yes, I am not sure what other choices we have beyond browser extensions and Puppeteer. For example, Electron or a web proxy may also not be very usable.

Independently of the architecture, the bottom line at this point seems to me that our core analysis functionality is making sense of HTTP requests. Per @rgoldstein01's survey, it seems that most (all?) user tools do not cover this functionality. So, maybe have a tool that tells a user which data is going where based on HTTP requests, e.g., location data to ad network x, interest in politics to Facebook via tracking pixel, etc.

A developer tool is still an option as well.
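A "which data is going where" summary could start as a map from data type to destination hosts. This is a sketch under loose assumptions: the `{ url, body? }` request shape, the hypothetical `summarizeFlows` helper, and two rough detectors (an email regex, and a pair of nearby decimal numbers within latitude/longitude ranges) that are illustrative only and would produce both misses and false positives in practice.

```javascript
// Hypothetical detectors over decoded request text. Rough heuristics only.
const detectors = {
  email: (text) => /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/.test(text),
  location: (text) => {
    // Two decimal numbers with 3+ decimal places close together, e.g. lat/lng.
    const m = /(-?\d{1,2}\.\d{3,})[^\d-]{1,20}?(-?\d{1,3}\.\d{3,})/.exec(text);
    return m !== null && Math.abs(Number(m[1])) <= 90 && Math.abs(Number(m[2])) <= 180;
  },
};

// Maps each detected data type to the sorted list of hosts that received it.
function summarizeFlows(requests) {
  const flows = new Map();
  for (const req of requests) {
    const host = new URL(req.url).hostname;
    let text = req.url + ' ' + (req.body || '');
    try {
      text = decodeURIComponent(text);
    } catch (err) {
      // Malformed percent-encoding: fall back to the raw text.
    }
    for (const [type, detect] of Object.entries(detectors)) {
      if (!detect(text)) continue;
      if (!flows.has(type)) flows.set(type, new Set());
      flows.get(type).add(host);
    }
  }
  const summary = {};
  for (const [type, hosts] of flows) summary[type] = [...hosts].sort();
  return summary;
}

const observedRequests = [
  { url: 'https://ads.example/geo?lat=41.5565&lng=-72.6566' },
  { url: 'https://pixel.social.example/tr?ev=PageView', body: 'em=jane%40mail.example' },
  { url: 'https://cdn.example/app.js' },
];
// Expected: location -> ads.example, email -> pixel.social.example
console.log(summarizeFlows(observedRequests));
```

Such a summary only needs request URLs and bodies, not response data, which fits the point above that a browser extension would suffice.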

davebaraka commented 3 years ago

Selenium over Puppeteer, right?

Sorry, yes.

So, maybe have a tool that tells a user which data is going where based on HTTP requests, e.g., location data to ad network x, interest in politics to Facebook via tracking pixel, etc.

A browser extension would suffice for this type of analysis.

SebastianZimmeck commented 3 years ago

Closing this for the time being. Good analysis, @rgoldstein01.