privacytests / privacytests.org

Source code for privacytests.org. Includes browser testing code and site rendering.
https://privacytests.org
MIT License

List upstream browser versions #223

Open tomByrer opened 2 weeks ago

tomByrer commented 2 weeks ago

I almost grabbed Mullvad, but noticed that they are using a 14-month-old Firefox version, which to me may be a security issue, which in turn can affect privacy, or the ability to use certain webapps that test only on 'evergreen' versions. Also, having a browser that is older than the "Extended Support Release" can be a fingerprint in itself.

To help readers decide if that is a risk they want to accept, could the upstream version be listed?

Thorin-Oakenpants commented 1 week ago

MB has a nightly channel

PieroV commented 1 week ago

I almost grabbed Mullvad, but noticed that they are using a 14-month-old Firefox version, which to me may be a security issue,

This is misleading to say the least. Mullvad Browser is on the ESR 115 train, which is still supported (albeit only for a month). The last version of the Firefox 115.x series is coming out today, and we've already rebased an update, which is more or less ready to be built (we're a little bit late with the build this time due to Labor Day in the US). ESR channels have a 4-week release cadence like other Firefox channels for the most important security fixes. In addition to that, sometimes we backport security fixes Mozilla didn't backport.

which in turn can affect privacy,

We also evaluate backporting privacy fixes, and we contribute new fixes ourselves, which often land in Mullvad Browser and Tor Browser first.

or the ability to use certain webapps that test on only 'evergreen' versions.

That's true. But webapps should also be blamed for this. E.g., Element refused to support ESR just to be able to use Intl.Segmenter in May-June and ignored the requests of many users. Going to Firefox Rapid Release isn't a trivial challenge.

Also, having a browser that is older than the "Extended Support Release" can be a fingerprint in itself.

Every user agent has a fingerprint. Anyone saying otherwise is a liar. Some features already make Mullvad Browser easily fingerprintable as Mullvad Browser. At that point the upstream version will tell only whether you are on the release channel or on the alpha/nightly channel. I don't have numbers, but I guess there are very few alpha users. So, if you've already been identified as a Mullvad Browser user, the fact that you're still on 115.x will give only slightly more than 0 bits of information.
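To put a rough number on "slightly more than 0 bits", here is a minimal sketch of the usual surprisal calculation, -log2(p). The 99% stable-channel share is a made-up assumption for illustration (no real numbers are available in this thread), and `surprisal_bits` is a hypothetical helper name.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Information, in bits, revealed by observing an event with this probability."""
    return -math.log2(probability)


# Hypothetical share of Mullvad Browser users on the stable (115.x) channel.
# This is an assumption for illustration, not a measured statistic.
stable_share = 0.99

bits_stable = surprisal_bits(stable_share)     # observer sees the stable version
bits_alpha = surprisal_bits(1 - stable_share)  # observer sees the alpha/nightly version

print(f"stable channel: {bits_stable:.3f} bits")
print(f"alpha channel:  {bits_alpha:.3f} bits")
```

Under that assumption, being on the stable channel reveals almost nothing beyond "this is Mullvad Browser", while being on the rarer alpha channel reveals several bits.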

What you really want to avoid is a unique fingerprint. The objective of Mullvad Browser (and Tor Browser) is to reduce the number of unique fingerprints users can get, i.e., to build a big crowd of users in which users can hide.

uazo commented 1 week ago

What you really want to avoid is a unique fingerprint. [...] make big crowd of users where users can hide in.

I enter the discussion, I apologise.

This point is still difficult for me to understand: what is the difference between a 'unique fingerprint' and an 'always different fingerprint'? Which is better? This is a big question to which I do not know the answer.

Allow me to explain. Tor Browser attempts to decrease the number of possible fingerprints by flattening them out, but the Android version fails to do so, for example, with regard to screen size. This behaviour is not specific to Tor but is common to almost all browsers. In this way the browser exposes fewer usable values to sites for fingerprinting across different APIs, since the values are identical, but it increases the possibility of tracking precisely because it decreases the number of available factors, leaving the remaining values, whether easily traceable or not, uncovered.

Since, I imagine, the fingerprint value is used in conjunction with user behaviour (or the network, or whatever else is theoretically unknown), decreasing the former increases the weight of the latter, and those are things not under the direct responsibility of the browser.

Now, with this reading, the approach of always producing different fingerprint values seems better, especially considering that the APIs that can be exploited for fingerprinting may not yet be known. That way, factors not known to the browser but exploited for fingerprinting are mixed with factors deliberately made as random as possible.

Consider that I often read in the Chromium documentation that it is better to expose 0 ('zero') rather than a random value, but I still don't understand why, perhaps because I am not a mathematician.

So, still in this reading, I hypothesise that the best mechanism is to flatten the differences and, from there, maintain a unique fingerprint. I would like to understand, in your opinion (as well as @arthuredelstein's), where my reasoning goes wrong.

Thorin-Oakenpants commented 1 week ago

I enter the discussion

The goal of anti-fingerprinting is to hide the real value - it doesn't matter how. There are two considerations - compat and fooling naive scripts. So you have lowering entropy (make everyone the same per equivalency) and randomizing.

A naive script doesn't detect randomizing, so each time (per execution or better, per session per eTLD+1 etc) it swallows the poison pill(s) and generates an overall unique fingerprint (assuming the randomizing is strong enough) - meaning every session you visit a site your fingerprints won't be matched.

But all randomizing can be detected - either via math (e.g. known pixel tests in canvas), or via third parties. It can also sometimes be reversed (e.g. in Brave, and it's been a year since I looked at anything, by removing leading, trailing and multiple spaces in userAgent, or by ignoring the hardcoded list of random fonts - don't confuse this with the fact that Brave is also protecting the underlying values). So a randomized value can be recorded as a static value, e.g. 'random' (as shown on coveryourtracks), and thus becomes the same as lowering entropy.
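A minimal sketch of the difference between a naive script and one that detects randomizing, assuming the simplest case where the value changes on every read. The metric functions and fingerprint helpers are hypothetical stand-ins, not any real fingerprinting library; per-session randomizing would need the math or third-party checks mentioned above rather than this repeated-read trick.

```python
import hashlib
import random


def naive_fingerprint(read_metric) -> str:
    """Hash a single reading; a per-execution random metric changes the hash every time."""
    return hashlib.sha256(repr(read_metric()).encode()).hexdigest()


def robust_fingerprint(read_metric, samples: int = 3) -> str:
    """Read the metric several times; if the readings disagree, record the static value 'random'."""
    readings = {repr(read_metric()) for _ in range(samples)}
    value = readings.pop() if len(readings) == 1 else "random"
    return hashlib.sha256(value.encode()).hexdigest()


# Hypothetical metrics: one standardized, one randomized per execution.
def standardized_metric() -> str:
    return "1000x1000"


_rng = random.Random(0)


def randomized_metric() -> int:
    return _rng.getrandbits(64)


# The naive script is fooled: two visits produce two different fingerprints.
print(naive_fingerprint(randomized_metric) == naive_fingerprint(randomized_metric))
# The robust script collapses the randomized metric to one stable bucket.
print(robust_fingerprint(randomized_metric) == robust_fingerprint(randomized_metric))
```

In the robust case every user of the randomizing browser lands in the same 'random' bucket, which is why the effect ends up equivalent to lowering entropy.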

Tor Browser (MB) never had an engineering solution to create a seed for persistent randomizing: i.e. per eTLD+1 and scheme and window (normal, private, tor window). Brave added one at the very start. Firefox now has one. So Brave used this to randomize canvas, but subtly, so the human eyeball can't tell and sites are not broken. Tor Browser, because it was returning an all white canvas anyway, i.e. totally useless, chose to totally randomize it per execution, because why not (and they have no seed). They also don't want to allow any possible averaging.
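One way such a seed could be derived, as a sketch only: keep a random secret per session and run it through HMAC together with the scheme, eTLD+1 and window type. This is an assumed design for illustration, not Brave's or Firefox's actual implementation, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_session_key() -> bytes:
    """Random secret generated once per browsing session (assumed design)."""
    return secrets.token_bytes(32)


def site_seed(session_key: bytes, scheme: str, etld_plus_one: str, window_kind: str) -> bytes:
    """Derive a seed that stays stable for one (scheme, site, window) within a session."""
    context = "\x00".join((scheme, etld_plus_one, window_kind)).encode()
    return hmac.new(session_key, context, hashlib.sha256).digest()


key = new_session_key()
seed_a = site_seed(key, "https", "example.com", "normal")
seed_b = site_seed(key, "https", "example.com", "normal")
seed_c = site_seed(key, "https", "example.org", "normal")

print(seed_a == seed_b)  # same site, same session: same seed, so noise is consistent
print(seed_a == seed_c)  # different site: unrelated seed, so values cannot be linked
```

Noise generated from such a seed is repeatable within one site and session, which is what keeps the lie consistent, while a new session key changes every seed at once.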

So that's it: threat model (Tor Browser always assumes the worst), compat, engineering. E.g. since Brave engineered a seed, they were able to add more things to it to be randomized, because, why not. The more items you randomize, the more chances a script is fooled and becomes naive.

Also note that randomizing comes with costs and complexity and downsides (see averaging) and I could point you to numerous bugs on Brave and Mozilla re their implementations just for canvas. So Tor Browser would prefer to not go this route unless they have to.

So, at the end of the day, you protect each value (doesn't really matter how), and you aim to protect enough values that it becomes too hard, costly or performance-intensive to track your [1] users via fingerprinting. The other side of the equation is that you need a crowd [1]. And the more you grow that crowd, the more users there hopefully are in most/each "fingerprint" (i.e. with advanced scripts that detect randomizing).

Protect the real value. Protect more metrics. Grow the crowd.

[1] You can't hide that you are on Tor Browser or Brave


Android screen metrics - Android lacks a lot of fingerprint protection parity with desktop Tor Browser due to Android TB being late to the party and the nature of the device (e.g. the browser window fills the screen) - but the screen is still somewhat protected. Android is not a good example, it's just such a different beast.

PieroV commented 1 week ago

Sorry, I started a reply but then jumped into a meeting. What thorin said basically, but I'll also add my answer anyway :smile:.

If I remember correctly, Firefox RFP and derivatives (including Tor Browser and Mullvad Browser) randomize only when it isn't possible to do otherwise (e.g., <canvas>es).

I said unique, but there isn't a single way to compute fingerprints. There are some values that you can measure, and then you compute a fingerprint from them in the way you prefer.

Standardizing values is better than keeping original values and adding some random noise to them because you can repeat measurements to remove the errors (so, if you lie with a random value, you should be smart, e.g., lie consistently in the entire session to prevent finding the actual value). As Thorin said, averaging the value is a very easy (but sometimes extremely effective, depending on the random model) way to remove the noise, but there are many more ways, some very refined (I'm not an expert, but I guess estimation methods from control engineering and other disciplines could be adapted to fingerprinting).
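The averaging point can be sketched in a few lines. The true value, the uniform noise model and the function names below are all assumptions chosen for illustration; real metrics and real noise models will differ.

```python
import random
import statistics

TRUE_VALUE = 100.0  # hypothetical real metric the browser wants to hide


def noisy_per_read(rng: random.Random) -> float:
    """Fresh zero-mean noise on every read: repeated reads can be averaged away."""
    return TRUE_VALUE + rng.uniform(-5.0, 5.0)


def make_consistent_reader(rng: random.Random):
    """Pick one offset for the whole session and always return the same lie."""
    offset = rng.uniform(-5.0, 5.0)

    def read() -> float:
        return TRUE_VALUE + offset

    return read


rng = random.Random(42)

# Attacker averages many reads of the per-read noisy value.
estimate = statistics.fmean([noisy_per_read(rng) for _ in range(10_000)])

# Attacker averages many reads of the session-consistent value.
reader = make_consistent_reader(rng)
session_estimate = statistics.fmean([reader() for _ in range(10_000)])

print(f"per-read noise, averaged:   {estimate:.3f}")
print(f"consistent lie, averaged:   {session_estimate:.3f}")
```

With per-read noise the average converges on the real value, while the session-consistent lie averages to itself and reveals only the lie. A standardized value avoids the question entirely, since there is no noise model to attack.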

uazo commented 1 week ago

Protect the real value. Protect more metrics. Grow the crowd.

The objective is very clear and the catchphrase is very nice! But it all depends on what "protect" means... I am aware of how Brave does it, being able to read (more or less) the code, but I know Tor's approach only by hearsay.

But all randomizing can be detected ... It can also sometimes be reversed

That's why I say start with identical values and proceed to randomization so that reversing a random value by taking it to a standard value becomes useless. Of course, assuming that reversal is possible.

What I hypothesize is that both Tor's and Brave's models are potentially ineffective, the former because it elevates the user factor, the latter because the possible inversion (or averaging) exposes the true values. Mind you, I am not saying at all that I have found the way, I am simply hypothesizing that the best approach is the combination of the two.

But in any case to say “to avoid a unique fingerprint” without context does not seem correct to me, and leads people to think that one model is better than another a priori, which technically it doesn't seem to be (to me :)

Thorin-Oakenpants commented 1 week ago

But in any case to say “to avoid a unique fingerprint” without context

The whole point of FPing is to create uniqueness. Anyway, this thread is now off-topic and arthur is not amused :)