tor2web / Tor2web

Tor2web is HTTP proxy software that enables access to Tor Hidden Services by means of common web browsers
https://www.tor2web.org
GNU Affero General Public License v3.0

tor2web exposes unobfuscated address stats #150

Open asn-d6 opened 9 years ago

asn-d6 commented 9 years ago

Hello,

it is my impression that tor2web exposes a list of onion addresses along with the visit count.

I wanted to mention that while this is an interesting decision on its own, it can even prove to be dangerous when visit counts are published for non-popular hidden services (with a small client anonymity set). I can imagine fictional confirmation attacks where a person is suspected to be a visitor of a website that only gets 2 or 3 visits a month, and a network attacker can correlate the time he spends on his computer with the increase in the visit count.

I suggest two things:

Feel free to adjust the constants as you see fit :)

Thanks!

fpietrosanti commented 9 years ago

@evilaliv3 is on holiday right now; when he gets back he can have a look at your ticket. Super thanks for noticing it!

juhanurmi commented 9 years ago

I am interested in finding the hidden services that are rare. If there is a threshold, ahmia.fi won't find the hidden services that are unpopular.

evilaliv3 commented 9 years ago

Hi all beautiful people!

I think the issue is valid and the stats are controversial; I have also discussed it with @hellais.

The ideal solution would be to fetch the robots.txt on the first request of the day to a hidden service, parse it, and show the service in the stats only if that is allowed. What do you think?

fpietrosanti commented 9 years ago

@evilaliv3 I'd go the opposite way. By default all nodes are indexed and present in the stats; if a TorHS doesn't want to be listed in the stats, it will need to publish a robots.txt with the appropriate entries. (The robots.txt handling will probably be needed for many other things anyway, like search engine indexing and caching.)
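A minimal sketch of the opt-out check described above, using Python's standard `urllib.robotparser`. The user-agent token `Tor2webStats`, the function name, and the `example.onion` host are assumptions made for illustration; nothing in this thread says Tor2web defines them. The service is listed unless its robots.txt disallows the stats bot for the site root.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent token for the stats listing (not defined by Tor2web).
STATS_BOT = "Tor2webStats"


def may_list_in_stats(robots_txt: str, onion_host: str) -> bool:
    """Return True unless robots.txt disallows STATS_BOT for the site root.

    robots_txt is the already-fetched body of the service's robots.txt;
    an empty string means the service published no rules (listed by default).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(STATS_BOT, f"http://{onion_host}/")


if __name__ == "__main__":
    opt_out = "User-agent: Tor2webStats\nDisallow: /\n"
    print(may_list_in_stats(opt_out, "example.onion"))  # service opted out
    print(may_list_in_stats("", "example.onion"))       # no rules published
```

Note that a wildcard rule (`User-agent: *` with `Disallow: /`) also matches the stats bot under this parser, so whether generic search-engine exclusions should count as a stats opt-out remains a policy decision.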

lastknight commented 9 years ago

Robots.txt has a Bot declaration by default. Let's just create a Bot Id and skip a service only if its robots.txt explicitly says so. Exclusions are for search engines, not for proxies.

asn-d6 commented 9 years ago

However, even if you start respecting robots.txt (which might be a good idea) you still have the problem where you display the connection count for unpopular HSes with only a few users. Also, since robots.txt disallow is opt-in (and not the default), most unpopular HSes will still be indexed by default.

So, IMO, even if you implement the robots.txt parser, you still have to slightly obfuscate the connection count, and also completely remove (the connection count of) unpopular HSes from listing. Otherwise, you are still making confirmation attacks easier.

fpietrosanti commented 9 years ago

@asn-the-goblin-slayer mmmm i see the "confirmation attack" issue.

What about still listing all TorHS that are accessed "less than X times", but giving them a fixed "count" value such as "Less-Than-100-Hit"?
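The fixed-label idea could be sketched roughly as follows. The threshold of 100 is taken from the example label only; the thread leaves the real value of X open, and the function and constant names are hypothetical.

```python
# Hypothetical threshold; the thread leaves the actual value of X open.
LOW_TRAFFIC_THRESHOLD = 100


def display_count(hits: int, threshold: int = LOW_TRAFFIC_THRESHOLD) -> str:
    """Return the exact count for busy services, a fixed label otherwise."""
    if hits < threshold:
        # Every low-traffic service gets the same label, so one extra
        # visit does not change what is published.
        return f"Less-Than-{threshold}-Hit"
    return str(hits)


if __name__ == "__main__":
    for hits in (3, 99, 100, 2500):
        print(hits, "->", display_count(hits))
```

With this scheme a single visit to a rarely used service no longer moves the published value, but counts at or above the threshold are still exact.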

lastknight commented 9 years ago

How about obfuscating with a minimum LOWER ceiling? E.g. if I connect once, the number of connections shown is 50?

We will obfuscate and still have an indication of listing...

asn-d6 commented 9 years ago

@fpietrosanti Yes, I think having a constant count value for the unpopular HSes is an acceptable solution.

@lastknight Hm. Tor and ahmia are rounding up the numbers to the nearest multiple of 8. Tor is doing this in its bridge client stats, and ahmia is doing it in its viewer stats. I think I prefer this idea over just having a minimum lower ceiling, because it adds noise to all stats.

So, 1 connection would be displayed as 8. 2 connections would also be displayed as 8, but 9 connections would be displayed as 16. This kills information from the stats, which I think is a good idea in these cases. BTW, here is ahmia's implementation: https://github.com/juhanurmi/ahmia/commit/8898a9a82df854e09b6dbaa63687eb2a3555a11a
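For illustration, rounding up to the next multiple of 8 can be written in a few lines. This is only a sketch of the arithmetic described above, not a copy of ahmia's or Tor's code, and the function name is made up.

```python
def round_up_to_multiple(count: int, multiple: int = 8) -> int:
    """Round a non-negative count up to the next multiple of `multiple`."""
    if count <= 0:
        return 0
    # Integer ceiling division, then scale back up.
    return ((count + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    for count in (1, 2, 8, 9):
        print(count, "->", round_up_to_multiple(count))
```

Every count from 1 to 8 maps to 8, so an observer watching the published number cannot tell individual visits apart within a bucket.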

evilaliv3 commented 9 years ago

I do not think the constant value is a good solution, because the threshold is really relative. E.g. what about an HS that is unpopular (used by only a few users) but uses protocols that involve a lot of connections (e.g. torrent)? So I think that only the ceiling function is actually useful for obfuscating things.