mikeperry-tor / vanguards

Vanguards help guard you from getting vanned...
MIT License

Provide Linux tc scripts for delay-based location spoofing #46

Open mikeperry-tor opened 4 years ago

mikeperry-tor commented 4 years ago

I've been in a traffic analysis rabbit hole lately, and have also been pondering the latency characteristics of a much faster Tor. As optimizations reduce Tor's congestion and added capacity lowers utilization, circuit latency may begin to reflect actual Internet topology, that is, latency due to distance alone. After a few entry guard rotations, it feels like this will leak geographic information about the endpoint.
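To make the concern concrete: once queuing delay shrinks, propagation delay sets a floor on round-trip time that grows with distance. A minimal sketch, assuming the common rule of thumb that light in optical fiber covers roughly 200 km per millisecond (about two thirds of c); the distance used below is a hypothetical example, not a measurement:

```python
# Rough propagation speed of light in optical fiber: ~2/3 of c, i.e. ~200 km/ms.
# This is a rule-of-thumb approximation, not a value for any specific path.
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(distance_km: float, km_per_ms: float = FIBER_KM_PER_MS) -> float:
    """Lower bound on round-trip time from propagation delay alone.

    Queuing, routing detours, and processing only add latency, so real
    RTTs sit at or above this floor.
    """
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return 2.0 * distance_km / km_per_ms


if __name__ == "__main__":
    # Hypothetical example: ~6000 km one way.
    print(f"{min_rtt_ms(6000):.1f} ms")  # → 60.0 ms
```

Because congestion can only add to this floor, the less congestion there is, the closer observed circuit latencies get to it.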

asn also pointed out that if we're going to try to conceal the use of onionbalance, we will also need to deal with the timing characteristics of RP circuits built by contacting different intropoints, especially in cases with different vanguards.

I think this means we should add a pair of Linux tc scripts/components to the repo.

One script/component should allow you to add latency characteristics similar to your server's non-Tor distance from an arbitrary other site (by accumulating ping times to a site of your choice and using that data to set tc delay distributions).
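A minimal sketch of the first component, assuming Linux iputils ping output (reply lines containing e.g. "time=12.3 ms"), a hypothetical interface name eth0, and that a normal distribution with the sampled mean and standard deviation is a good enough fit. It only builds the command string; running it needs root, and a root netem qdisc delays all egress traffic on that interface, not just Tor's. A closer fit would use a custom netem distribution table built from the samples, which this sketch does not attempt.

```python
import re
import statistics
from typing import Iterable, List

# Matches the per-reply RTT field printed by iputils ping, e.g. "time=12.3 ms".
_RTT_RE = re.compile(r"time=([\d.]+) ms")


def parse_ping_rtts(lines: Iterable[str]) -> List[float]:
    """Extract RTT samples (ms) from ping output lines; other lines are skipped."""
    rtts = []
    for line in lines:
        m = _RTT_RE.search(line)
        if m:
            rtts.append(float(m.group(1)))
    return rtts


def netem_command(rtts: List[float], dev: str = "eth0") -> str:
    """Build a tc netem command whose delay mimics the measured RTTs.

    Assumption: all added delay goes on egress of one interface, so every
    round trip through it gains about the sampled mean, with normal jitter.
    """
    if len(rtts) < 2:
        raise ValueError("need at least two RTT samples")
    mean = statistics.mean(rtts)
    jitter = statistics.stdev(rtts)
    cmd = f"tc qdisc add dev {dev} root netem delay {mean:.1f}ms"
    # netem refuses a distribution without nonzero jitter, so only add one then.
    if round(jitter, 1) > 0:
        cmd += f" {jitter:.1f}ms distribution normal"
    return cmd


if __name__ == "__main__":
    # Hypothetical captured output of: ping -c 3 <site of your choice>
    sample = [
        "64 bytes from 192.0.2.1: icmp_seq=1 ttl=56 time=40.0 ms",
        "64 bytes from 192.0.2.1: icmp_seq=2 ttl=56 time=42.0 ms",
        "64 bytes from 192.0.2.1: icmp_seq=3 ttl=56 time=44.0 ms",
    ]
    print(netem_command(parse_ping_rtts(sample)))
```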

The second script/component should let you use the delta in average latency over RP circuits to an onion site, reached via specific intropoints, to add delay to whichever of your onionbalance instances happen to have lower latency than the others, on average.
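The core arithmetic of that second component might look like the sketch below. It assumes RTT samples have already been collected per onionbalance instance (the instance labels and the collection method are hypothetical) and pads every instance up to the slowest average. It only equalizes means, not the shape of each latency distribution.

```python
import statistics
from typing import Dict, List


def equalizing_delays(samples_ms: Dict[str, List[float]]) -> Dict[str, float]:
    """Extra delay (ms) per instance so each matches the slowest average.

    samples_ms maps a hypothetical instance label to RTT samples measured
    over RP circuits reached via that instance's intro points.
    """
    if not samples_ms or any(not s for s in samples_ms.values()):
        raise ValueError("every instance needs at least one sample")
    means = {name: statistics.mean(s) for name, s in samples_ms.items()}
    slowest = max(means.values())
    # The slowest instance gets zero extra delay; faster ones are padded up to it.
    return {name: slowest - m for name, m in means.items()}


if __name__ == "__main__":
    print(equalizing_delays({
        "instance-a": [100.0, 110.0],
        "instance-b": [130.0, 150.0],
    }))  # → {'instance-a': 35.0, 'instance-b': 0.0}
```

Each resulting value could then be applied on the corresponding instance's host, for example as a netem delay.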

The first script is far easier to write, and likely far more useful. So let's focus on that for now, and see if we still want the second one.

This Linux tc summary is the best quick overview I've found so far: https://stackoverflow.com/questions/614795/simulate-delayed-and-dropped-packets-on-linux

With multiple guards, we may even want to specify different latency distributions to be added to each guard IP. This is also possible: https://serverfault.com/questions/389290/using-tc-to-delay-packets-to-only-a-single-ip-address
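Following the prio + netem + u32 filter pattern from that answer, a per-guard version could be generated as below. This is a sketch under assumptions: the interface name eth0 and the guard IPs (documentation addresses) are hypothetical, each guard gets a fixed delay rather than a full distribution, and the commands are only built, not run. tc class ids are hexadecimal, so the band number is formatted in hex.

```python
from typing import Dict, List


def per_guard_tc_commands(guard_delays_ms: Dict[str, float],
                          dev: str = "eth0") -> List[str]:
    """tc commands adding a separate netem delay per guard IP.

    Assumed layout: a root prio qdisc with its three default bands plus one
    extra band per guard. Unmatched traffic keeps using bands 1:1 to 1:3
    through the default priomap.
    """
    # prio allows at most 16 bands, which leaves room for 13 guards here.
    if not 0 < len(guard_delays_ms) <= 13:
        raise ValueError("expected between 1 and 13 guards")
    bands = 3 + len(guard_delays_ms)
    cmds = [f"tc qdisc add dev {dev} root handle 1: prio bands {bands}"]
    for i, (ip, delay) in enumerate(guard_delays_ms.items()):
        band = 4 + i                   # band number (class minor)
        classid = f"1:{band:x}"        # tc class ids are hexadecimal
        handle = f"{0x10 + band:x}:"   # unique netem qdisc handle per band
        cmds.append(f"tc qdisc add dev {dev} parent {classid} "
                    f"handle {handle} netem delay {delay:.1f}ms")
        cmds.append(f"tc filter add dev {dev} parent 1:0 protocol ip prio 1 "
                    f"u32 match ip dst {ip}/32 flowid {classid}")
    return cmds


if __name__ == "__main__":
    for cmd in per_guard_tc_commands({"192.0.2.10": 30.0, "192.0.2.20": 45.0}):
        print(cmd)
```

prio dequeues lower-numbered bands first, so guard traffic in the extra bands yields to other traffic under load; that matches the linked answer's design but is worth knowing.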

0xsirus commented 4 years ago

Hi, we've recently been trying to find a solution in Whonix for a similar issue: deanonymization of Tor users caused by the direct effect of CPU load on the latency of some network activities, such as ping. It has been shown that CPU load can influence ping latency, which can in turn be used to transmit data outside the anonymized context. We were also considering using 'tc' and testing how well it addresses this problem, and whether it is a good choice for this specific type of issue. So we might be able to benefit from your experiments with this tool (if you have already done any) to come up with an effective solution for Whonix. Could you tell me whether you have tried using 'tc' to see how well it can or cannot address the problem you described in this ticket?

mikeperry-tor commented 4 years ago

I used tc many, many years ago to prioritize Tor relay traffic below other traffic on my machine, and for other network diagnostics. It is capable of doing the things I described above, but I have not yet begun work on this specific ticket.

mikeperry-tor commented 3 years ago

Note to self: There's an interesting version of this where the client or relay decides to add a per-circuit delay according to a probability distribution, such as one from the circpad probdists.
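A minimal sketch of the per-circuit idea, assuming a log-logistic shape (one of the distribution types circpad supports) with hypothetical parameters alpha (the median, in ms) and beta (tail shape). This is not circpad's code or its parameterization, and it only chooses and remembers a delay per circuit; how a client would apply it is left out.

```python
import random
from typing import Dict, Optional


def sample_log_logistic_ms(rng: random.Random, alpha: float, beta: float) -> float:
    """Draw one delay (ms) from a log-logistic(alpha, beta) distribution.

    Inverse-CDF sampling: x = alpha * (u / (1 - u)) ** (1 / beta), with u
    uniform on [0, 1). alpha is the median; larger beta means lighter tails.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    u = rng.random()  # in [0, 1), so 1 - u is never zero
    return alpha * (u / (1.0 - u)) ** (1.0 / beta)


class PerCircuitDelay:
    """Pick one delay per circuit on first use and keep it fixed afterwards."""

    def __init__(self, alpha: float, beta: float, seed: Optional[int] = None):
        # A seed is only for reproducible tests; otherwise use OS randomness.
        self._rng = random.Random(seed) if seed is not None else random.SystemRandom()
        self._alpha, self._beta = alpha, beta
        self._delays: Dict[int, float] = {}

    def delay_for(self, circ_id: int) -> float:
        if circ_id not in self._delays:
            self._delays[circ_id] = sample_log_logistic_ms(
                self._rng, self._alpha, self._beta)
        return self._delays[circ_id]


if __name__ == "__main__":
    delays = PerCircuitDelay(alpha=50.0, beta=3.0)
    print(f"circuit 1: {delays.delay_for(1):.1f} ms")
```

Fixing the delay per circuit, rather than re-sampling per cell, keeps the added latency consistent for the life of the circuit.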

Let the record show that I am against adding delay at relays, because this causes queuing memory overhead: https://github.com/torproject/tor/blob/main/doc/HACKING/CircuitPaddingDevelopment.md#14-other-deployment-constraints

If congestion control brings queue lengths down, maybe there will suddenly be some more room for that queuing memory at relays, and maybe the oomkiller and congestion control can minimize this overhead. However, because bandwidth is still increasing exponentially, and the speed of light is fixed, I think the best thing for relays to do is add traffic, not delay packets.
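The queuing cost follows from Little's law: the average amount of data held equals throughput times the time each byte is held. A worked example, assuming a hypothetical relay pushing 100 Mbit/s and delaying everything by 50 ms:

```python
def queued_bytes(throughput_bytes_per_s: float, added_delay_s: float) -> float:
    """Bytes a relay must buffer on average if it delays all traffic.

    Little's law (L = lambda * W): data in the queue equals arrival rate
    times the time each byte spends queued.
    """
    if throughput_bytes_per_s < 0 or added_delay_s < 0:
        raise ValueError("inputs must be non-negative")
    return throughput_bytes_per_s * added_delay_s


if __name__ == "__main__":
    # Hypothetical relay: 100 Mbit/s = 12.5 MB/s, delayed by 50 ms.
    print(f"{queued_bytes(100e6 / 8, 0.050):,.0f} bytes")  # → 625,000 bytes
```

The buffer grows linearly with bandwidth for a fixed delay, so as relay capacity keeps increasing, the memory cost of relay-side delay increases with it.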

Instead, a delay could be decided locally per-circuit or per-identity by the client, and thus not require relays to queue data at all. The client would just use this delay to spoof circuit latency, to conceal its geographic location.
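For the per-identity variant, one option is to derive a stable delay from a local secret, so the same identity always gets the same delay without storing state, and nobody without the secret can predict it. A sketch under assumptions: "identity" is a hypothetical label (for example a stream-isolation key), the 0 to 100 ms range is arbitrary, and HMAC-SHA256 is one reasonable choice of keyed hash, not something the ticket specifies.

```python
import hashlib
import hmac
import secrets


def identity_delay_ms(secret: bytes, identity: str,
                      min_ms: float = 0.0, max_ms: float = 100.0) -> float:
    """Stable per-identity delay in [min_ms, max_ms), keyed by a local secret.

    HMAC-SHA256(secret, identity) is reduced to a fraction in [0, 1), so the
    same identity always maps to the same delay and different identities
    get independent-looking ones.
    """
    if not 0 <= min_ms <= max_ms:
        raise ValueError("need 0 <= min_ms <= max_ms")
    digest = hmac.new(secret, identity.encode("utf-8"), hashlib.sha256).digest()
    # Keep 53 bits so the fraction is exactly representable and stays below 1.
    frac = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return min_ms + frac * (max_ms - min_ms)


if __name__ == "__main__":
    secret = secrets.token_bytes(32)  # generated once and kept locally
    print(f"{identity_delay_ms(secret, 'example-identity'):.1f} ms")
```

Rotating the secret re-randomizes every identity's delay at once, which may be useful if the delays themselves are ever suspected of being learned.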

Related ideas: https://www.freehaven.net/doc/alpha-mixing/alpha-mixing.pdf

Credit for this tip goes to "online discussion". Thanks, time traveler.