Bloom filter privacy issues

GoogleCodeExporter commented 9 years ago

Currently the code works like this: transactions are received, their Merkle 
branches are verified. They are sent to listeners and if the listeners find 
them relevant, they are considered true positives. Otherwise they are 
considered false positives.

When building a Bloom filter, the FP rate is hard coded, so there isn't 
currently any way for a remote peer to exploit this. However at some point 
presumably that will change and the FP rate will start to be dynamic, and 
derived from the available bandwidth + how much of it is being used by FPs. 
For instance, if we scan into a part of the chain that is a lot larger than 
previous sections, then the constant FP rate as a percentage will result in a 
lot more actual FPs which might flood bandwidth, so the code should adjust the 
FP rate downwards.
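
To make the arithmetic concrete, here is a minimal sketch of how a constant FP rate scales with block size and how a rate could be derived from a per-block budget. The class and method names are hypothetical and nothing here is bitcoinj API; it assumes the budget is expressed as a maximum number of false positive transactions per block.

```java
final class FpRateBudget {
    private FpRateBudget() {}

    /** Expected number of false positives when a block of txCount transactions is filtered. */
    static double expectedFalsePositives(double fpRate, int txCount) {
        return fpRate * txCount;
    }

    /**
     * Hypothetical policy: choose the FP rate that keeps the expected number of
     * false positives per block at or below the given budget.
     */
    static double rateForBudget(double maxFalsePositivesPerBlock, int txCount) {
        if (txCount <= 0) {
            return 1.0; // nothing to filter, so any rate satisfies the budget
        }
        return Math.min(1.0, maxFalsePositivesPerBlock / txCount);
    }
}
```

With a fixed rate of 0.001, a 2000 transaction block yields about 2 expected false positives; keeping the same budget of 2 for a 4000 transaction block requires halving the rate to 0.0005.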

However this opens up an attack: a node can maliciously include transactions in 
the filtered blocks that did not really match the Bloom filter. Our client would 
count these as false positives and keep driving the FP rate downwards until it 
sends a filter with an extremely low FP rate, allowing the node to then figure 
out the user's addresses/keys.

The fix is that before a filtered block is processed, we should repeat the 
filtering process on the client side (in order), testing against and adding 
transactions to the filter. If every transaction correctly matches then the 
node is not trying to cheat us.
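
A rough sketch of that check follows. Everything in it is an assumption made for illustration: `ToyBloom` is a toy filter (the real one uses MurmurHash3 with nTweak), `ToyTx` is a hypothetical stand-in for a transaction, and the "added on match" list models the data the remote peer inserts into its copy of the filter when a transaction matches. None of these names exist in bitcoinj.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class FilteredBlockCheck {
    private FilteredBlockCheck() {}

    /** Toy Bloom filter, NOT the wire-format filter: the hash here is a simple placeholder. */
    static final class ToyBloom {
        private final BitSet bits;
        private final int size;
        private final int hashFuncs;

        ToyBloom(int size, int hashFuncs) {
            this.bits = new BitSet(size);
            this.size = size;
            this.hashFuncs = hashFuncs;
        }

        private int index(byte[] data, int i) {
            // Seeded placeholder hash; floorMod keeps the index non-negative.
            return Math.floorMod(31 * Arrays.hashCode(data) + i * 0x9E3779B9, size);
        }

        void insert(byte[] data) {
            for (int i = 0; i < hashFuncs; i++) {
                bits.set(index(data, i));
            }
        }

        boolean contains(byte[] data) {
            for (int i = 0; i < hashFuncs; i++) {
                if (!bits.get(index(data, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Hypothetical stand-in for a transaction in a filtered block. */
    static final class ToyTx {
        final List<byte[]> elements;     // data the remote peer tests against the filter
        final List<byte[]> addedOnMatch; // data the peer adds to the filter after a match

        ToyTx(List<byte[]> elements, List<byte[]> addedOnMatch) {
            this.elements = elements;
            this.addedOnMatch = addedOnMatch;
        }
    }

    /**
     * Replays the peer's filtering in block order against our own working copy of
     * the filter (which is mutated). Returns false as soon as a transaction is
     * found that could not have matched, i.e. the peer padded the block.
     */
    static boolean peerFilteredHonestly(ToyBloom workingCopy, List<ToyTx> txsInBlockOrder) {
        for (ToyTx tx : txsInBlockOrder) {
            boolean matched = false;
            for (byte[] element : tx.elements) {
                if (workingCopy.contains(element)) {
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                return false;
            }
            for (byte[] element : tx.addedOnMatch) {
                workingCopy.insert(element);
            }
        }
        return true;
    }
}
```

Order matters because a later transaction may only match thanks to data inserted by an earlier one, so replaying out of order would flag an honest peer.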

Original issue reported on code.google.com by hearn@google.com on 6 Jan 2014 at 6:50

GoogleCodeExporter commented 9 years ago

-> enhancement, as we currently don't experience this issue due to FP rates 
being hard coded.

Original comment by hearn@google.com on 6 Jan 2014 at 6:52

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

FP rates float these days, so this attack should be mitigated as part of a larger 
"make Bloom filter privacy better" project. There are many other tasks required 
too.

Original comment by mh.in.en...@gmail.com on 14 Sep 2014 at 2:35

GoogleCodeExporter commented 9 years ago

+ misc other things we've known about for a while. Can break out into separate 
bugs when someone has time to work on them:

Filters can be intersected by a peer that receives more than one of them, which 
in practice they all do (see 504). The nTweak randomisations need to be 
persisted, possibly to disk.
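
As a minimal sketch of the persistence part, assuming the tweak is stored as a decimal number in a small text file: the value is generated once and then reused, so successive filters built from the same keys share the same false positives instead of letting a peer intersect them away. `TweakStore` and the file layout are hypothetical, not bitcoinj API; the only protocol fact relied on is that nTweak is an unsigned 32-bit value.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;

final class TweakStore {
    private TweakStore() {}

    /**
     * Returns the tweak stored in the given file, creating and saving a fresh
     * random one on first use. The value is kept in the range 0..2^32-1.
     */
    static long loadOrCreate(Path file) {
        try {
            if (Files.exists(file)) {
                return Long.parseLong(Files.readString(file).trim());
            }
            // Mask the random int so it is treated as an unsigned 32-bit value.
            long tweak = new SecureRandom().nextInt() & 0xFFFFFFFFL;
            Files.writeString(file, Long.toString(tweak));
            return tweak;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real wallet would more likely keep this value next to its other persisted state rather than in a separate file; the file is only there to keep the sketch self-contained.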

bitcoinj has no idea how much bandwidth it has to play with so can't do any 
kind of meaningful FP targeting. There'd need to be a bandwidth API so wallets 
could tell it wifi vs mobile vs DSL etc. Then we can try to target bandwidth 
levels.

False positives that come back aren't re-added to the filter. If they were, 
we'd have to have some clever algorithm that can figure out which false 
positives are contributing to our bandwidth overload the most, so we could take 
them out. Otherwise we end up following trails of transactions that are then no 
longer included in the next filter, allowing the remote peer to catch us in our 
lie.

Filters are sent unencrypted over the internet to randomly selected peers, more 
or less ensuring they can be tapped. The Bitcoin protocol does not have any 
encryption in it, so this would have to be fixed as part of resolving that.

None of these problems except the last should need protocol changes, but none 
of them are easy to resolve either: they're inherent to the difficult problem 
we're trying to solve, which is lying to our remote peers convincingly with 
limited resources.

Resolution is especially hard because it's unclear how much value is obtained 
through clever usage of Bloom filters for privacy. It must be greater than 
zero, but the value to a randomly chosen node operator of being able to link 
addresses is very low. Intelligence agencies probably have a lot more interest 
in doing this at scale, which implies passive eavesdroppers are a bigger 
problem and thus that Bitcoin P2P protocol encryption is probably the next 
biggest privacy win now that HD wallets are nearing completion.

Original comment by mh.in.en...@gmail.com on 14 Sep 2014 at 2:44

Changed title: Bloom filter privacy issues

GoogleCodeExporter commented 9 years ago

A paper was released (I got a chance to review it) that discusses some of the 
issues involved and covers some of the suggestions:

http://eprint.iacr.org/2014/763.pdf

Required reading for anyone who wants to work on this.

Original comment by mh.in.en...@gmail.com on 30 Sep 2014 at 1:35

novitski / bitcoinj

Bloom filter privacy issues #510