tor2web / Tor2web

Tor2web is an HTTP proxy software that enables access to Tor Hidden Services by mean of common web browsers
https://www.tor2web.org
GNU Affero General Public License v3.0
700 stars 177 forks source link

Improving efficiency of checking the blocklist #286

Open virgil opened 8 years ago

virgil commented 8 years ago

This might be of interest to @evilaliv3 and @obtuse .

@evilaliv3 has expressed concern about the size of the blocklists. In particular, that as-is we have to check whether a given URL matches any entry in the blocklist, and this search computes in time O(N).

It occurred to me we could get O(1) lookup time using either a hash table or a Bloom filter. Given that harddrive space of the blocklist is probably not an issue, it seems a hash table would be the simplest solution. If harddrive space is an issue, a Bloom filter would certainly solve the issue with small chance of false positives.

--- Want to back this issue? **[Post a bounty on it!](https://www.bountysource.com/issues/30443169-improving-efficiency-of-checking-the-blocklist?utm_campaign=plugin&utm_content=tracker%2F318575&utm_medium=issues&utm_source=github)** We accept bounties via [Bountysource](https://www.bountysource.com/?utm_campaign=plugin&utm_content=tracker%2F318575&utm_medium=issues&utm_source=github).
fpietrosanti commented 8 years ago

I think that the blocklists will never grow up to a size that will cause performance problems. Btw i think that make sense to move the blocklist to a sqlite and:

That way the blocklist will never grow up and will have a self-cleaning system.