thank you for this feedback on your use case @obtuse
given the current basic hashed implementation this is not possible, because from the hash of blahblahblahblah.onion/a/b/file1.php it is not possible to see that blahblahblahblah.onion/a/b/file2.php should also be blocked.
this ticket already tracks this issue; feel free to contribute there: https://github.com/globaleaks/Tor2web/issues/42
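to illustrate the limitation, here is a minimal sketch of an exact-hash blocklist check (illustrative only, not the actual tor2web code):

```python
import hashlib

# the blocklist stores only the md5 of each full blocked URL
blocklist = {hashlib.md5(b"blahblahblahblah.onion/a/b/file1.php").hexdigest()}

def is_blocked(url):
    # an exact-hash lookup can only match the exact string that was hashed
    return hashlib.md5(url.encode()).hexdigest() in blocklist

print(is_blocked("blahblahblahblah.onion/a/b/file1.php"))  # True
print(is_blocked("blahblahblahblah.onion/a/b/file2.php"))  # False: the hash says nothing about siblings
```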
It is definitely possible. If blahblahblahblah.onion/a/
is in the blocklist, we can also block blahblahblahblah.onion/a/b/
and all other child URLs. Will submit a patch for this tomorrow!
i imagine you are thinking of doing an md5 for each character of the url, but this would not be feasible because of the load it would cause.
or do you have something else in mind?
to solve the issue, as i discussed on some other ticket, we should have a different filtering approach that stores the md5, the kind of filter, and the length of the string before it is hashed
anyway thank you so much for getting on this!
IMHO, supporting these only for cleartext blocklist entries seems easiest.
i'm not open to having dorks for critical content hosted by tor2web nodes :)
anyway, the issue would not be solved by a cleartext list: cleartext or not, a flat list does not specify the kind of filter applied.
with flat i mean like now:
block_entry1
block_entry2
block_entry3
an idea i had (without adding any support database, and having in mind a backport of existing blocks) is a format like block_hash|length_unhashed|type:
block_entry1|20|0 <- type 0 is the one existing now, which sadly checks all of:
fullurl/path/subdomain/ (3 checks)
block_entry2|15|1 <- type 1 is the new one suggested by @obtuse, which simply
takes the first 15 chars of the path, applies the filter and sees if it matches
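for concreteness, a rough sketch of how such typed entries might be matched (the field handling and helper names are just illustrative assumptions):

```python
import hashlib

def md5_hex(s):
    return hashlib.md5(s.encode()).hexdigest()

def entry_matches(entry, full_url, path, subdomain):
    block_hash, length, ftype = entry.split("|")
    length, ftype = int(length), int(ftype)
    if ftype == 0:
        # existing behaviour: compare against full url, path and subdomain (3 checks)
        return block_hash in (md5_hex(full_url), md5_hex(path), md5_hex(subdomain))
    if ftype == 1:
        # suggested new type: hash only the first `length` chars of the path
        return md5_hex(path[:length]) == block_hash
    return False
```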
but such an implementation would have linearly growing overhead, because adding 1 entry to the list means doing a new check for each url
let's wait for @obtuse to clarify what he has in mind
I think that we could introduce purely cleartext regexp filters on the URI parameter to fix in one shot all possible current and future filtering requirements, so anyone that has a "more sophisticated filtering requirement" will just not have the md5-based storage property
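A minimal sketch of what such a cleartext regexp filter on the URI could look like (the rule format here is an assumption, not a final design):

```python
import re

# hypothetical cleartext rules, one regexp per line of the blocklist
regexp_rules = [re.compile(r"^blahblahblahblah\.onion/a/b/")]

def is_blocked_by_regexp(uri):
    return any(rule.search(uri) for rule in regexp_rules)

print(is_blocked_by_regexp("blahblahblahblah.onion/a/b/file2.php"))  # True
```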
the solution implemented by @obtuse in https://github.com/globaleaks/Tor2web/pull/281 is brilliant; it does exactly what i had in mind, but reduced to each / in the url instead of every single char. it will consume a lot more resources in hashes (one hash for each / in the url), but i would like to give it a try and see if the impact is acceptable.
thanks @obtuse
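for illustration, a rough sketch of the per-slash approach (my own reconstruction, not the actual code of the pull request):

```python
import hashlib

def md5_hex(s):
    return hashlib.md5(s.encode()).hexdigest()

blocklist = {md5_hex("blahblahblahblah.onion/a/b/")}

def is_blocked(url):
    # hash the prefix ending at each '/' in the url, plus the full url,
    # and check every resulting hash against the blocklist
    candidates = [url[:i + 1] for i, c in enumerate(url) if c == "/"]
    candidates.append(url)
    return any(md5_hex(c) in blocklist for c in candidates)

print(is_blocked("blahblahblahblah.onion/a/b/file1.php"))  # True: the a/b/ prefix matches
```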
in addition to that, i'm now going to add the possibility of a cleartext regexp-based filter, to be used if and only if the required filter is not doable using the hashed filters.
Glad to help!
I was worried about the performance impact too, so I did a rudimentary analysis of performance on a non-idle $5 VPS:
Time required to compute 1M unique md5 hashes = 2.2s => Time required per URL (15 hashes) = ~0.033ms => URLs hashed per second = ~30k
Which is a <3.3% performance penalty while serving 1,000 requests per second (1,000 requests × ~0.033ms of hashing each ≈ 33ms of CPU time per second).
Overall, the CPU usage of our tor2web nodes seems largely unaffected.
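For anyone who wants to reproduce the measurement, here is a rough sketch (absolute numbers will of course vary with the machine):

```python
import hashlib
import time

N = 1_000_000
start = time.perf_counter()
for i in range(N):
    # unique input per iteration, so nothing can be cached away
    hashlib.md5(b"blahblahblahblah.onion/a/b/file%d.php" % i).hexdigest()
elapsed = time.perf_counter() - start

per_url = elapsed / N * 15  # ~15 hashes per URL, as above
print("%d hashes in %.2fs -> %.3fms per URL -> ~%.0f URLs/s"
      % (N, elapsed, per_url * 1000, 1 / per_url))
```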
Great! thanks for your time and effort in analyzing it @obtuse!
today i did the following:
I'm now going to update the wiki page with an example of the regexps' use
Awesome!
For a blocklist entry such as
blahblahblahblah.onion/a/b/
all of its child URLs should also be blocked, e.g. blahblahblahblah.onion/a/b/file1.php and blahblahblahblah.onion/a/b/file2.php.