tor2web / Tor2web

Tor2web is HTTP proxy software that enables access to Tor Hidden Services by means of common web browsers
https://www.tor2web.org
GNU Affero General Public License v3.0

Also block children URLs with no trailing slash #296

Open wtf opened 8 years ago

wtf commented 8 years ago

Currently, blocklisting blahblahblahblah.onion/a/ blocks blahblahblahblah.onion/a/b/c/ but not blahblahblahblah.onion/a/b/c. At least some websites default to URLs without a trailing slash and may also redirect to them from URLs with a trailing slash.

evilaliv3 commented 8 years ago

This could be valuable but we have to rethink how the blocklist is handled.

Right now we have flat lists that do not describe what kind of check has to be done on each hash. As a result, we have to apply every check to every hash, and what you are proposing here, combined with the current implementation, would require computing O(n) hashes, where n is the length of the URL.

A possible solution to this is described here: https://github.com/globaleaks/Tor2web/issues/280#issuecomment-174238795

wtf commented 8 years ago

Since we already split the incoming URL into parent URLs and compare those with our block list, removing trailing slashes would double the amount of work, but it would still be much better than O(n).
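
A minimal sketch of the idea, to make the cost concrete. It is not Tor2web's actual code: the function names `url_hash`, `candidate_urls` and `is_blocked` are hypothetical, and SHA-256 is used only as a stand-in for whatever hash the blocklist really stores. Each parent URL is checked both with and without its trailing slash, so a path with k segments needs 2(k + 1) lookups rather than one per character.

```python
import hashlib
from typing import List, Set


def url_hash(url: str) -> str:
    # Assumption: the blocklist stores a hex digest of the URL string.
    # SHA-256 is a placeholder for the hash the real blocklist uses.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def candidate_urls(host: str, path: str) -> List[str]:
    # Split the path into non-empty segments and build every parent URL,
    # once without and once with a trailing slash.
    segments = [s for s in path.split("/") if s]
    candidates = [host, host + "/"]
    prefix = host
    for seg in segments:
        prefix = prefix + "/" + seg
        candidates.append(prefix)
        candidates.append(prefix + "/")
    return candidates


def is_blocked(host: str, path: str, blocked_hashes: Set[str]) -> bool:
    # A request is blocked if any parent URL, in either form, is listed.
    return any(url_hash(c) in blocked_hashes for c in candidate_urls(host, path))


blocked = {url_hash("blahblahblahblah.onion/a/")}
print(is_blocked("blahblahblahblah.onion", "/a/b/c/", blocked))
print(is_blocked("blahblahblahblah.onion", "/a/b/c", blocked))
print(is_blocked("blahblahblahblah.onion", "/ab", blocked))
```

With this approach both /a/b/c/ and /a/b/c match the /a/ entry, while an unrelated path such as /ab does not. The number of hashes grows with the number of path segments, not with the length of the URL.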