Open fpietrosanti opened 8 years ago
I like the idea, but that would be some kind of magic! May @God code it while I leave for a while, riding my unicorn! Nice fantasy, @fpietrosanti!
From my closed mind, I do not understand how you expect to do it in tor2web, given that:
Kisses!
I don't entirely agree with @evilaliv3: a similarity approach is easy to implement (and to store).
See here for a couple of implementations: http://stackoverflow.com/questions/8897593/similarity-between-two-text-documents
On the first request to a blocked page, it is possible to compute a text fingerprint and compare it with a pre-approved set.
My 2€c.
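To make the comparison step concrete, here is a minimal sketch of one of the approaches discussed in the linked Stack Overflow thread: cosine similarity over word counts. It assumes the full page text is available at once, which may not hold for a streaming proxy. The names `cosine_similarity`, `matches_reference` and the 0.9 default threshold are hypothetical, not anything from tor2web.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Return the cosine similarity (0.0 to 1.0) of two texts' word counts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def matches_reference(page_text, reference_texts, threshold=0.9):
    """True if page_text is at least `threshold` similar to any reference text."""
    return any(cosine_similarity(page_text, ref) >= threshold
               for ref in reference_texts)
```

Note that this sketch stores the reference texts themselves; a compact fingerprint would need a different scheme.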
@lastknight: tor2web streams responses with a buffer of only 1k and a sliding window of 0.5k; I would not find it feasible to keep a fingerprint for each 5k.
Storing a fingerprint for each 5k served would cost more than 2 cents :)
Considering the issues described in #151, this ticket proposes a different approach to blocking "future onion services" based on an existing "blocking pattern", in an attempt to fight cryptolockers.
But what if, for each blocked site, we took a dump of the page and passed it through a hashing scheme designed for "similarity matching", so that web pages more than 90% similar to a previously blocked page could be blocked automatically?
Without looking at the context of the content, but only at the similarity between a page's fingerprint/pattern and that of an already blocked page, we would be able to block new web pages.
It does require an algorithm, implemented by some existing ready-made library, that gives the similarity of one website compared to another.
If this exists, then once we block one cryptolocker we would be able to block all the cryptolocker landing pages of the same campaign, without entering an arms race of regexp'ing stuff: they change something, we regexp other stuff, etc.
I don't know how complex it would be, but I think it's neat!
@virgil @evilaliv3 @moba @juhanurmi
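As an illustration of the kind of "similarity matching" hash the proposal has in mind, here is a rough sketch using SimHash, one well-known locality-sensitive fingerprint. This is a swapped-in technique for illustration, not something tor2web implements. The function names, the 64-bit size and the 0.9 threshold are assumptions, and a 90% bit agreement between fingerprints is only a proxy for "90% similar pages".

```python
import hashlib
import re

BITS = 64  # fingerprint size; each blocked page is stored as one 64-bit integer


def simhash(text):
    """Compute a 64-bit SimHash fingerprint from the words of the text."""
    weights = [0] * BITS
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # Hash each word to 64 bits; blake2b is in the standard library.
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # Each word votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(BITS):
        if weights[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def fingerprint_similarity(fp_a, fp_b):
    """Fraction of bit positions on which two fingerprints agree (0.0 to 1.0)."""
    hamming = bin(fp_a ^ fp_b).count("1")
    return 1.0 - hamming / BITS


def looks_like_blocked(page_text, blocked_fingerprints, threshold=0.9):
    """True if the page fingerprint is close to any stored blocked fingerprint."""
    fp = simhash(page_text)
    return any(fingerprint_similarity(fp, known) >= threshold
               for known in blocked_fingerprints)
```

A blocklist would then hold only integers, for example `blocked = {simhash(dump_of_blocked_page)}`, where `dump_of_blocked_page` is a hypothetical string containing the page text. The threshold would need tuning against real landing pages.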