haraka / Haraka

A fast, highly extensible, and event driven SMTP server
https://haraka.github.io
MIT License

domain reputation DB #696

Closed msimerson closed 8 years ago

msimerson commented 10 years ago

Along with URIBL, WoT would make another excellent check against incoming email.

smfreegard commented 10 years ago

It would - except I can't see that the data is publicly available?

msimerson commented 10 years ago

Scroll down to the bottom of that page and click the Developers link to find the API info page.

smfreegard commented 9 years ago

Forgot to mention that I looked at this after you posted the above. Personally - I think the T&Cs are a bit of a showstopper:

In addition to the WOT Terms of Service, the following restrictions apply to using the API:

The API is free for individuals and non-commercial use.
You must not make more than 25000 API requests during any 24 hour period.
You must limit your request rate to at most 10 requests per second.
If you are unable to comply with these restrictions, please contact us regarding partnership or commercial offering.

Furthermore, if you use the API in your application, we recommend the following:
You should credit WOT in your application. You can use our badges, for example.
You should not request the same information more than once during any 30 minute period, but should use a local cache for repeated requests instead.

I manually sampled some data and I don't really think the work involved for this would be worth it; e.g., it didn't find anything for the URIs that I tried.
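For a sense of the work involved, the restrictions quoted above (25000 requests per 24 hours, 10 requests per second, no repeat lookups within 30 minutes) could be enforced client-side roughly as follows. This is a minimal sketch, not Haraka or WOT code: `QuotaGuard`, `lookup`, and the `fetch` callback are hypothetical names, and the actual API call is left to the caller.

```python
import time
from collections import deque


class QuotaGuard:
    """Hypothetical client-side guard for the quoted API limits."""

    def __init__(self, per_second=10, per_day=25_000,
                 cache_ttl=30 * 60, clock=time.monotonic):
        self.per_second = per_second
        self.per_day = per_day
        self.cache_ttl = cache_ttl      # seconds a cached answer stays valid
        self.clock = clock              # injectable so the guard can be tested
        self.sent = deque()             # timestamps of requests in the last 24 h
        self.cache = {}                 # domain -> (timestamp, result)

    def _allowed(self, now):
        # Forget requests that are more than 24 hours old.
        while self.sent and now - self.sent[0] >= 86_400:
            self.sent.popleft()
        if len(self.sent) >= self.per_day:
            return False
        # Count requests made within the last second (newest first).
        recent = 0
        for stamp in reversed(self.sent):
            if now - stamp >= 1.0:
                break
            recent += 1
        return recent < self.per_second

    def lookup(self, domain, fetch):
        """Return a cached or fresh result, or None when over quota."""
        now = self.clock()
        hit = self.cache.get(domain)
        if hit is not None and now - hit[0] < self.cache_ttl:
            return hit[1]
        if not self._allowed(now):
            return None                 # over quota: caller skips the check
        self.sent.append(now)
        result = fetch(domain)          # assumption: caller supplies the API call
        self.cache[domain] = (now, result)
        return result
```

Returning `None` when over quota means the check is simply skipped for that message, which seems a safer default for a mail server than delaying or rejecting the connection.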

msimerson commented 9 years ago

Maybe WoT isn't quite the right tool. In much of the current spam, the domains are disposable, populated with "all the right DNS" (SPF, FCrDNS, helo hostname, etc.), and then some amount of time elapses, so the domains fall off the "newly observed domains" lists (nod, sem-fresh, etc.), and then the campaigns begin. Another way to detect these disposable domains is to perform a Google search. In nearly every case, these disposable domains have zero matches. I can't think of many real-world cases where anyone would want to receive email from a domain with no Google visibility. Thoughts?

msimerson commented 9 years ago

PhishTank is focused exclusively on phishing but it has a downloadable database, making it a viable source of data to check URLs against.

msimerson commented 9 years ago

Another one is Spam404. I don't see a data URL, but it appears it'd be pretty easy to scrape the web pages and maintain a local copy of the domain list.

msimerson commented 9 years ago

Artists Against 419 provides their DB via SOAP, so it could be sucked into a local DB and queried against as well.
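Whichever source feeds it, querying a local copy could look roughly like the sketch below. It assumes the list has already been exported to plain text with one domain per line; none of these sources necessarily ships that format, and `load_domains`, `host_of`, and `is_listed` are hypothetical helpers, not part of Haraka.

```python
from urllib.parse import urlsplit


def load_domains(lines):
    """Build a lookup set from an iterable of lines, one domain per line."""
    domains = set()
    for line in lines:
        entry = line.strip().lower().rstrip(".")
        if entry and not entry.startswith("#"):   # skip blanks and comments
            domains.add(entry)
    return domains


def host_of(url):
    """Extract the lowercase hostname from a URL or a bare host/path."""
    # urlsplit only fills in the hostname when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host.rstrip(".")


def is_listed(url, domains):
    """True if the URL's host, or any parent domain of it, is in the set."""
    labels = host_of(url).split(".")
    # Try the full host first, then each parent, stopping before the bare TLD.
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in domains:
            return True
    return False
```

For example, with `domains = load_domains(["bad.example"])`, `is_listed("http://login.bad.example:8080/x", domains)` is true because the parent domain `bad.example` is in the set, while `is_listed("https://notbad.example/", domains)` is false.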

smfreegard commented 9 years ago

PhishTank data is included in SURBL IIRC.

msimerson commented 9 years ago

PhishTank data is included in SURBL IIRC

And ClamAV UNOFFICIAL.

msimerson commented 9 years ago

for future reference: https://github.com/jpf/domain-profiler

baudehlo commented 8 years ago

This ticket seems redundant. I'm going to close - re-open if you think it's important.