Closed: msimerson closed this 8 years ago
It would - except I can't see that the data is publicly available?
Scroll down to the bottom of that page and click the Developers link to find the API info page.
Forgot to mention that I looked at this after you posted the above. Personally - I think the T&Cs are a bit of a showstopper:
In addition to the WOT Terms of Service, the following restrictions apply to using the API:
The API is free for individuals and non-commercial use.
You must not make more than 25000 API requests during any 24 hour period.
You must limit your request rate to at most 10 requests per second.
If you are unable to comply with these restrictions, please contact us regarding partnership or commercial offering.
Furthermore, if you use the API in your application, we recommend the following:
You should credit WOT in your application. You can use our badges, for example.
You should not request the same information more than once during any 30 minute period, but should use a local cache for repeated requests instead.
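For reference, the limits quoted above (25000 requests per 24 hours, 10 per second, 30 minute cache) could be honoured with a small wrapper along these lines. This is only a rough sketch: `ThrottledCache` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever function would actually call the WOT API, which is not shown here.

```python
import time
from collections import deque


class ThrottledCache:
    """Hypothetical helper: caches results and enforces the quoted limits."""

    def __init__(self, fetch, ttl=30 * 60, per_second=10, per_day=25000,
                 clock=time.monotonic):
        self.fetch = fetch            # assumed: callable(domain) -> result
        self.ttl = ttl                # seconds a cached answer stays valid
        self.per_second = per_second
        self.per_day = per_day
        self.clock = clock            # injectable so the sketch can be tested
        self.cache = {}               # domain -> (expiry time, result)
        self.recent = deque()         # request timestamps from the last 24 hours

    def _requests_since(self, cutoff):
        # Timestamps are appended in order, so scan from the newest backwards.
        count = 0
        for stamp in reversed(self.recent):
            if stamp <= cutoff:
                break
            count += 1
        return count

    def lookup(self, domain):
        now = self.clock()
        hit = self.cache.get(domain)
        if hit is not None and hit[0] > now:
            return hit[1]             # served locally, no API request made

        # Forget requests that are more than 24 hours old.
        while self.recent and self.recent[0] <= now - 86400:
            self.recent.popleft()
        if len(self.recent) >= self.per_day:
            raise RuntimeError("daily request budget exhausted")
        if self._requests_since(now - 1) >= self.per_second:
            raise RuntimeError("per-second request rate exceeded")

        result = self.fetch(domain)
        self.recent.append(now)
        self.cache[domain] = (now + self.ttl, result)
        return result
```

Raising when a limit is hit (rather than sleeping) is a deliberate choice in this sketch, since a mail server would probably prefer to skip the check than to stall a connection.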
I manually sampled some data, and I don't really think the work involved would be worth it: e.g. it didn't find anything for the URIs that I tried.
Maybe WoT isn't quite the right tool. In much of the current spam, the domains are disposable and populated with "all the right DNS" (SPF, FCrDNS, helo hostname, etc.). Then some amount of time elapses, so the domains fall off the "newly observed domains" lists (nod, sem-fresh, etc.), and then the campaigns begin. Another way to detect these disposable domains is to perform a Google search. In nearly every case, these disposable domains have zero matches. I can't think of many real-world cases where anyone would want to receive email from a domain with no Google visibility. Thoughts?
PhishTank is focused exclusively on phishing, but it has a downloadable database, making it a viable source of data to check URLs against.
Another one is Spam404. I don't see a data URL, but it appears it'd be pretty easy to scrape the web pages and maintain a local copy of the domain list.
Artists against 419 provides their DB via SOAP, so it could be sucked into a local DB and queried against as well.
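Whichever of these feeds is used, the "local copy, queried against" part could look roughly like the sketch below. It assumes the feed has already been reduced to a plain list of domain names; the table layout and the function names (`open_blocklist`, `load_domains`, `find_listing`) are made up for illustration, and fetching or parsing any particular feed is not shown.

```python
import sqlite3


def open_blocklist(path=":memory:"):
    """Open (or create) the local database holding listed domains."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS bad_domains "
        "(domain TEXT PRIMARY KEY, source TEXT)"
    )
    return db


def load_domains(db, domains, source):
    """Store normalized domain names, remembering which feed listed them."""
    rows = [
        (d.strip().lower().rstrip("."), source)
        for d in domains
        if d.strip()
    ]
    db.executemany("INSERT OR IGNORE INTO bad_domains VALUES (?, ?)", rows)
    db.commit()


def find_listing(db, hostname):
    """Return (domain, source) if the hostname or a parent domain is listed."""
    labels = hostname.lower().rstrip(".").split(".")
    # Try the full hostname first, then each parent, stopping at two labels
    # so that a bare TLD is never looked up.
    for i in range(max(len(labels) - 1, 1)):
        candidate = ".".join(labels[i:])
        row = db.execute(
            "SELECT domain, source FROM bad_domains WHERE domain = ?",
            (candidate,),
        ).fetchone()
        if row:
            return row
    return None
```

A URI extracted from a message body would be reduced to its hostname and passed to `find_listing`; keeping the `source` column makes it possible to log which list produced the hit.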
PhishTank data is included in SURBL IIRC.
> PhishTank data is included in SURBL IIRC
And ClamAV UNOFFICIAL.
for future reference: https://github.com/jpf/domain-profiler
This ticket seems redundant. I'm going to close - re-open if you think it's important.
Along with URIBL, WoT would make another excellent check against incoming email.