hagezi / dns-blocklists

DNS-Blocklists: For a better internet - keep the internet clean!
GNU General Public License v3.0
6.86k stars, 227 forks

Using CWF API to Bulk Classify Domains #51

Closed: sr093906 closed this 2 years ago

sr093906 commented 2 years ago

I haven't tried this myself, so I can't vouch for its viability, but it may be worth a try.

https://cwf.comodo.com/subscriptions.php

The available categories are listed here: https://cwf.comodo.com/categories.php

First, find the entries common to the Pro and Plus versions and the top 1M list.

Then remove the malicious ones identified by the Google Safe Browsing API (https://github.com/elliotwutingfeng/Inversion-DNSBL-Blocklists/blob/main/Google_hostnames.txt?raw=true) and the so-called NSFW ones listed at https://oisd.nl/downloads.

Remove confirmed ad and tracking entries, based on sources you trust.

After that, filter out domains containing keywords such as 'sex', 'porn', 'adv', 'click', 'bet', 'casino', and others to reduce the number further.

Finally, use the free API to categorize the remainder.

I believe the remaining number should be fewer than 20,000 domains.
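
The reduction steps above could be sketched roughly as follows. This is only a minimal illustration, not a tested workflow: the function names (`normalize`, `reduce_candidates`, `classify_all`) are hypothetical, the input lists are assumed to be already downloaded and parsed into plain domain strings, and `classify` is a placeholder callable, since the request format of the CWF API is not described in this thread.

```python
from typing import Callable, Iterable

# Keywords taken from the steps above; extend as needed.
KEYWORDS = ("sex", "porn", "adv", "click", "bet", "casino")


def normalize(domain: str) -> str:
    """Lowercase a domain and strip whitespace and any trailing dot."""
    return domain.strip().lower().rstrip(".")


def reduce_candidates(
    blocklist: Iterable[str],
    top_domains: Iterable[str],
    exclude: Iterable[str],
    keywords: Iterable[str] = KEYWORDS,
) -> list[str]:
    """Return the sorted domains that still need to be classified."""
    blocked = {normalize(d) for d in blocklist if d.strip()}
    popular = {normalize(d) for d in top_domains if d.strip()}
    excluded = {normalize(d) for d in exclude if d.strip()}

    # Step 1: keep only entries present in both the blocklist and the top list.
    common = blocked & popular
    # Steps 2-3: drop entries already known as malicious, NSFW, ad, or tracking.
    remaining = common - excluded
    # Step 4: drop entries whose name contains any of the keywords.
    words = tuple(keywords)
    return sorted(d for d in remaining if not any(k in d for k in words))


def classify_all(
    domains: Iterable[str], classify: Callable[[str], str]
) -> dict[str, str]:
    """Step 5: apply a caller-supplied classifier (placeholder for the API)."""
    return {d: classify(d) for d in domains}


if __name__ == "__main__":
    blocklist = ["ads.example.com", "tracker.example.net",
                 "bet-shop.example", "Unknown.example.org"]
    top = ["unknown.example.org", "tracker.example.net",
           "bet-shop.example", "news.example"]
    exclude = ["tracker.example.net"]
    print(reduce_candidates(blocklist, top, exclude))
```

Note that plain substring matching is coarse: 'adv' also matches "advice" and 'bet' also matches "better", so the keyword step will discard some harmless domains along with the intended ones.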

hagezi commented 2 years ago

Thanks for the suggestion. I'll take a closer look.

sr093906 commented 2 years ago

Thanks for the reply. I hope it will be helpful.