hagezi / dns-blocklists

DNS-Blocklists: For a better internet - keep the internet clean!
GNU General Public License v3.0

Combined list TLD's #143

Closed devipasigner closed 1 year ago

devipasigner commented 1 year ago

Missing Abused TLDs from yokoffing nextdns

.agency .ci .fun .link .live .shop .win

Thank you, I have some other personal abused TLDs, I will send them for review once I am home

devipasigner commented 1 year ago

.lol .one .online

^ From yokoffing filterlists

.email .sex .sexy .recipes .xxx .yandex .zone .software

ghost commented 1 year ago

wouldn't .xxx have too many false positives?

hagezi commented 1 year ago

Thank you, I will check later which ones lead to many false positives and which ones do not.

*.xxx is probably more for the personal block list.

hagezi commented 1 year ago

Result of the occurrence of the TLDs on the Umbrella Toplist (contains no malicious domains):

TLD on toplist
sexy 3
recipes 8
yandex 13
ci 20
sex 35
agency 50
software 54
win 157
email 163
xxx 177
lol 220
zone 223
shop 338
fun 351
one 558
live 904
online 981
link 1241

I think one, live, online, and link should not be added.

@devipasigner @bestplayerbot: What do you think?
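A count like the one above can be reproduced with a few lines of Python. This is only a sketch: it assumes the toplist is a CSV with `rank,domain` rows, and the function name `count_tlds` and the sample rows are made up for illustration, not taken from the actual toplist.

```python
import csv
import io
from collections import Counter


def count_tlds(toplist_csv, tlds):
    """Count how many toplist domains end in each TLD of interest."""
    counts = Counter({tld: 0 for tld in tlds})
    for row in csv.reader(io.StringIO(toplist_csv)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        domain = row[1].strip().lower().rstrip(".")
        tld = domain.rsplit(".", 1)[-1]  # last label of the domain
        if tld in tlds:
            counts[tld] += 1
    return counts


# Tiny made-up sample in the assumed "rank,domain" format (not real data).
sample = "1,example.com\n2,cdn.example.link\n3,shop.example.link\n4,example.sexy\n"
result = count_tlds(sample, {"link", "sexy", "xxx"})

# Print in ascending order of occurrences, like the table above.
for tld, n in sorted(result.items(), key=lambda kv: kv[1]):
    print(tld, n)
```

A high count suggests that blocking the whole TLD would hit many popular, presumably legitimate domains.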

hagezi commented 1 year ago

Have already added the "safe" ones:

|*.sexy^
|*.recipes^
|*.ci^
|*.sex^
|*.agency^
|*.software^
|*.win^
|*.email^

devipasigner commented 1 year ago

Thank you @hagezi

All but 2 of the excluded TLDs are from @yokoffing. I haven't experienced any false positives and blocking them has been semi-effective, but it would be best to get an answer from @yokoffing.

ghost commented 1 year ago

I've experienced some false positives with .link and .fun

yokoffing commented 1 year ago

Result of the occurrence of the TLDs on the Umbrella Toplist

@hagezi This is helpful! Thank you.

@devipasigner You can see known false positives in my filter list version of TLD protection and in Dandelion's malware list

I don't recommend blocking all these TLDs at the DNS level, but I have them there for folks who would rather tinker with setup on a regular basis. It's also why the TLD list in my NextDNS repo is split in two, whereas the filter list will warn you first but still allow the user to navigate to the site.

yokoffing commented 1 year ago

one, live, online, and link should not be added.

Spamhaus says one is 3.6% bad and online 2.6% bad, whereas live is 25% bad. link is 12.5%. (For reference, .com is 1.7%.)

yokoffing commented 1 year ago

@devipasigner

.email .sex .sexy .recipes .xxx .yandex .zone .software

Spamhaus is not everything, but one tool to take into consideration:

TLD Spamhaus % Bad Domains
com (reference) 1.7%
org (reference) 1.2%
email 4%
sex 0%
sexy 0%
recipes 0%
xxx 0%
yandex 0%
zone 11.1%
software 5.8%

yokoffing commented 1 year ago

I went through my filterlist and purged TLDs that are less than 10% bad according to Spamhaus https://github.com/yokoffing/filterlists/pull/39/commits/5d980ece1d4c936c6108acd848941e49f7c7b981. Here's what was left:

||asia^$doc
||beauty^$doc
||cn^$doc
||degree^$doc
||fit^$doc
||fyi^$doc
||garden^$doc
||live^$doc,domain=~marcello.live|~notgoogle.live
||quest^$doc
||su^$doc,domain=~kaihat.su
||shop^$doc,domain=~nsverify.shop
||surf^$doc
||zone^$doc

This cuts my list down from 43 entries to 13 entries. Note the exceptions for live, su, and shop. These may not be suitable for DNS-level blocking.
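The 10% cut-off described above amounts to a simple threshold filter. The sketch below uses a subset of the Spamhaus percentages quoted earlier in this thread. The function name `split_by_threshold` is hypothetical, and the real filterlist was edited by hand rather than generated this way.

```python
# Subset of the Spamhaus "% bad domains" figures quoted in this thread.
SPAMHAUS_BAD_PCT = {
    "com": 1.7, "org": 1.2, "one": 3.6, "online": 2.6, "live": 25.0,
    "email": 4.0, "sex": 0.0, "sexy": 0.0, "recipes": 0.0,
    "xxx": 0.0, "yandex": 0.0, "zone": 11.1, "software": 5.8,
}


def split_by_threshold(bad_pct, threshold=10.0):
    """Split TLDs into those at or above the threshold and those below it."""
    keep = sorted(t for t, p in bad_pct.items() if p >= threshold)
    purge = sorted(t for t, p in bad_pct.items() if p < threshold)
    return keep, purge


keep, purge = split_by_threshold(SPAMHAUS_BAD_PCT)
print("keep:", keep)
print("purge:", purge)
```

With these figures only `live` and `zone` clear the 10% bar, which matches their presence in the list above.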

My filterlist doesn't include Dandelion's list.


The Dandelion TLDs that don't have a bunch of exceptions are:

||agency^$doc,domain=~battlefield.agency|~shortcut.agency
||bid^$doc
||cfd^$doc
||discount^$doc
||gdn^$doc
||loan^$doc
||ooo^$doc
||sbs^$doc

Note the exceptions for agency. These may not be suitable for DNS-level blocking. Some of Dandelion's TLDs fall under 10% abused; but he has researched sites beyond this basic rubric I've imposed, so I'd keep his entries.

Here's the combined list of what's left from hagezi + mine + Dandelion's. We could use this merged list as the foundation for TLD blocking (with or without the four TLDs below with site exceptions), then introduce other metrics/research to justify what other TLDs should be blocked.

||agency^$doc,domain=~battlefield.agency|~shortcut.agency
||asia^$doc
||beauty^$doc
||bid^$doc
||cfd^$doc
||cn^$doc
||degree^$doc
||discount^$doc
||fit^$doc
||fyi^$doc
||garden^$doc
||gdn^$doc
||live^$doc,domain=~marcello.live|~notgoogle.live
||loan^$doc
||ooo^$doc
||quest^$doc
||sbs^$doc
||shop^$doc,domain=~nsverify.shop
||su^$doc,domain=~kaihat.su
||surf^$doc
||zone^$doc

The following then need justification:

||associates^$doc
||bar^$doc
||best^$doc
||buzz^$doc
||cam^$doc,domain=~halide.cam
||casa^$doc
||ci^$doc
||cricket^$doc
||cyou^$doc
||date^$doc
||fun^$doc,domain=~libgen.fun|~gaggle.fun|~neal.fun
||icu^$doc
||kp^$doc
||link^$doc,domain=~reddit.app.link|~unlocked.link
||loans^$doc
||lol^$doc,domain=~kissanime.lol|~url.lol
||one^$doc,domain=~ablaze.one
||online^$doc,domain=~ero-labs.online
||recipes^$doc
||rest^$doc
||review^$doc
||ru^$doc,domain=~aliexpress.ru|~yandex.ru
||sex^$doc
||sexy^$doc
||software^$doc
||tokyo^$doc
||wang^$doc
||webcam^$doc
||win^$doc
||work^$doc,domain=~searx.work
||xxx^$doc

We could rule out three potentially (https://github.com/DandelionSprout/adfilt/issues/659#issuecomment-1284845803):

* .associates: A fair bit of use among US law firms. Can't be blocked.
* .rest: Oddly appears to be used by some restaurants. Can't be blocked at the moment.
* .webcam: Rare cases of use by European road services.

devipasigner commented 1 year ago

Awesome, there are a few entries that are constantly being used to spam my emails with phishing that need to be added though. I will gather them and send to review

hagezi commented 1 year ago

@yokoffing Thanks for your work. Exceptions cannot be handled in DNS rules. I would then have to unblock the corresponding domain with all subdomains.
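To illustrate why an exception at the DNS level has to cover the whole domain including all subdomains, here is a minimal sketch of how a resolver-side check might behave. It is an assumption-laden model, not AdGuard Home's actual implementation. The function name `is_blocked` and the hostname `phish.example.agency` are made up.

```python
def is_blocked(hostname, blocked_tlds, allowed_domains):
    """Return True if hostname is under a blocked TLD and not allowlisted.

    A DNS resolver only sees hostnames (no pages, no request types), so an
    allowlisted domain necessarily unblocks all of its subdomains as well.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if labels[-1] not in blocked_tlds:
        return False
    # Check the hostname itself and every parent domain against the allowlist.
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in allowed_domains:
            return False
    return True


blocked = {"agency"}
allowed = {"battlefield.agency"}

print(is_blocked("phish.example.agency", blocked, allowed))    # blocked TLD, no exception
print(is_blocked("www.battlefield.agency", blocked, allowed))  # subdomain of an exception
print(is_blocked("example.com", blocked, allowed))             # TLD not blocked at all
```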

hagezi commented 1 year ago

Maybe we should take the list of @yokoffing as a common list. I would then parse it and convert it to |*.tld^ to fit the rules for DNS. However, I would not like to lock out whole countries like cn and ru.

yokoffing commented 1 year ago

Exceptions cannot be handled in DNS rules. I would then have to unblock the corresponding domain with all subdomains.

@hagezi Precisely. We need to be very selective as to what gets blocked at the DNS-level.

A filterlist can provide a warning before navigation and still allow navigation in uBlock Origin (not sure about AdGuard); hard-blocking at the DNS-level can't. User-friendliness dictates that we are more relaxed at the DNS level (your list) and possibly(!) stricter at adblock/filterlist level (my filterlist + Dandelion's malware list). So, as for those four TLDs with exceptions/false positives in the combined list, I personally wouldn't include them at the DNS level at all. But that's your call.

I would not like to lock out whole countries like cn and ru.

They should be removed at the DNS-level. Users can add to their personal list if they want to block them.


Referencing the combined list from earlier: if we remove those four TLDs that had exceptions, and remove cn, that leaves us with:

||asia^$doc
||beauty^$doc
||bid^$doc
||cfd^$doc
||degree^$doc
||discount^$doc
||fit^$doc
||fyi^$doc
||garden^$doc
||gdn^$doc
||loan^$doc
||ooo^$doc
||quest^$doc
||sbs^$doc
||surf^$doc
||zone^$doc

I left asia in there for now. We should probably allow it at the DNS level while I keep it blocked at the filterlist level. Let me know your thoughts.

Once we agree on a common list, we can look at the 'needs justification' list.

For anyone wondering: I haven't forgotten about my NextDNS repo. I just want to wait and clean that up after we get this sorted.

devipasigner commented 1 year ago

Went through my emails with over 500 pages of spam and phishing and the most abused by far are:

.ml
.sbs
.cf
.site
.online
.gq
.ga
.tk
.top
.in
.fun (a bunch of nsfw phishing redirects from google docs pdf)
.xyz (too many false positives)

Personally I think these must be included

yokoffing commented 1 year ago

@devipasigner Thanks for doing that!

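Good

.sbs is already included in our combined list above.

Too many false positives

* `.top` Too many false positives and probably more: https://github.com/DandelionSprout/adfilt/blob/2cbcfff6e62f6fff817a64390810188ab8903c08/Dandelion%20Sprout's%20Anti-Malware%20List.txt#L24

* same for `.ml` https://github.com/DandelionSprout/adfilt/blob/2cbcfff6e62f6fff817a64390810188ab8903c08/Dandelion%20Sprout's%20Anti-Malware%20List.txt#L15-L16

* and `.tk` https://github.com/DandelionSprout/adfilt/blob/2cbcfff6e62f6fff817a64390810188ab8903c08/Dandelion%20Sprout's%20Anti-Malware%20List.txt#L11-L12

Possibly appropriate for soft-blocking by filter list, but not hard-blocking via DNS:

* `.online` One false positive noted but there may be others after we research further. I do recall this TLD being used for phishing last year. It is used legitimately for manga and video streaming sites, so I wouldn't block it entirely.

* `.fun` also has legitimate uses https://github.com/yokoffing/filterlists/blob/8cd755ed08bd300222b37d6f8b71b492fe6eae36/enhanced_site_protection.txt#L25

* `.cf`, `.ga`, `.gq` are on my hardened NextDNS list and they break third-party video streaming sites and have other unintended breakage (subrequests on pages). https://github.com/DandelionSprout/adfilt/blob/2cbcfff6e62f6fff817a64390810188ab8903c08/Dandelion%20Sprout's%20Anti-Malware%20List.txt#L14-L20

Probably not appropriate but unsure

* `.site` has legitimate uses (Google, Fmovies, sports, etc.)

* `.in` is the domain for India. @hagezi doesn't want to block country domains

Let me know your thoughts.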


I moved ~.online~ .fun back to my filterlist to soft-block, and added .in as a new entry. .cf, .ga, .gq are blocked by Dandelion's Malware list and have exceptions already there. I wouldn't block these at DNS-level either, though.

Dynasty-Dev commented 1 year ago

Hello all! Just tuning in here to share some ideas. Perhaps there could be 2 versions of the list, a strict one and a more balanced one, like the system @yokoffing already has going, but more optimized. They could be called 'light' and 'pro'. Why? Sure, these TLDs may cause some false positives, but that's the nature of TLD and regex blocking. Of course we should still offer a balanced list. But some of the TLDs with a few false positives are used by sites that few people will ever visit, while blocking them stops a lot of phishing sites. More good than bad. Personally, I've never gotten a hit from any of the 'balanced' ones, but I've gotten many hits from the ones reported here that have a minuscule number of false positives.

Dynasty-Dev commented 1 year ago

@devipasigner Thanks for doing that!

Good

.sbs is already included in our combined list above.

Too many false positives

* `.top` Too many false positives and probably more: https://github.com/DandelionSprout/adfilt/blob/2cbcfff6e62f6fff817a64390810188ab8903c08/Dandelion%20Sprout's%20Anti-Malware%20List.txt#L24

* same for `.ml` https://github.com/DandelionSprout/adfilt/blob/2cbcfff6e62f6fff817a64390810188ab8903c08/Dandelion%20Sprout's%20Anti-Malware%20List.txt#L15-L16

* and `.tk` https://github.com/DandelionSprout/adfilt/blob/2cbcfff6e62f6fff817a64390810188ab8903c08/Dandelion%20Sprout's%20Anti-Malware%20List.txt#L11-L12

Possibly appropriate for soft-blocking by filter list, but not hard-blocking via DNS:

* `.online` One false positive noted but there may be others after we research further. I do recall this TLD being used for phishing last year. It is used legitimately for manga and video streaming sites, so I wouldn't block it entirely.

* `.fun` also has legitimate uses https://github.com/yokoffing/filterlists/blob/8cd755ed08bd300222b37d6f8b71b492fe6eae36/enhanced_site_protection.txt#L25

* `.cf`, `.ga`, `.gq` are on my hardened NextDNS list and they break third-party video streaming sites and have other unintended breakage (subrequests on pages). https://github.com/DandelionSprout/adfilt/blob/2cbcfff6e62f6fff817a64390810188ab8903c08/Dandelion%20Sprout's%20Anti-Malware%20List.txt#L14-L20

Probably not appropriate but unsure

* `.site` has legitimate uses (Google, Fmovies, sports, etc.)

* `.in` is the domain for India. @hagezi doesn't want to block country domains

Let me know your thoughts.

I moved .online .fun back to my filterlist to soft-block, and added .in as a new entry. https://github.com/yokoffing/filterlists/blob/0cf52963af4ae322771d30620bb3542b1c7813c2/enhanced_site_protection.txt

.cf, .ga, .gq are blocked by Dandelion's Malware list and have exceptions already there. I wouldn't block these at DNS-level either, though.

Hey! I'm just joining in, but from what I see, a lot of the TLDs listed for soft-blocking only have very few false positives. Many of them are used for malware redirects, especially in emails and on streaming sites, so I think blocking them is needed. As I wrote above, the 2 solutions would be:

1. Make 2 different lists, one light, one 'aggressive' (though I wouldn't even call it that).
2. Make one big list with false positives constantly whitelisted using top lists and Dandelion's small selection already made.

devipasigner commented 1 year ago

@yokoffing thank you for the feedback. Personally, I think these entries are very much needed despite having a couple of false positives. There are millions of phishing domains using these TLDs and only a couple of legitimate domains using them that a regular user will ever visit. I think there are just way too many malware and phishing domains using these TLDs to allow them, even if it means whitelisting 1/2 domains. Let me know what you think; I simply think the good far outweighs the bad. @Dynasty-Dev proposed a pretty good solution, however I really don't think the two lists would be too far from each other in terms of false positives/aggressiveness.

yokoffing commented 1 year ago

1. Make 2 different lists, one light, one 'aggressive' but I wouldn't even call it that.
2. Make one big list with false positives constantly whitelisted using top lists and dandelions small selection already made

@Dynasty-Dev That's already happening at the filterlist level. But this cannot be done at the DNS-level. You have to manually allowlist all the false positives -- which is fine for a personal setup, but impractical when you're making a list for many people. You're going to run into breakage.

personally I think these entries are very much needed despite having a couple of false positives. There are millions of phishing domains using these TLD's and only a couple of legitimate domains using these TLDs that a regular user will ever visit.

@devipasigner I just want to avoid a scenario where someone is having to allowlist something every week. I may make some concessions and get feedback from users over time. Let me think on it.

Adblocking side isn't too bad with uBlock Origin since you can bypass the block and report the false positive later. It's DNS blocking that can be very frustrating.

hagezi commented 1 year ago

So friends, now you've lost me - I'm old! :D Before I completely lose track of what stays on the TLD list and what doesn't, I'll sleep on it. :)

Dynasty-Dev commented 1 year ago

I understand! And I respect and love all the work you've done. Almost every NextDNS user has a well-balanced profile because of you (I personally use it too). However, I think the number of false positives from these TLDs is being greatly exaggerated. I've personally used Hagezi's TLD blocklist plus my own entries for 2 different households for over a year (I had my own TLD entries before hagezi's and your lists existed) and I haven't gotten any false positive reports from them (only from blocklists). I personally use a lot of those free streaming sites for movies and sports (NBA, football) too. As for using browser extensions, I agree they should be applied everywhere; however, there are scenarios where they won't help, for example email apps on mobile (a huge source of phishing and spam). Now, the average techie will know it's a scam, but the same can't be said about kids, wives, and grandparents.

yokoffing commented 1 year ago

I think there are just way too many malware and phishing domains using these TLDs to allow them, even if it means whitelisting 1/2 domains.

@devipasigner That sounds amazing --- until you're the list maintainer 🙃

Now, the average techie will know it's a scam, but the same can't be said about kids, wives, and grandparents.

@Dynasty-Dev Those are the ones that are the loudest when a random site doesn't work


Went through my emails with over 500 pages of spam and phishing and the most abused by far are:

.ml
.sbs
.cf
.site
.online
.gq
.ga
.tk
.top
.in
.fun (a bunch of nsfw phishing redirects from google docs pdf)

We'll give it a go.

I have added/restored the ones listed above that are not already in Dandelion's list, with the exceptions that I'm aware of: https://github.com/yokoffing/filterlists/pull/39/commits/168e83d6fe1797d693f582beb600d83348f435b9. Edit: I accidentally added .top and have now removed it, since Dandelion actively covers that: https://github.com/yokoffing/filterlists/pull/39/commits/a02e70dc0a0317514271384f3c5c5947cf69a51b

This will block top-site navigations and not break sub-requests. Unlike blocking these at the DNS level, I don't anticipate many false positives.

Pull Request: https://github.com/yokoffing/filterlists/pull/39
yokoffing Enhanced Protection List: https://github.com/yokoffing/filterlists/blob/main/enhanced_site_protection.txt
Dandelion Sprout's Anti-Malware List: https://github.com/DandelionSprout/adfilt/blob/master/Dandelion%20Sprout's%20Anti-Malware%20List.txt


When we have the space to discuss the rest of these domains, I'll push the pull request through. Keep an eye on it until then.

ghost commented 1 year ago

Found a false positive for *.fun yesterday: https://github.com/yokoffing/filterlists/pull/39#issuecomment-1371689096

hagezi commented 1 year ago

So, I might do the following with my list:

I will take the top 10 Spamhaus TLDs as before, meaning every TLD that has been in the top 10 since the list was created will end up on the list. Exceptions below.

I additionally take over the TLDs from yokoffing by parsing his list. I will implement the exceptions with the denyallow modifier, example: |*.agency^$denyallow=battlefield.agency|shortcut.agency

Furthermore TLDs can be added manually of course.

I will exclude the following TLDs from the overall list:

Country specific TLDs, like:

cn
ru
in
co
uk
de

Other TLDs:

info
com
net
org
io
me
xyz

What do you think about such a solution?
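The proposal above boils down to a set union followed by a set difference. A minimal sketch follows. The function name `build_tld_list` and the sample inputs are hypothetical, and the country list contains only the examples named in the comment, which does not claim to be exhaustive.

```python
# Exclusions named in the comment above ("like:" suggests the country list
# is a set of examples rather than the complete set).
COUNTRY_TLDS = {"cn", "ru", "in", "co", "uk", "de"}
OTHER_EXCLUDED = {"info", "com", "net", "org", "io", "me", "xyz"}


def build_tld_list(spamhaus_top10, yokoffing_tlds, manual=()):
    """Merge the three sources, then drop every excluded TLD."""
    merged = set(spamhaus_top10) | set(yokoffing_tlds) | set(manual)
    return sorted(merged - COUNTRY_TLDS - OTHER_EXCLUDED)


# Made-up inputs purely for illustration.
final = build_tld_list(
    spamhaus_top10=["cn", "top", "xyz", "live"],
    yokoffing_tlds=["agency", "zone", "ru"],
    manual=["win"],
)
print(final)
```

Keeping the exclusions as a final step means an excluded TLD stays out no matter which source proposes it.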

yokoffing commented 1 year ago

Looks good. Here's what the 'combined list' looks like now (mine + Dande + @devipasigner's research on what should be restored), with comments. Please review: https://github.com/yokoffing/filterlists/blob/cd71a3ce16cad7717880394d2fa9e7d41711aa26/enhanced_site_protection.txt#L9-L62

@hagezi Now, here's the combined list with: 1) comment lines removed 2) cn, ru, in removed 3) alphabetized

||agency^$doc,domain=~battlefield.agency|~shortcut.agency
||asia^$doc
||beauty^$doc
||bid^$doc
||cf^$doc,domain=~google.cf|~rths.cf|~voitures.cf|~assembleenationale-rca.cf|~cps-rca.cf|~acap.cf|~miraculousladybug.cf|~scrat.cf
||cfd^$doc
||degree^$doc
||discount^$doc
||fit^$doc
||fun^$doc,domain=~libgen.fun|~gaggle.fun|~neal.fun|~bestgore.fun
||fyi^$doc
||ga^$doc,domain=~google.ga|~filtri-dns.ga|~dgdi.ga|~voitures.ga|~economie-gabon.ga|~9191.ga|~animevsub.ga
||garden^$doc
||gdn^$doc
||gq^$doc,domain=~deimos.gq|~inege.gq|~tvgelive.gq|~comprarcarros.gq
||live^$doc,domain=~marcello.live
||loan^$doc
||ml^$doc,domain=~google.ml|~mobili.ml|~melody.ml|~dcod.ml|~info-matin.ml|~amap.ml|~mastodon.ml|~worproject.ml|~nothingprivate.ml|~lingva.ml|~lemmy.ml|~bittor.ml|~noic.ml|~beatbump.ml|~gymlibrary.ml|~animevsub.ml|~prompt.ml|~biblioreads.ml
||monster^$doc,domain=~egybest.monster|~yts.monster|~cloudcdn.monster|~fedi.monster
||online^$doc,domain=~ero-labs.online|~amedia.online|~allhen.online|~chainsaw-man-manga.online|~manga1st.online
||ooo^$doc
||pw^$doc,domain=~libgen.pw|~petridish.pw|~palaugov.pw|~dpc.pw|~zikrap.pw|~demonoid.pw|~bittor.pw|~buttercup.pw|~rezka.pw|~darkcrystal.pw|~xor.pw|~fullhdfilmizlesene.pw|~b00k.pw|~gopass.pw|~vost.pw
||quest^$doc
||sbs^$doc
||shop^$doc,domain=~nsverify.shop
||site^$doc,domain=~business.site|~anitube.site|~wuxiaworld.site|~notube.site|~fmoviesto.site|~secreto.site|~metrolagu.site|~betphoenix.site|~cdmstudy.site
||su^$doc,domain=~kaihat.su
||surf^$doc
||tk^$doc,domain=~coolcmd.tk|~budterence.tk|~google.tk|~transportnews.tk|~c0d3c.tk|~anonytext.tk|~tokelau-info.tk|~fakaofo.tk|~loljp-wiki.tk|~ninetail.tk|~goshujin.tk|~graph.tk|~dls2.pokeacer.tk|~nolfrevival.tk|~coppersurfer.tk|~restricted-functions.tk|~bstweaker.tk|~nbd-media.tk|~glypx-pakhsh-nakon.tk|~gotofap.tk|~somepythonthings.tk
||top^$doc,domain=~corriente.top|~gdtot.top|~nicenature.top|~reminder.top|~magocoro.top|~castlevania.top|~suiten.top|~shucks.top|~1stream.top|~ambr.top|~techblog.top|~changlam10.top|~changlam11.top|~pdcdn1.top|~mastodon.top|~pressplay.top
||zone^$doc

hagezi commented 1 year ago

Thanks @yokoffing, looks good!

hagezi commented 1 year ago

DNS Syntax AdGuard Home, add to custom filtering rules for testing:

|*.agency^$denyallow=battlefield.agency|shortcut.agency
|*.asia^
|*.beauty^
|*.bid^
|*.cf^$denyallow=google.cf|rths.cf|voitures.cf|assembleenationale-rca.cf|cps-rca.cf|acap.cf|miraculousladybug.cf|scrat.cf
|*.cfd^
|*.degree^
|*.discount^
|*.fit^
|*.fun^$denyallow=libgen.fun|gaggle.fun|neal.fun|bestgore.fun
|*.fyi^
|*.ga^$denyallow=google.ga|filtri-dns.ga|dgdi.ga|voitures.ga|economie-gabon.ga|9191.ga|animevsub.ga
|*.garden^
|*.gdn^
|*.gq^$denyallow=deimos.gq|inege.gq|tvgelive.gq|comprarcarros.gq
|*.live^$denyallow=marcello.live
|*.loan^
|*.ml^$denyallow=google.ml|mobili.ml|melody.ml|dcod.ml|info-matin.ml|amap.ml|mastodon.ml|worproject.ml|nothingprivate.ml|lingva.ml|lemmy.ml|bittor.ml|noic.ml|beatbump.ml|gymlibrary.ml|animevsub.ml|prompt.ml|biblioreads.ml
|*.monster^$denyallow=egybest.monster|yts.monster|cloudcdn.monster|fedi.monster
|*.online^$denyallow=ero-labs.online|amedia.online|allhen.online|chainsaw-man-manga.online|manga1st.online
|*.ooo^
|*.pw^$denyallow=libgen.pw|petridish.pw|palaugov.pw|dpc.pw|zikrap.pw|demonoid.pw|bittor.pw|buttercup.pw|rezka.pw|darkcrystal.pw|xor.pw|fullhdfilmizlesene.pw|b00k.pw|gopass.pw|vost.pw
|*.quest^
|*.sbs^
|*.shop^$denyallow=nsverify.shop
|*.site^$denyallow=business.site|anitube.site|wuxiaworld.site|notube.site|fmoviesto.site|secreto.site|metrolagu.site|betphoenix.site|cdmstudy.site
|*.su^$denyallow=kaihat.su
|*.surf^
|*.tk^$denyallow=coolcmd.tk|budterence.tk|google.tk|transportnews.tk|c0d3c.tk|anonytext.tk|tokelau-info.tk|fakaofo.tk|loljp-wiki.tk|ninetail.tk|goshujin.tk|graph.tk|dls2.pokeacer.tk|nolfrevival.tk|coppersurfer.tk|restricted-functions.tk|bstweaker.tk|nbd-media.tk|glypx-pakhsh-nakon.tk|gotofap.tk|somepythonthings.tk
|*.top^$denyallow=corriente.top|gdtot.top|nicenature.top|reminder.top|magocoro.top|castlevania.top|suiten.top|shucks.top|1stream.top|ambr.top|techblog.top|changlam10.top|changlam11.top|pdcdn1.top|mastodon.top|pressplay.top
|*.zone^
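The conversion from the `||tld^$doc,domain=~…` filterlist form to the DNS form above is mechanical. The following is a rough sketch of such a converter, not hagezi's actual script. The function name `to_dns_rule` is made up, and it only handles the simple rule shape used in this thread, skipping comment lines that start with `!`.

```python
def to_dns_rule(rule):
    """Convert '||tld^$doc,domain=~a.tld|~b.tld' to '|*.tld^$denyallow=a.tld|b.tld'.

    Returns None for comments, blank lines, and anything not in that shape.
    """
    rule = rule.strip()
    if not rule or rule.startswith("!") or not rule.startswith("||"):
        return None
    pattern, _, options = rule[2:].partition("$")
    if not pattern.endswith("^"):
        return None
    tld = pattern[:-1]
    exceptions = []
    for opt in options.split(","):
        if opt.startswith("domain="):
            # Only negated entries ("~example.tld") are exceptions.
            entries = opt[len("domain="):].split("|")
            exceptions = [d[1:] for d in entries if d.startswith("~")]
    dns_rule = f"|*.{tld}^"
    if exceptions:
        dns_rule += "$denyallow=" + "|".join(exceptions)
    return dns_rule


print(to_dns_rule("||agency^$doc,domain=~battlefield.agency|~shortcut.agency"))
print(to_dns_rule("||asia^$doc"))
print(to_dns_rule("! a commented-out entry"))
```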

@yokoffing new Exception for pw : core.pw

yokoffing commented 1 year ago

Exception for pw : core.pw

@hagezi Thanks for the heads up. What is that? I can't get the site to load.

.pw is in Dande's list. Open a pull request with him. Done

yokoffing commented 1 year ago

How do we feel about restoring .loans (plural) to the list since we're already blocking .loan (singular)?

hagezi commented 1 year ago

@hagezi Thanks for the heads up. What is that? I can't get the site to load.

Needed for Wargaming Games

https://myip.ms/view/hosts/2920936/core_pw.html

yokoffing commented 1 year ago

I've further refined the Spamhaus section. The 10% rule was just a starting point. Spamhaus is not the Bible, and they are not the "be all, end all" of the cybersecurity world, but I like to reference their metrics. Lastly, note that this isn't touching the TLDs provided by Dandelion and @devipasigner.

I took into account whether these TLDs receive low or high traffic and whether the TLDs are used by few or many sites. Then I visited popular (if any were in the top 100k or 1M sites) and random sites to see firsthand if we should bother to keep them.

Total of 3 or 4 = stay
Total of 1 or 2 = remove

TLD Total Spamhaus 10 Most Abused TLDs Used by low-traffic or low-quality sites Few Domains with TLD in existence Many scammy rando sites (random searches)
asia 0
beauty 4 x x x x
degree 3 x x x
fit 3 x x x
fyi 1 x
garden 1 x
live 1 x
quest 3 x x x
shop 1 x
su 1 x
surf 2 x x
zone 3 x x x

@hagezi Therefore, we can remove the following:

asia
fyi
garden
live
shop
su
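The stay/remove rule can be written down as a tiny scoring function. This sketch uses only the Total column from the table above, since the per-criterion marks are not needed for the decision. The names `TOTALS` and `decide` are made up for illustration.

```python
# Totals copied from the table above (number of criteria each TLD met, 0 to 4).
TOTALS = {
    "asia": 0, "beauty": 4, "degree": 3, "fit": 3, "fyi": 1, "garden": 1,
    "live": 1, "quest": 3, "shop": 1, "su": 1, "surf": 2, "zone": 3,
}


def decide(total, stay_at=3):
    """Apply the stated rule: 3 or 4 criteria met means stay, fewer means remove."""
    return "stay" if total >= stay_at else "remove"


stay = sorted(t for t, n in TOTALS.items() if decide(n) == "stay")
remove = sorted(t for t, n in TOTALS.items() if decide(n) == "remove")
print("stay:", stay)
print("remove:", remove)
```

Read literally, the rule also puts `surf` (total 2) in the remove group, whereas the comment keeps it, so the threshold is probably best treated as a guide rather than a hard cut.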

For the Spamhaus TLDs that remain, here are the new false positives I encountered while checking these manually:

||beauty^$doc,domain=~vipbj.beauty
||degree^$doc,domain=~opf.degree|~three60.degree
||fit^$doc,domain=~appetit.fit|~clubb.fit|~pridegym.fit|~justget.fit
||quest^$doc,domain=~0x00.quest|~prednisonetab.quest
||surf^$doc,domain=~surfstation.surf|~kayaking.surf|~quran.surf|~s-wings.surf
||zone^$doc,domain=~typinggames.zone|~martech.zone|~kinogo.zone|~kidtopia.zone|~itsm.zone

hagezi commented 1 year ago

Wow, great work!

ghost commented 1 year ago

Why is *.kp blocked? It's the TLD for North Korea.

hagezi commented 1 year ago

This will no longer be the case in the new list. When @yokoffing is finished with his list, I will parse it daily and convert it into DNS format. I will exclude country-specific TLDs. I will also not add any entries manually, yokoffing's list will be the only basis.

hagezi commented 1 year ago

New list is live: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/adblock/spam-tlds.txt

Merged lists: Spamhaus Top10, @yokoffing @DandelionSprout

Excluded TLDs: .cn .ru .in

Please check ...

iam-py-test commented 1 year ago

Yay, thanks! Just FYI:

|*.xxx^

Blocks some "legit" sites (i.e. https://www.virustotal.com/gui/url/d587c31002021e6c3c081f137f3a4b9dd28514501d37c26bec4ea58e38f7a1fe/detection). Not sites that I'd visit, but some people might complain.

yokoffing commented 1 year ago

@hagezi Run the script again. I just pushed a pull request from yesterday that comments out TLDs we haven't discussed (e.g., .icu, and to @iam-py-test's point, .xxx). The version you have did not do this.

This will provide us a nice core, foundation list 😄

hagezi commented 1 year ago

Thanks @yokoffing, updated. Please check ...

https://github.com/hagezi/dns-blocklists/commit/9b5984e22debc48f0b903d80faf14c3302fd7509

yokoffing commented 1 year ago

@hagezi .yandex is on there. Just found that one and removed from my list. That was my bad. Otherwise, the list is beautiful!

@iam-py-test Thanks for joining in! I've been wanting to ask you: Can you explain more on why you block .win and .cricket in your list? Thank you ahead of time for your insight.

hagezi commented 1 year ago

Thanks @yokoffing, updated. Thanks for joining @iam-py-test!

yokoffing commented 1 year ago

@hagezi I know syntax will change based on the format of your list (e.g., domain, hosts, etc.), but for the Adblock list, what is the reasoning to change from ||tld^$doc,domain= to |*.tld^$denyallow=? The latter can't be recognized by uBlock Origin.

iam-py-test commented 1 year ago

@iam-py-test Thanks for joining in! I've been wanting to ask you: Can you explain more on why you block .win and .cricket in your list? Thank you ahead of time for your insight.

Honestly, that list is very poorly thought out. I just copied a ton of random TLDs from somewhere else (I don't remember where), and then over time removed some that had known FPs reported to other lists. I have put pretty much 0 work into that list, so I wouldn't give it much credibility.

iam-py-test commented 1 year ago

@hagezi I know syntax will change based on the format of your list (e.g., domain, hosts, etc.), but for the Adblock list, what is the reasoning to change from ||tld^$doc,domain= to |*.tld^$denyallow=? The latter can't be recognized by uBlock Origin.

I think that might be because $denyallow requires $domain (https://github.com/DandelionSprout/adfilt/blob/master/Wiki/SyntaxMeaningsThatAreActuallyHumanReadable.md#blocking-1)

hagezi commented 1 year ago

@yokoffing The reason was that the AdGuard host compiler ignores short rules. The recommended workaround from the AdGuard team was to switch to |*.TLD^. I just wanted to make the list compatible with the host compiler, but I can also go back to the previous syntax ||TLD^. I have no problem with this.

yokoffing commented 1 year ago

Honestly, that list is very poorly thought out

@iam-py-test Thank you for your honesty! 🤣

hagezi commented 1 year ago

@iam-py-test denyallow should also work with the normal syntax ||TLD^: https://github.com/AdguardTeam/AdGuardHome/wiki/Hosts-Blocklists#denyallow

iam-py-test commented 1 year ago

@iam-py-test denyallow should also work with the normal syntax ||TLD^: https://github.com/AdguardTeam/AdGuardHome/wiki/Hosts-Blocklists#denyallow

Not in uBlock Origin: https://github.com/gorhill/uBlock/wiki/Static-filter-syntax#denyallow

Odd that uBO and AGH have different rules, although I guess it makes sense for a DNS blocker.

DandelionSprout commented 1 year ago

$denyallow in uBO requires being coupled with $domain, for some reason.
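The uBO constraint mentioned above (a `denyallow=` option needs an accompanying `domain=` option) can be checked with a small lint helper before publishing a list meant for both tools. This is a hedged sketch: the function name `denyallow_needs_domain` is made up, the option parsing is deliberately naive (it splits on the first `$` and would misread regex-style filters containing `$`), and it checks only the `domain` option named in this thread.

```python
def denyallow_needs_domain(rule):
    """Return True if the rule uses denyallow= without a domain= option."""
    _, sep, options = rule.partition("$")
    if not sep:
        return False  # no options at all
    # Collect option names, ignoring values and a leading "~" negation.
    names = {opt.split("=", 1)[0].lstrip("~") for opt in options.split(",")}
    return "denyallow" in names and "domain" not in names


rules = [
    "|*.agency^$denyallow=battlefield.agency|shortcut.agency",
    "||agency^$doc,domain=~battlefield.agency|~shortcut.agency",
]
for r in rules:
    print(denyallow_needs_domain(r), r)
```

The first rule is the DNS form discussed in this thread and is flagged; the second is the filterlist form and passes.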