hagezi / dns-blocklists

DNS-Blocklists: For a better internet - keep the internet clean!
GNU General Public License v3.0

New scam blocklist #2466

Closed jarelllama closed 6 months ago

jarelllama commented 6 months ago

Hi @hagezi, I recently finished adding the last few features to my scam blocklist. I still want to wait through a week of automated builds to see if there are any issues before I make an announcement on my repo. In the meantime, I introduced a light version of my blocklist, seeing how adding the now 30,000 domains from the original list would be impractical for most of your lists.

The light version currently has 2100 domains and I was hoping you could test it in your Ultimate list?

A quick rundown on the new features:

The light list, while a lot smaller than the full list, still meets my goal of collating scam domains as close to their registration date as possible. Hopefully this makes it a practical blocklist to include. I would love to hear your thoughts!

More about the new features and the light version can be found in my readme.

hagezi commented 6 months ago

Great work!

I have optimised the size of the normal lists so that they can be used by all AdBlockers without losing effectiveness. The goal is <200000 "compressed domains". To achieve this, I have "moved" many malicious domains to the TIF medium and TIF, which also virtually eliminates duplicates between the normal and TIF lists. This does not mean that the normal lists no longer contain any TIF domains, but they now contain mainly Ads, Tracker, Telemetry and Analytics domains.
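As a rough illustration of the two ideas above, here is a minimal Python sketch. It assumes that "compressed" means dropping subdomains already covered by a listed parent domain, and that "moving" a domain means removing exact duplicates from the normal list once the TIF list has them. The function names `compress` and `split_lists` and the sample domains are hypothetical, not taken from the actual build scripts.

```python
def compress(domains):
    """Drop every domain that is a subdomain of another listed domain."""
    kept = set()
    # Visit names with fewer labels first so a parent is seen before its subdomains.
    for domain in sorted(set(domains), key=lambda d: d.count(".")):
        labels = domain.split(".")
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if not any(parent in kept for parent in parents):
            kept.add(domain)
    return kept


def split_lists(normal, tif):
    """Compress both lists, then remove exact duplicates of TIF entries from normal."""
    tif_compressed = compress(tif)
    normal_only = compress(normal) - tif_compressed
    return normal_only, tif_compressed


# Hypothetical sample data, for illustration only.
normal = ["ads.example.com", "cdn.ads.example.com", "scam-shop.test", "tracker.example.net"]
tif = ["scam-shop.test", "login.scam-shop.test"]

normal_only, tif_compressed = split_lists(normal, tif)
print(len(normal_only), sorted(normal_only))
print(len(tif_compressed), sorted(tif_compressed))
```

Counting `len(normal_only)` after each source update is one way to check a list against a size target such as the 200000 mentioned above.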

Some TIF sources vary greatly in size, sometimes from update to update, which makes it impossible to keep the normal lists stable in size. Furthermore, some domains are already dead before they even land on the list and thus inflate the lists unnecessarily.

You have seen this with your scam list: for months there were < 10000 domains and suddenly there were > 30000 domains.

Everything has been reorganised so that the user can assemble what they need from the available lists in a modular way. I made the switch after the TIF list was available in almost all DNS cloud services. For AdBlockers that can't cope with the large TIF (e.g. AdGuard iOS), I created the smaller TIF medium so that every user is able to combine the normal lists with a TIF version.

Your Light version has an acceptable size today, so I could easily add it to the lists from Normal to Ultimate without exceeding the target of 200000 domains. But will it stay that way? Probably not, this list will grow too, right?

I know that NextDNS users have the disadvantage of not being able to use the TIF version, which for me would be the only reason to include your Light Scam version in the normal lists and not just in the TIFs. With all other services and self-hosting, the normal lists can be combined with a TIF version of your choice. But if I start doing that, we'll soon have the huge lists from a few months ago again, because there will be more requests asking why this or that source is not also included. NextDNS doesn't want to include the TIF because they have their "own feeds" and, according to them, everything is covered. If that's the case, it shouldn't be a problem. I know it's not like that, but I don't organise my lists according to a single service. Since OISD also contains a lot of "TIF domains", an integration there might be possible, @sjhgvr? Then NextDNS users could also benefit from your Light version by using OISD.

I will add your Light version to the TIF medium; the TIF will still contain the full version. This brings the TIF medium back to an acceptable size, as it had been considerably inflated by the full version of your scam list.

jarelllama commented 6 months ago

Thanks for the insight!

Regarding the list sizes, the influx of entries is mostly due to me manually adding new sources. My hope is that once I finalize the sources used (I've searched far and wide and I think I'm running out of sources to add now), the blocklists, both full and light, will hold a stable number of domains over time thanks to the dead/parked checks.

On a typical day (without me messing around with new sources), fewer than 100 new domains are retrieved. The dead/parked checks also seem to remove about 100 domains in total each run. So if my thinking is right, the blocklists should stay a stable size over time.
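The reasoning above can be checked with a small back-of-the-envelope projection. This is only a sketch: it assumes one dead/parked check run per day and constant daily rates, which the thread does not state, and the function name `project_size` is made up for illustration.

```python
def project_size(start, added_per_day, removed_per_day, days):
    """Project the list size, assuming constant daily additions and removals."""
    size = start
    for _ in range(days):
        # A list cannot shrink below zero entries.
        size = max(0, size + added_per_day - removed_per_day)
    return size


# Light list today (2100 domains), with additions and removals both near 100 per day.
print(project_size(2100, 100, 100, 30))

# If removals lag additions by only 10 domains per day, the list drifts upward.
print(project_size(2100, 100, 90, 30))
```

The size stays flat only while removals keep pace with additions, which is why monitoring the actual daily counts matters more than the estimate.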

One thing to note is that, unlike the full list, the light version does not add back resurrected/unparked domains.

Anyway, let me monitor my list sizes more carefully now, since changes to the code/sources will be slower going forward, and I'll update you on my findings. Thanks for looking into it though!

Edit: also a little trick I implemented is using a rudimentary dead/parked domains cache. Whenever a domain is removed for being dead/parked, it gets "cached" into a file which is used as a filter during the retrieval process. This prevents dead domains from being added only to be removed later that day. The cache is also how my script checks for resurrected/unparked domains.
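A minimal Python sketch of how such a cache could work is shown below. It is not the actual script from the repo: the function names, the in-memory `set` standing in for the cache file, and the `is_dead` callback standing in for the real dead/parked check are all assumptions made for illustration.

```python
def prune_dead(blocklist, is_dead, dead_cache):
    """Remove dead/parked domains from the blocklist and record them in the cache."""
    alive = []
    for domain in blocklist:
        if is_dead(domain):
            dead_cache.add(domain)
        else:
            alive.append(domain)
    return alive


def filter_new_domains(retrieved, dead_cache):
    """Skip freshly retrieved domains that are already known to be dead/parked."""
    return [d for d in retrieved if d not in dead_cache]


def resurrected(dead_cache, is_dead):
    """Return cached domains that are live again and drop them from the cache."""
    back = sorted(d for d in dead_cache if not is_dead(d))
    dead_cache.difference_update(back)
    return back


# Hypothetical run: `currently_dead` stands in for a real DNS/parking check.
currently_dead = {"gone.test"}
is_dead = lambda d: d in currently_dead
cache = set()

blocklist = prune_dead(["gone.test", "live.test"], is_dead, cache)
new = filter_new_domains(["gone.test", "new.test"], cache)
print(blocklist, new)

currently_dead.clear()  # the domain comes back online
print(resurrected(cache, is_dead))
```

In this sketch a full list would append the result of `resurrected` back to the blocklist, while a light list would simply skip that step.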

sjhgvr commented 6 months ago

@hagezi Thanks for the reminder. I had read about the light list before, but decided not to add it because I was already using these "full" lists:

https://raw.githubusercontent.com/jarelllama/Scam-Blocklist/main/adblock.txt
https://raw.githubusercontent.com/jarelllama/Scam-Blocklist/main/domains.txt

Seeing that those return 404 now, I'll add the working ones 👍

jarelllama commented 6 months ago

my bad @sjhgvr for not tagging you in my updates in the repo

hagezi commented 6 months ago

@jarelllama As I said, after the next release the light version will be included in the TIF medium and therefore also in the TIF. The TIF will continue to contain the full version. Size jumps are not a big problem in these lists, unless the light version "explodes".

@sjhgvr Thanks for your support.

jarelllama commented 6 months ago

Thanks everyone for the support 👍

jarelllama commented 6 months ago

Might as well tag @iam-py-test and @bongochong here to update them

hagezi commented 6 months ago

@sjhgvr @jarelllama

I have adjusted my recommendations in this regard.

https://github.com/hagezi/dns-blocklists/wiki/FAQ#whatshouldiuse

jarelllama commented 6 months ago

great work as always @hagezi