lightswitch05 / hosts

Hostfile blocklist for ads and tracking, updated regularly
https://www.github.developerdan.com/hosts/
Apache License 2.0

Issue 393 🔥 Remove subdomains when there's already domain.tld being blocked #394

Closed thomasmerz closed 1 year ago

thomasmerz commented 1 year ago

This PR solves Issue #393 :

This makes the adlist smaller (bad for marketing): 341776 fewer lines, with 89336 lines left. It is also much more performant (nice for admins and users).
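A minimal sketch of the pruning idea, not the script from this PR: an entry is dropped whenever one of its parent domains is also on the list. The function name `prune_subdomains` and the sample entries are illustrative assumptions. The result is only equivalent to the full list if the consuming blocker treats a listed domain as covering its subdomains.

```python
def prune_subdomains(domains):
    """Return the entries that have no parent domain elsewhere in the list."""
    listed = set(domains)
    kept = []
    for domain in domains:
        labels = domain.split(".")
        # Candidate parents: strip one leading label at a time,
        # stopping before the bare TLD (e.g. "ad" is never a candidate).
        parents = (".".join(labels[i:]) for i in range(1, len(labels) - 1))
        if not any(parent in listed for parent in parents):
            kept.append(domain)
    return kept


# Illustrative sample data, not taken from the real list.
blocklist = ["pixel.ad", "www.pixel.ad", "cdn.tracker.example", "example.org"]
print(prune_subdomains(blocklist))
```

Here `www.pixel.ad` is dropped because `pixel.ad` is listed, while `cdn.tracker.example` stays because `tracker.example` is not.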

lightswitch05 commented 1 year ago

Please refer to https://github.com/lightswitch05/hosts/issues/327#issuecomment-995944871

One note on this change: Pi-hole, which I use and made this list for, does not automatically block subdomains of blocked root domains, so this change would be a regression for Pi-hole users.

thomasmerz commented 1 year ago

I'm also using a Pi-hole, and it also blocks all (random) subdomains of any domain.tld that is blocked in any blocklist:

[screenshot]

But I can and will ask the Pi-hole people how Pi-hole should work in this case for verification/clarification because I really would never want to break Pi-hole 💘

thomasmerz commented 1 year ago

By linking an older issue of mine, do you mean that you have to decline this issue and my PR (in progress) because of your "management system built for your lists where it is all in a database and subdomains are discovered and added automatically"? 😞 but true…?

hagezi commented 1 year ago

I'm also using a Pi-hole, and it also blocks all (random) subdomains of any domain.tld that is blocked in any blocklist

Since when can Pi-hole do that?

thomasmerz commented 1 year ago

I'm also using a Pi-hole, and it also blocks all (random) subdomains of any domain.tld that is blocked in any blocklist

Since when can Pi-hole do that?

Good question… I took it as "god given" that Pi-hole also blocks all subdomains when a domain is on the blocklist… 🤷🏼‍♂️

hagezi commented 1 year ago

@thomasmerz it does not work: google.com on a blocklist does not block www.google.com, mail.google.com, ...

tested with: [screenshot]

thomasmerz commented 1 year ago

I'm also really confused…

For /etc/hosts I'm totally with you that only exact matches will do anything 👍🏼

pixel.ad is found in my Pi-hole in some blocklists:

Match found in https://raw.githubusercontent.com/RPiList/specials/master/Blocklisten/child-protection:
  pixel.ad
…
  www.pixel.ad
Match found in https://raw.githubusercontent.com/lightswitch05/hosts/master/docs/lists/ads-and-tracking-extended.txt:
  pixel.ad
…
Match found in https://big.oisd.nl/:
…
  ||pixel.ad^

But my randomly generated subdomain is not found:

[i] No results found for c88553a937ca37301f26773d67c42.pixel.ad within the adlists

But Pi-hole is blocking it "by gravity" as my screenshot above shows.

hagezi commented 1 year ago

@thomasmerz ||pixel.ad^: do you have a developer version running that already supports adblock syntax? Then that is clear ...
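The difference between the two list formats can be sketched as follows. This is a simplified model of the matching semantics discussed in this thread, not Pi-hole's or FTL's actual code; the function names `exact_match` and `abp_domain_match` are made up for illustration, and only the simple `||domain^` rule form is handled.

```python
def exact_match(query, entries):
    """Hosts-style semantics: a name is blocked only if it is listed verbatim."""
    return query in entries


def abp_domain_match(query, rule):
    """Adblock-style ||domain^ semantics: the domain and all its subdomains match."""
    if not (rule.startswith("||") and rule.endswith("^")):
        raise ValueError("only the simple ||domain^ form is handled in this sketch")
    domain = rule[2:-1]
    # The leading dot keeps unrelated names such as "notpixel.ad" from matching.
    return query == domain or query.endswith("." + domain)


random_sub = "c88553a937ca37301f26773d67c42.pixel.ad"
hosts_entries = {"pixel.ad", "www.pixel.ad"}

print(exact_match(random_sub, hosts_entries))        # the random subdomain is not listed
print(abp_domain_match(random_sub, "||pixel.ad^"))   # the ABP rule covers it
```

Under this model the random subdomain is missed by the plain entries but caught by `||pixel.ad^`, which is consistent with the differing results reported in this thread.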

thomasmerz commented 1 year ago

Yes, I tested with:

Docker Tag nightly, Pi-hole vDev (development, v5.15.5-36-g7ea0bbb), FTL vDev (development, vDev-d35edd4), Web Interface vDev (devel, v5.18.4-32-g94448ed)

(because of some problems after a blocklist provider, oisd.nl, changed to ABP syntax only).

My other Pi-holes run with

Docker Tag 2023.02.2, Pi-hole v5.15.5, FTL v5.21, Web Interface v5.18.4

And they behave as you describe and everyone would expect:

2023-03-21 21:56:24 A c88553a937ca37301f26773d67c42.pixel.ad docker-br-4b9270b48dc1_pihole OK (answered by zero.dns0.eu#53)INSECURE NXDOMAIN (176.6ms)

or:

2023-03-21 21:59:59 A c88553a937ca37301f26773d67c42.pixel.ad docker-br-9a66e7ec5aa0_pihole Blocked (external, NULL) IP (51.9ms)

thomasmerz commented 1 year ago

Ok, lessons learned:

lightswitch05 commented 1 year ago

ABP-syntax is much more powerful than simple hosts files

I absolutely agree with that. I'm not sure I fully follow the outcome here: does Pi-hole support ABP or not? I've long considered trying to make ABP-formatted versions of my lists, but ultimately never had the motivation since I'm a Pi-hole user.

lightswitch05 commented 1 year ago

I found the PR. It looks like only that one specific domain-matching rule is supported, but that's pretty cool!

https://github.com/pi-hole/FTL/pull/1532

I'll have to spend some time thinking about how the new format support could be taken advantage of without conflicting with how the allow list currently works. I still consider this repo to be in maintenance mode, as I'm not actively adding new domains, but it is still an interesting idea.
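For illustration, a rough sketch of what producing an ABP-formatted list from a hosts-formatted one could look like. It assumes input lines of the form `0.0.0.0 domain` (bare domains are accepted as well) and emits only the simple `||domain^` rule form mentioned in this thread. The function name `hosts_to_abp` is hypothetical, and allow-list handling is deliberately left out.

```python
def hosts_to_abp(lines):
    """Convert hosts-style lines into ||domain^ rules, skipping comments and blanks."""
    rules = []
    for line in lines:
        # Drop inline and full-line comments, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # "<ip> <hostname> [<hostname> ...]" or, as an assumption, a bare domain.
        hostnames = fields[1:] if len(fields) >= 2 else fields
        rules.extend(f"||{name}^" for name in hostnames)
    return rules


sample = [
    "# illustrative input, not the real list",
    "0.0.0.0 pixel.ad",
    "",
    "0.0.0.0 www.pixel.ad  # inline comment",
]
print("\n".join(hosts_to_abp(sample)))
```

This is a line-for-line conversion only. How such rules would interact with the existing allow list is the open question raised above and is not addressed here.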