yubiuser / pihole_adlist_tool

A tool to analyse how your pihole adlists cover your browsing behavior
MIT License

ABP style adlists #77

Open dowden20 opened 1 year ago

dowden20 commented 1 year ago

The report currently does not show the numbers for ABP-style adlists. Please consider including the calculation for ABP-style adlists.

Thank you

  [i]  Adlist coverage

id   enabled  total_domains  domains_covered  hits_covered  unique_domains_covered  address
---  -------  -------------  ---------------  ------------  ----------------------  ---------------------------------------------------------------------------------
413  1        0                                                                     https://big.oisd.nl/
469  1        0                                                                     https://raw.githubusercontent.com/hagezi/dns-blocklists/main/adblock/multi.txt
470  1        0                                                                     https://raw.githubusercontent.com/hagezi/dns-blocklists/main/adblock/tif.txt
yubiuser commented 1 year ago

You discovered one issue (total_domains being empty): https://github.com/pi-hole/FTL/issues/1573

The rest is a feature request, and I'm not sure there is a feasible way to solve it: ABP-style domains are handled as a special kind of "RegEx" within FTL, and I'm not sure there is a good way to handle them within bash. Even if I find a way to treat them as bash RegEx, checking every queried domain against all adlist entries will be painfully slow on lists like https://big.oisd.nl/. (This is why RegEx checking is not enabled by default.)
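To make the performance concern concrete, here is a minimal bash sketch of what "treating ABP entries as bash RegEx" could look like. It assumes only the plain `||domain^` form (no wildcards, `$` options or `@@` exceptions), and `abp_to_regex` and `count_covered` are hypothetical helper names, not part of pihole_adlist_tool or FTL; it is not how FTL itself evaluates ABP entries.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: translate a plain ABP rule "||example.com^" into a bash ERE.
# Only the simple "||domain^" form is handled; anything else
# (wildcards, "$" options, "@@" exceptions) is rejected with status 1.
abp_to_regex() {
  local rule=$1 base
  [[ $rule == '||'*'^' ]] || return 1
  base=${rule#'||'}
  base=${base%'^'}
  # Accept only characters valid in a plain domain name or IP address
  [[ $base =~ ^[A-Za-z0-9.-]+$ ]] || return 1
  # Escape the dots, then anchor: the domain itself or any subdomain of it
  printf '(^|\\.)%s$\n' "${base//./\\.}"
}

# Count queried domains covered by at least one rule.
# Every query is tested against every rule, so the work grows with
# (number of queries) x (number of rules); on a very large list this gets slow.
count_covered() {
  local rules_file=$1 queries_file=$2 covered=0 domain rule re
  local -a regexes=()
  # Translate the rules once, skipping entries this sketch cannot handle
  while IFS= read -r rule; do
    re=$(abp_to_regex "$rule") && regexes+=("$re")
  done < "$rules_file"
  while IFS= read -r domain; do
    for re in "${regexes[@]}"; do
      if [[ $domain =~ $re ]]; then
        covered=$((covered + 1))
        break
      fi
    done
  done < "$queries_file"
  echo "$covered"
}
```

Translating each rule once up front avoids repeated parsing, but the nested loop over queries and rules remains, which is the cost described above.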

tkil commented 1 year ago

Hi! I would also find it very helpful to use ABP-style lists with Pi-hole. The list I'm looking at is effectively just a list of domains, but in ABP format: https://v.firebog.net/hosts/Admiral.txt

Examples:

||2znp09oa.com^
||35.186.219.42^
||35.186.249.84^
||35.190.48.184^
||35.190.58.50^
||35.190.62.199^

I was able to get the domains recognized by adding:

    -e 's/^\|\|(.*)\^$/\1/' \

To this sed command (currently around line 658 in /opt/pihole/gravity.sh):

  # 2) Remove carriage returns
  # 3) Remove lines starting with ! (ABP Comments)
  # 4) Remove lines starting with [ (ABP Header)
  # 5) Remove lines containing ABP extended CSS selectors ("##", "#!#", "#@#", "#?#") preceded by a letter
  # 6) Remove comments (text starting with "#", include possible spaces before the hash sign)
  # 7) Remove leading tabs, spaces, etc. (Also removes leading IP addresses)
  # 8) Convert from ABP format: ||some.domain.here^ --> some.domain.here
  # 9) Remove empty lines

    sed -i -r \
    -e 's/\r$//' \
    -e 's/\s*!.*//g' \
    -e 's/\s*\[.*//g' \
    -e '/[a-z]\#[$?@]{0,1}\#/d' \
    -e 's/\s*#.*//g' \
    -e 's/^.*\s+//g' \
    -e 's/^\|\|(.*)\^$/\1/' \
    -e '/^$/d' "${destination}"

(Note the above snippet has the new expression and a matching comment.)

This doesn't solve the full problem of handling fancy ABP patterns, but it might be worth adding to take advantage of the many hosts- / domains-only lists out there.
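As a quick check of the proposed expression, the sketch below runs only the carriage-return, ABP-conversion and empty-line steps on a few sample lines instead of editing a downloaded list in place. It assumes GNU sed (for `-r` and `\r`); `abp_to_domains` is a hypothetical wrapper name, and the `$third-party` line is an illustrative entry, not taken from the Admiral list.

```shell
#!/usr/bin/env bash
# Sketch: apply just the proposed ABP conversion (plus CR stripping and
# empty-line removal) to stdin. Assumes GNU sed.
abp_to_domains() {
  sed -r \
    -e 's/\r$//' \
    -e 's/^\|\|(.*)\^$/\1/' \
    -e '/^$/d'
}

# Two entries from the Admiral list, an empty line, and a rule with an ABP option
printf '%s\n' '||2znp09oa.com^' '||35.186.219.42^' '' '||ads.example.com^$third-party' \
  | abp_to_domains
```

Plain `||domain^` entries are reduced to bare domains (or IP addresses), while the rule carrying the `$third-party` option does not match the anchored expression and passes through unchanged, which is the "fancy ABP patterns" limitation mentioned above.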

Happy to open a PR for this, but honestly, it took me long enough to even find your GitHub org, and I still haven't figured out exactly where gravity.sh lives in your various repos...

Current versions:

Thanks again!

yubiuser commented 1 year ago

@tkil

I'm not sure what you are trying to achieve with this RegEx, or whether it is meant to improve gravity.sh within Pi-hole or my adlist tool. You can find gravity.sh here: https://github.com/pi-hole/pi-hole/blob/master/gravity.sh

tkil commented 1 year ago

@yubiuser Ah, I might have misfired. Sorry for the noise. I'll make this suggestion over in the pi-hole repo.

Thanks for the redirect!