yubiuser / pihole_adlist_tool

A tool to analyse how your pihole adlists cover your browsing behavior
MIT License

Maintenance mode

This repo is in maintenance mode: it's unlikely that I will add new features myself, but I am willing to accept external pull requests. There are several reasons for this decision:

I have added an MIT license so that anyone who would like to keep improving or using the tool is able to do so. If I were to start fresh, I would probably use another language (maybe C, maybe Go), which would make it possible to implement or reuse FTL's regex engine and so parse the regex domains and the ABP-style entries.

Pihole Adlist Tool

This script tries to provide you with the information you need to decide which adlists are worth having, based on your browsing behavior. It does that by matching your browsing history (FTL's querylog) against your current adlist configuration (gravity database), generating a list of domains that you have visited in the past and that would have been blocked if your current adlist configuration had been in place back then. In a second step, the script takes this list and attributes each domain to the adlists it is on (similar to what pihole -q does). The final output is a table of all your adlists with the corresponding number of covered domains (domains that you have visited and that would have been blocked if only this particular adlist had been used).
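The two steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the script's actual implementation: the real tool reads FTL's query log and the gravity database, whereas the sketch below uses small in-memory sample data. The names query_log, adlists and coverage_table are hypothetical, and "hits" is read here as the number of queries made to covered domains.

```python
from collections import Counter

# Hypothetical sample data standing in for FTL's query log and the gravity database.
query_log = [
    "ads.example.com",
    "ads.example.com",
    "tracker.example.net",
    "news.example.org",
]
adlists = {
    "list_a": {"ads.example.com", "tracker.example.net", "unused.example.com"},
    "list_b": {"ads.example.com"},
}


def coverage_table(query_log, adlists):
    """Return per-adlist counts of total domains, covered domains and covered hits."""
    hits = Counter(query_log)   # how often each domain was queried
    visited = set(hits)         # every domain that appears in the history
    table = {}
    for name, domains in adlists.items():
        covered = domains & visited  # visited domains this adlist would block
        table[name] = {
            "total": len(domains),
            "covered": len(covered),
            "hits": sum(hits[d] for d in covered),
        }
    return table


table = coverage_table(query_log, adlists)
for name, row in table.items():
    print(name, row)
```

In this sample, list_a contains three domains, two of which were actually visited, while list_b covers a single visited domain that list_a covers as well.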


The script outputs

As domains usually appear on more than one adlist, I introduce the concept of unique covered domains. Those are domains that have been visited, would have been blocked and appear on just one adlist. This might help you to judge your adlists not just by how many domains they cover but also by what would happen if you disabled a given adlist.
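A minimal sketch of how unique covered domains could be derived, assuming the visited domains and the adlist contents are already available as Python sets. The names visited, adlists and unique_covered are hypothetical and the sample data is made up; the script itself works on the Pi-hole databases instead.

```python
from collections import Counter

# Hypothetical sample data.
visited = {"ads.example.com", "tracker.example.net", "cdn.example.org"}
adlists = {
    "list_a": {"ads.example.com", "tracker.example.net"},
    "list_b": {"ads.example.com"},
    "list_c": {"unused.example.com"},
}


def unique_covered(visited, adlists):
    """Map each adlist to the covered domains that appear on no other adlist."""
    covered = {name: domains & visited for name, domains in adlists.items()}
    # Count on how many adlists each covered domain appears.
    appearances = Counter(d for domains in covered.values() for d in domains)
    return {
        name: {d for d in domains if appearances[d] == 1}
        for name, domains in covered.items()
    }


unique = unique_covered(visited, adlists)
for name, domains in unique.items():
    print(name, sorted(domains))
```

Here list_b covers one visited domain but has no unique covered domains, because list_a covers the same domain. Disabling list_b alone would therefore not change what gets blocked for this history.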


Limits


Caveat


Requirements


Installation

Download the tool, either via git clone or via the link. Make the script executable with chmod +x ./pihole_adlist_tool and run it with ./pihole_adlist_tool.

Options

pihole_adlist_tool [options]

Options:
  -d [Num]                        Consider the last [Num] days (Default: 30). Enter 0 for all-time analysis.

  -t [Num]                        Show top blocked domains. [Num] defines the number to show.

  -s [total/covered/hits/unique]  Set sorting order to total (total domains), covered (domains covered), hits (hits covered) or unique (covered unique domains) DESC. (Default sorting: id ASC).

  -u                              Show covered unique domains.

  -a                              Run in 'automatic mode'. No user input is required at all, assuming default choice would be to leave everything untouched.

  -r                              Analyse RegEx as well. Depending on the amount of domains and RegEx this might take a while. Please note: Can only be used, if Pi-hole is NOT running in a Docker Container!

  -v                              Display pihole_adlist_tool's version.

  -h                              Show this help dialog.

Background

As the adlist configuration might have changed over time (added/removed adlists, enabled/disabled adlists), this script doesn't rely on Pi-hole's blocking status for the analysis but instead determines whether queries from the long-term database would have been blocked with the current adlist configuration. Relying on the blocking status could lead to wrong assumptions about the coverage of adlists under your current adlist configuration: some domains might have been blocked in the past but wouldn't be blocked now (removed adlist), and some might be blocked now but weren't in the past (added adlist). If the adlist configuration hasn't changed over time, there should be no huge difference between this approach and using Pi-hole's blocking status.
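The difference between the two approaches can be illustrated with a small sketch. It assumes a query history in which each entry records whether the query was blocked at the time, plus the set of domains on the currently enabled adlists. All names and data below (history, current_gravity, compare) are hypothetical and do not reflect the layout of Pi-hole's databases.

```python
# Hypothetical query history: (domain, was_blocked_at_query_time).
history = [
    ("old-ads.example.com", True),   # blocked by an adlist that was later removed
    ("new-ads.example.com", False),  # allowed back then, on an adlist added since
    ("ads.example.com", True),       # blocked then and still blocked now
    ("news.example.org", False),     # never blocked
]
# Hypothetical set of domains on the currently enabled adlists.
current_gravity = {"new-ads.example.com", "ads.example.com"}


def compare(history, current_gravity):
    """Contrast the recorded blocking status with a re-analysis against current adlists."""
    blocked_then = {domain for domain, blocked in history if blocked}
    queried = {domain for domain, _ in history}
    blocked_now = queried & current_gravity
    return {
        "only_then": blocked_then - blocked_now,
        "only_now": blocked_now - blocked_then,
        "both": blocked_then & blocked_now,
    }


result = compare(history, current_gravity)
for key, domains in result.items():
    print(key, sorted(domains))
```

Only the "only_now" and "both" groups count as covered in the re-analysis. The "only_then" group is what an analysis based on the recorded blocking status would wrongly attribute to the current configuration.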

The deeper reason for re-analyzing the queries is that this tool should help you to make predictions for the future: assuming your online behavior is rather stable over time and you analyse a long enough dataset from the past, this tool will tell you which adlist might be worth keeping (because it contains a lot of covered domains) and which you could safely remove (no covered domains and/or covered domains but no unique covered domains).
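As a rough sketch of that decision rule, assume the table produced by the tool has been copied into a list of rows with covered and unique counts per adlist. The row layout, the sample numbers and the function name removal_candidates are all hypothetical.

```python
# Hypothetical per-adlist results, for illustration only.
rows = [
    {"adlist": "list_a", "covered": 120, "unique": 35},
    {"adlist": "list_b", "covered": 40, "unique": 0},
    {"adlist": "list_c", "covered": 0, "unique": 0},
]


def removal_candidates(rows):
    """Return adlists with no covered domains or no unique covered domains."""
    return [
        row["adlist"]
        for row in rows
        if row["covered"] == 0 or row["unique"] == 0
    ]


candidates = removal_candidates(rows)
print(candidates)
```

The result is best treated as a list of candidates to review one at a time. If two adlists share a domain that appears on no other list, neither counts it as unique, yet removing both at once would leave that domain unblocked.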


Support, Contribute & Todo

I'm not a developer. This script was mostly put together by copy-pasting snippets I found online. I know there is no proper error and exception handling. If you are willing to improve the script, feel free to submit pull requests. Things on my todo list: