yubiuser / pihole_adlist_tool

A tool to analyse how your pihole adlists cover you browsing behavior
MIT License
542 stars 32 forks source link

πŸ› fix issue 34 when running Pi-Hole in Docker Container #40

Closed thomasmerz closed 2 years ago

thomasmerz commented 2 years ago

This PR will fix issue #34 by disabling regex-check when running Pi-hole in Docker Container 😁 To be honest to the user, we will print this info in options and when running πŸ‘πŸ»

thomasmerz commented 2 years ago

Checks won't fail any more after PR #37 has beeen finished/done…

yubiuser commented 2 years ago

but it doesn't say command not found anymore - problem/issue solved wink

I'm not convinced if this is really a working solution. If it takes hours this is nothing I'd like to ship to users

yubiuser commented 2 years ago

Does it really work?

I did not test it, but expect it to fail also here: https://github.com/yubiuser/pihole_adlist_tool/blob/890f454c781c32d984c73f0bd8494aa2d9be2770/pihole_adlist_tool#L1006

thomasmerz commented 2 years ago

Does it really work?

I did not test it, but expect it to fail also here:

https://github.com/yubiuser/pihole_adlist_tool/blob/890f454c781c32d984c73f0bd8494aa2d9be2770/pihole_adlist_tool#L1006

I reviewed and a fix will come…

I'm not convinced if this is really a working solution. If it takes hours this is nothing I'd like to ship to users

You are right, do not merge this until we have an optimized version! I have some ideas but have to test and check…

⚠️ Review and diff should be done when PR #37 has been merged because this PR depends on it! ⚠️

thomasmerz commented 2 years ago

πŸ’‘ This is working with limit of 2000 (distinct) domains, but limited due to "Argument list too long". This array should be splitted by for example 500 domains and regex-tested in packages…

         mapfile -t all_domains < <(sqlite $TEMP_DB "SELECT domain FROM all_domains")

-        for CURRENT_DOMAIN in "${all_domains[@]}"; do
                 if [ "$PIHOLE_DOCKER" = 0 ];
                     then
-                        pihole-FTL regex-test $CURRENT_DOMAIN |grep -E -o "DB ID [0-9]*"|awk '{print $3}' | while read REGEX_ID; do
-                            sqlite $TEMP_DB "INSERT INTO domain_by_regex(domain, regex_id) VALUES ('$CURRENT_DOMAIN',$REGEX_ID);"
+                       for CURRENT_DOMAIN in "${all_domains[@]}"; do
+                            pihole-FTL regex-test "$CURRENT_DOMAIN" |grep -E -o "DB ID [0-9]*"|awk '{print $3}' | while read -r REGEX_ID; do
+                                sqlite $TEMP_DB "INSERT INTO domain_by_regex(domain, regex_id) VALUES ('$CURRENT_DOMAIN',$REGEX_ID);"
+                            done
                         done
                     else
-                        docker exec $CONTAINER_ID pihole-FTL regex-test $CURRENT_DOMAIN |grep -E -o "DB ID [0-9]*"|awk '{print $3}' | while read REGEX_ID; do
+                        ### TODO: remove limit!!!! ####### TODO ####### TODO #######
+                        docker exec "$CONTAINER_ID" bash -c \
+                          "$(for CURRENT_DOMAINS in $(sqlite $TEMP_DB 'SELECT domain FROM all_domains limit 2000'); do
+                              echo pihole-FTL regex-test "$CURRENT_DOMAINS" ; done)" \
+                                  |grep -E -o "DB ID [0-9]*"|awk '{print $3}' | while read -r REGEX_ID; do
                             sqlite $TEMP_DB "INSERT INTO domain_by_regex(domain, regex_id) VALUES ('$CURRENT_DOMAIN',$REGEX_ID);"
                         done
+                        #docker exec "$CONTAINER_ID" pihole-FTL regex-test "$CURRENT_DOMAIN" |grep -E -o "DB ID [0-9]*"|awk '{print $3}' | while read -r REGEX_ID; do
+                        #    sqlite $TEMP_DB "INSERT INTO domain_by_regex(domain, regex_id) VALUES ('$CURRENT_DOMAIN',$REGEX_ID);"
+                        #done
+                        ### TODO: this really needs some performance optimization to avoid 1000s of docker-execs
+                        :
                 fi
-        done

 # count for each RegEx_id how many domains are in domain_by_regex and store it in table regex_black

@@ -1014,9 +1023,17 @@ EOF
         mapfile -t all_exact_domains < <(sqlite $GRAVITY "SELECT domain FROM domainlist WHERE type in (0,1);")

         for CURRENT_DOMAIN in "${all_exact_domains[@]}"; do
-          pihole-FTL regex-test "$CURRENT_DOMAIN" |grep -E -o "DB ID [0-9]*"|awk '{print $3}' | while read -r REGEX_ID; do
-              sqlite $TEMP_DB "INSERT INTO domainlist_regex(domain, regex_id) VALUES ('$CURRENT_DOMAIN',$REGEX_ID);"
-            done
+                if [ "$PIHOLE_DOCKER" = 0 ];
+                    then
+                        pihole-FTL regex-test "$CURRENT_DOMAIN" |grep -E -o "DB ID [0-9]*"|awk '{print $3}' | while read -r REGEX_ID; do
+                            sqlite $TEMP_DB "INSERT INTO domainlist_regex(domain, regex_id) VALUES ('$CURRENT_DOMAIN',$REGEX_ID);"
+                        done
+                    else
+                        docker exec "$CONTAINER_ID" pihole-FTL regex-test "$CURRENT_DOMAIN" |grep -E -o "DB ID [0-9]*"|awk '{print $3}' | while read -r REGEX_ID; do
+                            sqlite $TEMP_DB "INSERT INTO domainlist_regex(domain, regex_id) VALUES ('$CURRENT_DOMAIN',$REGEX_ID);"
+                        done
+                        ### TODO: may need some performance optimization to avoid so much docker-execs
+                fi

❓ Which counts are relevant for the regex-mode?

  [i]  Since 01.02.2022  0:58:13 522 different domains from your adlists have been blocked 15680 times in total
       (blocked directly by gravity or during deep CNAME inspection)
  [i]  Using you current adlist configuration 546 domains would have been blocked 16606 times
  [i]  Adlist coverage (sum of "hits covered" ca. 47.000)
  [i]  In total your adlists contain 189 visited (covered) unique domains - meaning those domains are contained only in a single adlist.
…
  [i]  Since 01.02.2022  0:58:13 you have been visiting 2980 different domains.
       You have 22 blacklist RegEx configured (20 enabled)
       With your enabled blacklist RegEx you would have covered 1 different domains.
  [i]  Please note: the internal Pi-hole RegEx test used here only checks domains against enabled RegEx.
       Therefor, currently disabled RegEx will always have 0 domains covered.
yubiuser commented 2 years ago

Which counts are relevant for the regex-mode?

All regex (number not shown) are checked against the 2980 distinct domains

thomasmerz commented 2 years ago

@yubiuser , I changed my commit / PR:

This PR will disable regex-analysis when Pi-hole is running in a docker container and tells user about this.

So Regex analysis is still not supported on docker, but script isn't doing anything worthless/useless stuff anymore. Do you agree?

thomasmerz commented 2 years ago

I also added a .stickler.yml πŸ˜‰

thomasmerz commented 2 years ago

The difficulty is that we have an unknown and possibly very big list of domains that we have to pass to the docker container for regex-checking. I really have no clue how to solve this efficiently… πŸ€·πŸ»β€β™‚οΈ

yubiuser commented 2 years ago

I agree that we should probably disable regex-checking if Pi-hole is running on docker as long as we do not have a better solution.

Please remove the stickler stuff from this PR. I'm happy to include it, but don't mix unrelated changes in one PR

stickler-ci[bot] commented 2 years ago

I couldn't find a .stickler.yml file in this repository. I can make one for you, or you can create one by following the documentation.

thomasmerz commented 2 years ago

Is this PR now ready for a merge? Or do you want a single commit rebased to commit 890f454c ?

yubiuser commented 2 years ago

I guess best would be a rebase on current development. Please also target development. Do not modify .github/workflows/shellcheck.yml in this PR, draft a new one for this. One PR for one issue.

thomasmerz commented 2 years ago
yubiuser commented 2 years ago

Whats's about the changes in line 859 and above?

thomasmerz commented 2 years ago

What changes do you mean? πŸ€·πŸ»β€β™‚οΈ git diff development shows me these changes:

grafik

Is there anything "wrong" or are you missing anything? If this is the case, it might be my fault while rebasing and resolving some merge conflicts…

yubiuser commented 2 years ago

Something went wrong during rebase I guess. These changes were introduced in your other PR but later reverted.

thomasmerz commented 2 years ago

I fixed the rebase now. git diff now shows only the "do not RegEx when running in docker container" parts for issue #34 - do you want another squash of all commits to a single one before merging?

thomasmerz commented 2 years ago

I'll add a final test later (in/after the lunch-break possibly)…

yubiuser commented 2 years ago

I just decided that it looks Ok ;-) But will wait for your feedback before merging to master

thomasmerz commented 2 years ago

Short coffee-break before next meeting 😜

🦎πŸ–₯  βœ” ~/temp/PRs/pihole_adlist_tool [fix_issue_34|…2]
11:26 $ ./pihole_adlist_tool -d 2 -t 100 -r -a > new_with_r.txt
🦎πŸ–₯  βœ” ~/temp/PRs/pihole_adlist_tool [fix_issue_34|…3]
11:30 $ ./pihole_adlist_tool -d 2 -t 100 -a > new_without_r.txt
sdiff -s new_with_r.txt new_without_r.txt🦎πŸ–₯  βœ” ~/temp/PRs/pihole_adlist_tool [fix_issue_34|…4]
11:30 $ sdiff -s new_with_r.txt new_without_r.txt
  [i]  REGEX_MODE: Disabled due to running Pi-hole in Docker  |   [i]  REGEX_MODE: Disabled
  [i]  Since 26.02.2022 11:30:20 982 different domains |      [i]  Since 26.02.2022 11:30:45 982 different domains
  [i]  Using you current adlist configuration 998 domains|    [i]  Using you current adlist configuration 998 domains
                                                                                                                                Those would have been the 100 top blocked adlist do |          Those would have been the 100 top blocked adlist do
stats.gc.fe.apple-dns.net                                2225 | stats.gc.fe.apple-dns.net                                2223
stats.gc-apple.com.akadns.net                            727  | stats.gc-apple.com.akadns.net                            725
63  1        407420         697              38098         11 | 63  1        407420         697              38094         11
64  1        99381          28               1533          16 | 64  1        99381          28               1531          16
🦎πŸ–₯  βœ” ~/temp/PRs/pihole_adlist_tool [fix_issue_34|…4]
11:31 $