notracking / hosts-blocklists

Automatically updated, moderated and optimized lists for blocking ads, trackers, malware and other garbage

domains.txt too slow with dnsmasq 2.80? #286

Closed joel-bourquard closed 4 years ago

joel-bourquard commented 4 years ago

Hi guys,

Great work! However, I noticed a performance problem (using dnsmasq 2.80 on a 1 GHz 4-core AMD GX-412TC SoC).

Adding this to my dnsmasq config works well: addn-hosts=/.../hostnames.txt

However, adding this makes DNS response time much worse (e.g. 17 ms => 119 ms): conf-file=/.../domains.txt

How about an alternate version of hosts-blocklists where everything is merged into hostnames.txt? As I understand it, that might miss some subdomains that aren't explicitly listed, but I'm sure it would use much less CPU.
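To make the proposed trade-off concrete, here is a rough sketch of how such a hosts-only variant could be derived. It assumes the domain list consists of dnsmasq lines of the form `address=/example.com/0.0.0.0` (an assumption about the file format), and `domains_to_hosts` is a hypothetical helper, not part of hosts-blocklists. Each rule becomes a hosts-file line, which only matches the exact name, so subdomains of a listed domain would no longer be blocked.

```shell
#!/bin/bash
# Hypothetical helper: convert dnsmasq "address=/<name>/<ip>" lines on stdin
# into hosts-file lines "<ip> <name>" on stdout.
# Assumes one name per rule. Rules with "#" or an empty address are skipped,
# because they have no direct hosts-file equivalent.
domains_to_hosts() {
    awk -F/ '$1 == "address=" && NF == 3 && $2 != "" && $3 != "" && $3 != "#" {
        print $3, $2
    }'
}

# Example input (hypothetical entries)
printf '%s\n' \
    'address=/ads.example.com/0.0.0.0' \
    'address=/ads.example.com/::' \
    '# a comment line' | domains_to_hosts
```

The resulting file could then be loaded with addn-hosts, at the cost of no longer covering unlisted subdomains.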

PS: I've already checked that the CPU and network load on the box is low and for good measure, I boosted the dnsmasq process to nice -19.

Thanks in advance for any help or advice.

joel-bourquard commented 4 years ago

Hi @scafroglia93, no I haven't tried it yet. Have you? In the meantime I switched to https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts -- it's probably inferior, but dnsmasq is fast with it :)

notracking commented 4 years ago

This is a very valid question, thanks!

Of course the size of your blocklist will always have an impact on performance. The notracking list is still very much optimized, but I can imagine that on very low-powered devices its size will quickly become an issue nonetheless. However, there will be no impact on resolution speed after the first query to a host has been performed, as long as you enable proper caching in dnsmasq.

Having said that, blocking more garbage hosts will in the end result in fewer resources used and less bandwidth consumed on all clients that use your instance of dnsmasq. All those extra hosts blocked by larger lists could otherwise have been queried dozens of times a day by your clients.

Still, I'm interested in whether there are significant speed differences these days between services like dnsmasq, dnscrypt-proxy, etc. (feel free to share this kind of information :))

I ran a small test on one of my single-core (3 GHz) VPS instances with 512 MB of memory, running dnsmasq v2.80 with the notracking dnsmasq domains and hostnames filters in place (I personally use dnscrypt-proxy in my home network).

Script (it queries random subdomains of google.com to bypass caching; dnsmasq's upstream must also be 1.1.1.1 to make the comparison fair):

#!/bin/bash

# Print a random 10-character lowercase alphanumeric label.
randlabel() {
    LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c 10
}

# Run 25 lookups of random google.com subdomains (so nothing can be answered
# from a cache) and print dig's "Query time" line for each.
# Any arguments are passed on to dig, e.g. @1.1.1.1 to bypass the local resolver.
run_queries() {
    local counter host
    for ((counter = 1; counter <= 25; counter++)); do
        host="$(randlabel).google.com"
        echo -n "${host}: "
        dig "$host" "$@" | grep 'Query time'
    done
}

echo "Stats dnsmasq with filters forwarded to 1.1.1.1"
run_queries             # local DNS

echo
echo "Stats 1.1.1.1"
run_queries @1.1.1.1    # 1.1.1.1 resolver

~$ ./test.sh
Stats dnsmasq with filters forwarded to 1.1.1.1
uzlxnhromm.google.com: ;; Query time: 72 msec
zdsewhzwr6.google.com: ;; Query time: 63 msec
6cus5jon6g.google.com: ;; Query time: 61 msec
mvl76vcjcs.google.com: ;; Query time: 65 msec
gph0tikrnw.google.com: ;; Query time: 72 msec
9drqspxriv.google.com: ;; Query time: 62 msec
dt2inwnidi.google.com: ;; Query time: 62 msec
zeaog7snqu.google.com: ;; Query time: 62 msec
loxqoy8afb.google.com: ;; Query time: 73 msec
0bgvgza7ct.google.com: ;; Query time: 61 msec
6jeleyh3ii.google.com: ;; Query time: 75 msec
bgz4vtqppy.google.com: ;; Query time: 61 msec
dbutqeczxc.google.com: ;; Query time: 60 msec
4djxeijxeg.google.com: ;; Query time: 70 msec
u0djuewq6j.google.com: ;; Query time: 63 msec
kw9sxiegnn.google.com: ;; Query time: 61 msec
nyw5gyr1y5.google.com: ;; Query time: 59 msec
slos4hu29g.google.com: ;; Query time: 76 msec
2bmefenmde.google.com: ;; Query time: 68 msec
qh9jfevzor.google.com: ;; Query time: 62 msec
xznvw1lwhy.google.com: ;; Query time: 64 msec
s6hrmqts4y.google.com: ;; Query time: 75 msec
desfxf0ttp.google.com: ;; Query time: 75 msec
ricm91du9c.google.com: ;; Query time: 61 msec
a8rwucjrvh.google.com: ;; Query time: 73 msec

Stats 1.1.1.1
s26murnn28.google.com: ;; Query time: 55 msec
cjr2dsxqka.google.com: ;; Query time: 55 msec
pm4gums3ev.google.com: ;; Query time: 54 msec
hvvh7cpjjx.google.com: ;; Query time: 40 msec
k2zu2yzqo4.google.com: ;; Query time: 42 msec
twq3fixsen.google.com: ;; Query time: 38 msec
xwzfsdgdlx.google.com: ;; Query time: 41 msec
pnerhbpael.google.com: ;; Query time: 55 msec
srvisqgr7i.google.com: ;; Query time: 55 msec
4nxrsinjef.google.com: ;; Query time: 52 msec
rph9ngtqyo.google.com: ;; Query time: 43 msec
mgqoevmfm9.google.com: ;; Query time: 41 msec
pnsszbjmdw.google.com: ;; Query time: 41 msec
g6zl75s7p5.google.com: ;; Query time: 55 msec
yzuyugz7vv.google.com: ;; Query time: 41 msec
s2oyn4fv8o.google.com: ;; Query time: 44 msec
imz5ark1pz.google.com: ;; Query time: 55 msec
ljhpxbhcxx.google.com: ;; Query time: 43 msec
ctyciopdgk.google.com: ;; Query time: 55 msec
njaq0aon7o.google.com: ;; Query time: 53 msec
krdm0mixk7.google.com: ;; Query time: 41 msec
zb9bif0pyz.google.com: ;; Query time: 49 msec
xitzms2jou.google.com: ;; Query time: 40 msec
s7bxrg1sj3.google.com: ;; Query time: 42 msec
slacozufxv.google.com: ;; Query time: 41 msec

joel-bourquard commented 4 years ago

Hi @notracking, thanks for your response - this was much appreciated.

Here are the results of your script on my box with several config variants:

Direct to 1.1.1.1:
Stats 1.1.1.1
axfyfov6yi.google.com: ;; Query time: 30 msec
jgfckpjnlu.google.com: ;; Query time: 30 msec
7oxs5coyht.google.com: ;; Query time: 31 msec
fpqrtfjs6s.google.com: ;; Query time: 34 msec
pmngqbbeo7.google.com: ;; Query time: 34 msec
flfvdkimzd.google.com: ;; Query time: 31 msec
iuoju0lxwu.google.com: ;; Query time: 42 msec
njamtl9ns1.google.com: ;; Query time: 42 msec
cglbsdbfgb.google.com: ;; Query time: 29 msec
kwyb96ige5.google.com: ;; Query time: 32 msec
k8jcdcz4ms.google.com: ;; Query time: 31 msec
ppj097x97d.google.com: ;; Query time: 29 msec
qgn98ln7lj.google.com: ;; Query time: 31 msec
mwzo1svbu5.google.com: ;; Query time: 32 msec
xqkngm584h.google.com: ;; Query time: 33 msec
ktaixymeyo.google.com: ;; Query time: 32 msec
jczsgicwjx.google.com: ;; Query time: 31 msec
tksg9lh3it.google.com: ;; Query time: 32 msec
qeqjvjk2yh.google.com: ;; Query time: 35 msec
glbdvksvxr.google.com: ;; Query time: 41 msec
zvg38ihhzv.google.com: ;; Query time: 32 msec
uoac8pfie5.google.com: ;; Query time: 30 msec
gjkyrxlj1i.google.com: ;; Query time: 34 msec
fgysbkahv6.google.com: ;; Query time: 45 msec
6zxfcsmn3f.google.com: ;; Query time: 30 msec

dnsmasq + hosts-blocklists (HOSTNAMES + DOMAINS):
Stats dnsmasq with filters forwarded to 1.1.1.1
hidjqqtemg.google.com: ;; Query time: 124 msec
msvxxqyur1.google.com: ;; Query time: 124 msec
qcnx6e3scg.google.com: ;; Query time: 114 msec
xy2eft6thk.google.com: ;; Query time: 117 msec
fp1bxkunue.google.com: ;; Query time: 117 msec
y9hc5t1cug.google.com: ;; Query time: 112 msec
hoi1sauujw.google.com: ;; Query time: 114 msec
7qoun0adra.google.com: ;; Query time: 114 msec
lryb6x9gyo.google.com: ;; Query time: 115 msec
fgwnypomkd.google.com: ;; Query time: 117 msec
fmnqy6ps1l.google.com: ;; Query time: 115 msec
snrdrojbbo.google.com: ;; Query time: 115 msec
ffddbxxxrp.google.com: ;; Query time: 116 msec
ntaj2vetx1.google.com: ;; Query time: 114 msec
jmawilva5w.google.com: ;; Query time: 123 msec
thnrvsxzrw.google.com: ;; Query time: 127 msec
0qammtr6nf.google.com: ;; Query time: 127 msec
pdlwzh0f8c.google.com: ;; Query time: 114 msec
zlk7pritye.google.com: ;; Query time: 126 msec
fz0cqbxhxf.google.com: ;; Query time: 125 msec
g2lnn3nrjq.google.com: ;; Query time: 126 msec
cejycstakz.google.com: ;; Query time: 117 msec
brmhkvlncs.google.com: ;; Query time: 128 msec
jhxc9iwso1.google.com: ;; Query time: 114 msec
9zpe2yhibt.google.com: ;; Query time: 114 msec

dnsmasq + hosts-blocklists (DOMAINS ONLY):
Stats dnsmasq with filters forwarded to 1.1.1.1
7ahgbplgpk.google.com: ;; Query time: 120 msec
3hjknjtcbz.google.com: ;; Query time: 116 msec
npiijiqbxc.google.com: ;; Query time: 127 msec
ui3mpf6gcx.google.com: ;; Query time: 130 msec
fy455bvzjx.google.com: ;; Query time: 121 msec
ynmyngaach.google.com: ;; Query time: 119 msec
egucepnbaj.google.com: ;; Query time: 116 msec
ikyygmqvpz.google.com: ;; Query time: 112 msec
1i6xc5klmg.google.com: ;; Query time: 117 msec
jxgp1jtxr0.google.com: ;; Query time: 118 msec
wnifixqqwa.google.com: ;; Query time: 118 msec
gb9v6g29kk.google.com: ;; Query time: 113 msec
3rtivvxkhs.google.com: ;; Query time: 114 msec
tuk2dnh3iu.google.com: ;; Query time: 113 msec
l74czpbllm.google.com: ;; Query time: 125 msec
tzdbo62red.google.com: ;; Query time: 114 msec
fohjb2otwt.google.com: ;; Query time: 127 msec
klykbtrasd.google.com: ;; Query time: 127 msec
cwjjgfljkp.google.com: ;; Query time: 112 msec
ogurgzdyr0.google.com: ;; Query time: 113 msec
wp4c1rckl2.google.com: ;; Query time: 126 msec
wbrzjtypwt.google.com: ;; Query time: 116 msec
uq7ikw9omz.google.com: ;; Query time: 124 msec
fw4eog2obj.google.com: ;; Query time: 126 msec
ug4iy2rars.google.com: ;; Query time: 130 msec

dnsmasq + hosts-blocklists (HOSTNAMES ONLY):
Stats dnsmasq with filters forwarded to 1.1.1.1
nksdo3xi3q.google.com: ;; Query time: 43 msec
mrxnn9qxh3.google.com: ;; Query time: 29 msec
2yhlxnxzf7.google.com: ;; Query time: 32 msec
2aku9ofs6k.google.com: ;; Query time: 30 msec
vl1ry415xu.google.com: ;; Query time: 33 msec
vorxx1tnb7.google.com: ;; Query time: 32 msec
8fbkwdoyps.google.com: ;; Query time: 31 msec
1y22q573wu.google.com: ;; Query time: 30 msec
8i7w7vv8ss.google.com: ;; Query time: 33 msec
eogneqjr51.google.com: ;; Query time: 43 msec
yzr1qlq6s6.google.com: ;; Query time: 29 msec
pxvcf6gvcv.google.com: ;; Query time: 31 msec
ibtorln4oo.google.com: ;; Query time: 40 msec
xm7u7t4die.google.com: ;; Query time: 31 msec
yakacttqmu.google.com: ;; Query time: 31 msec
mawbsxebdm.google.com: ;; Query time: 30 msec
hzwi6muh2b.google.com: ;; Query time: 33 msec
vt88gpwu3p.google.com: ;; Query time: 31 msec
fkmwly3ofh.google.com: ;; Query time: 31 msec
faslvtfr0m.google.com: ;; Query time: 31 msec
m5uitjre10.google.com: ;; Query time: 42 msec
q37ous3u2h.google.com: ;; Query time: 30 msec
4nbrcyefre.google.com: ;; Query time: 32 msec
fc67hwicvx.google.com: ;; Query time: 31 msec
pjqxfdnzgo.google.com: ;; Query time: 31 msec

dnsmasq + StevenBlack-hosts:
Stats dnsmasq with filters forwarded to 1.1.1.1
sbxral3wdt.google.com: ;; Query time: 28 msec
xgwuotnxs6.google.com: ;; Query time: 32 msec
6uwri2w6nk.google.com: ;; Query time: 35 msec
ysdnqxmixo.google.com: ;; Query time: 27 msec
h9punq0v8z.google.com: ;; Query time: 27 msec
jdvkav7gnz.google.com: ;; Query time: 34 msec
yccfa6smiy.google.com: ;; Query time: 42 msec
ym5th0rhwf.google.com: ;; Query time: 31 msec
vcsqi8ztcs.google.com: ;; Query time: 29 msec
trz5kptcvq.google.com: ;; Query time: 44 msec
rom6mvfof6.google.com: ;; Query time: 31 msec
ym75aj9i1f.google.com: ;; Query time: 29 msec
gnph90zfyp.google.com: ;; Query time: 33 msec
el3k0o49lx.google.com: ;; Query time: 30 msec
s7kkbf6b9d.google.com: ;; Query time: 43 msec
untxjpizf4.google.com: ;; Query time: 34 msec
inctexp6yh.google.com: ;; Query time: 40 msec
mfebz4dakm.google.com: ;; Query time: 30 msec
nu4lr4rt18.google.com: ;; Query time: 31 msec
nlvrqk0aja.google.com: ;; Query time: 31 msec
ssygfpfbcc.google.com: ;; Query time: 31 msec
ysqrwkauw4.google.com: ;; Query time: 34 msec
p3ycrhcjie.google.com: ;; Query time: 43 msec
mfwwdzfwvp.google.com: ;; Query time: 43 msec
xdcar0li0v.google.com: ;; Query time: 44 msec

As you can see, the difference is very dramatic on my 1 GHz AMD GX-412TC. Please note that even on your 3 GHz node there seems to be a 20-25% performance hit, and that's pure CPU usage...

I think the reason behind this is that the "hosts" lookup is near-instant in dnsmasq: it's most likely a hash-table lookup, which takes roughly constant time on average (O(1)), regardless of the number of entries N.

On the other hand, the "domains" part looks dozens of times more expensive: dnsmasq probably iterates over all entries, like evaluating thousands of regexps in sequence, which takes O(N) time. This would explain the extra 100 milliseconds, which (for a CPU) is an eternity...
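The comment above is a guess about dnsmasq's internals, and the sketch below is not dnsmasq's code. It only contrasts, in bash, the two strategies being speculated about: a linear scan that tests the query name against every domain rule (O(N) in the number of rules), and a hash lookup on each suffix of the query name (one lookup per label, independent of list size). The rule set is made up for illustration, and `declare -A` requires bash 4 or later.

```shell
#!/bin/bash
# Illustrative only: two ways to decide whether a query name falls under a
# domain rule (the rule itself or any of its subdomains). Hypothetical rules.

declare -A blocked=( [doubleclick.net]=1 [ads.example.com]=1 )
rules=( "doubleclick.net" "ads.example.com" )

# O(N): compare the name against every rule in turn.
match_linear() {
    local name=$1 rule
    for rule in "${rules[@]}"; do
        if [[ $name == "$rule" || $name == *."$rule" ]]; then
            return 0
        fi
    done
    return 1
}

# O(labels): drop one leading label at a time and do a hash lookup per suffix.
match_suffix() {
    local name=$1
    while [[ -n $name ]]; do
        [[ -n ${blocked[$name]} ]] && return 0
        [[ $name == *.* ]] || break
        name=${name#*.}
    done
    return 1
}

for q in ad.doubleclick.net doubleclick.net xdoubleclick.net www.google.com; do
    match_linear "$q" && l=blocked || l=allowed
    match_suffix "$q" && s=blocked || s=allowed
    echo "$q linear=$l suffix=$s"
done
```

Both functions give the same answers; only the suffix variant's cost stays flat as the rule list grows.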

Oh, and regarding your comment about caching: on my side the blacklist hits (e.g. ad.doubleclick.net) do not seem to be cached by dnsmasq, so they always take the same (long) time.

So I really think that your great list would deserve a "light CPU load" version, which would contain only hosts. That alternate version would of course be larger and less clever, but it could be faster, even on fast CPUs. What do you think?

PS: to scafroglia93: your advice was appreciated, but I really need something that runs on-premise.

leucos commented 4 years ago

FWIW, I tested here with dnsmasq using this script:


#!/bin/bash
# Collect 25 "Query time" samples (ms) from the local resolver and from 1.1.1.1,
# then compare the two distributions with ministat.

counter=1

local_times=$(mktemp local_XXXX)
quad1_times=$(mktemp quad1_XXXX)

while [ $counter -le 25 ]; do
    # Local DNS: a fresh random name per query so nothing comes from a cache
    randstring=$(LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c 10)
    dig "${randstring}.google.com" | awk '/Query time/ { print $4 }' >> "$local_times"

    # 1.1.1.1 resolver
    randstring=$(LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c 10)
    dig "${randstring}.google.com" @1.1.1.1 | awk '/Query time/ { print $4 }' >> "$quad1_times"

    ((counter++))
done

echo "Comparing ${local_times} to ${quad1_times}"

ministat -sw74 "${local_times}" "${quad1_times}"

rm "${local_times}" "${quad1_times}"

This gave the following results:

x local_lL75
+ quad1_sH6U
+--------------------------------------------------------------------------+
|         +x                                                               |
|         +*                                                               |
|         +*                                                               |
|         **+                                                              |
|         ***                                                              |
|         ***  +  x                                                        |
|         ***  *  *xxx*   +   *                                       xx+x+|
| |_________M________A___________________|                                 |
||_________M______A_________________|                                      |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  25            40           140            44         58.44     30.944951
+  25            40           142            43         53.76     27.391118
No difference proven at 95.0% confidence

No noticeable difference, although the CPU is a quad Intel(R) Core(TM) i3-5005U CPU @ 2.00GHz.
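ministat's verdict can be checked by hand from the summary rows: it compares the two means with a Student's t-test using a pooled standard deviation. The small bash/awk helper below (`pooled_t` is a hypothetical name, not part of ministat) recomputes that statistic from N, Avg and Stddev. For the numbers above it gives t ≈ 0.57 with 48 degrees of freedom, far below the two-sided 95% critical value of about 2.01, which is consistent with "No difference proven".

```shell
#!/bin/bash
# Hypothetical helper: pooled Student's t statistic from two summary rows.
# Arguments: n1 avg1 sd1 n2 avg2 sd2
pooled_t() {
    awk -v n1="$1" -v m1="$2" -v s1="$3" -v n2="$4" -v m2="$5" -v s2="$6" 'BEGIN {
        df = n1 + n2 - 2
        sp = sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)  # pooled stddev
        se = sp * sqrt(1 / n1 + 1 / n2)                       # std. error of the difference
        printf "t = %.2f (df = %d, pooled s = %.2f)\n", (m1 - m2) / se, df, sp
    }'
}

# Local dnsmasq (x) vs 1.1.1.1 (+), taken from the ministat summary above
pooled_t 25 58.44 30.944951 25 53.76 27.391118
```

The same helper accepts any other N/Avg/Stddev pair from a ministat-style summary.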

notracking commented 4 years ago

Thanks for the reports, really helpful! I'm considering restructuring the dnsmasq versions of the lists so they have the same behavior as the dnscrypt-proxy one (treat everything as a domain filter). This would mean that hostnames.txt becomes obsolete. I will also reconsider how to deal with the IPv6 addresses, potentially halving the size of the list. I expect that alone to have a big performance impact.

I'm still working on some other important things in the background for the maintenance of the lists (they have a bit more priority). This will be addressed at some point!

Homas commented 4 years ago

Guys, instead of dnsmasq you can use ISC BIND or PowerDNS. You need to convert the feed to RPZ format. To do that you can write your own scripts, use my ioc2rpz DNS server (open source, http://ioc2rpz.com), or use the RPZ feed that is available for free on the ioc2rpz community website (https://ioc2rpz.net).
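For the "write your own scripts" route, a minimal conversion might look like the sketch below. It assumes the input is a plain list with one domain per line (the notracking files would first need their dnsmasq syntax stripped). It uses the standard RPZ action `CNAME .`, which tells an RPZ-aware resolver to answer NXDOMAIN, plus a `*.` record so that subdomains are covered too. The SOA/NS values are placeholders, and `domains_to_rpz` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read a plain domain list (one per line) on stdin and
# write an RPZ zone on stdout. SOA/NS values below are placeholders.
domains_to_rpz() {
    cat <<'EOF'
$TTL 300
@ IN SOA localhost. root.localhost. 1 3600 600 86400 300
@ IN NS  localhost.
EOF
    # "CNAME ." is the RPZ action for NXDOMAIN; the "*." record covers subdomains.
    # Blank lines and lines starting with '#' are skipped.
    awk 'NF && $1 !~ /^#/ { print $1 " CNAME ."; print "*." $1 " CNAME ." }'
}

# Example: convert two hypothetical entries
printf '%s\n' 'doubleclick.net' '# comment' 'ads.example.com' | domains_to_rpz
```

Typical use would be `domains_to_rpz < domains.list > blocklist.rpz`, with the output then loaded as a response-policy zone.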

At home I'm using this and some other feeds on a Raspberry Pi Zero W as a secondary DNS server. It can easily handle 500k-900k RPZ rules. As the primary DNS server and management web interface I'm using a Raspberry Pi 4 (4 GB); it should be able to handle more than 2M RPZ rules without a performance impact. Here is a demo video.

Stats Pi Zero, Pi 4 vs 1.1.1.1

Pi 4/4Gb
neUOfZ6PtlypbjXA.google.com: ;; Query time: 39 msec
WibDB5Xz85cceiW7.google.com: ;; Query time: 36 msec
GJpQjSvaqa4Pz1Bn.google.com: ;; Query time: 54 msec
w62OMUQViCKBnHQj.google.com: ;; Query time: 35 msec
EjsKTbuUrkgWLavd.google.com: ;; Query time: 44 msec
lp65GSdyg88g40hL.google.com: ;; Query time: 64 msec
O4cE92UQ4OfXktZf.google.com: ;; Query time: 40 msec
0ydOrGAmxs3JpOpr.google.com: ;; Query time: 37 msec
yQASlxOWyCZpztOd.google.com: ;; Query time: 42 msec
USQeV1tZdsXmMCYp.google.com: ;; Query time: 36 msec
U7XWlfpUUWcWpCqT.google.com: ;; Query time: 40 msec
pO8PcEBIfbwFurn5.google.com: ;; Query time: 43 msec
xEFOFzK7n7BoCLYs.google.com: ;; Query time: 45 msec
9yyDntUmV37XexBo.google.com: ;; Query time: 38 msec
vNS5HYBMWqDpJN7e.google.com: ;; Query time: 42 msec
69EUj35IaMqU5Ss0.google.com: ;; Query time: 45 msec
4746G9pgx23oZTJn.google.com: ;; Query time: 39 msec
V9aY8dpLOqaaTjzy.google.com: ;; Query time: 44 msec
BZm0XeRQ6S7jHmD2.google.com: ;; Query time: 50 msec
JFg8Uaaeq3rQ45dq.google.com: ;; Query time: 42 msec
jluDK6ARvSbiWk7L.google.com: ;; Query time: 49 msec
yJqsGDcv78xJpQnb.google.com: ;; Query time: 37 msec
SY5KkdClaytxlThK.google.com: ;; Query time: 70 msec
X9mXpFUitGyK8IVF.google.com: ;; Query time: 38 msec
IylasMfnrsreRUSE.google.com: ;; Query time: 47 msec

Pi Zero W
LPR3maYAL2pkrTzz.google.com: ;; Query time: 50 msec
3ThNmI2joxcYXOpi.google.com: ;; Query time: 42 msec
zFnYhZ8mpZbLcppq.google.com: ;; Query time: 38 msec
IdNX58XlEcF7ZfpX.google.com: ;; Query time: 67 msec
hoUctVS2BsCvFLyc.google.com: ;; Query time: 43 msec
Ag3mXMrazEuzhZTz.google.com: ;; Query time: 43 msec
BSrbvsJPuZjMDQAJ.google.com: ;; Query time: 49 msec
7WJ6ZjgkBYQRxHGA.google.com: ;; Query time: 49 msec
7evjuWVVe3K2dsX4.google.com: ;; Query time: 56 msec
xQ7bQzjCLUuijvhb.google.com: ;; Query time: 82 msec
RrTg2qNaigcjvi96.google.com: ;; Query time: 38 msec
wKUzUG99CF4cTO8V.google.com: ;; Query time: 38 msec
Xc2TsmnF9oUFm86M.google.com: ;; Query time: 44 msec
fDwYRmUblMmcKEtW.google.com: ;; Query time: 78 msec
ktGiJGdGlUaemSxH.google.com: ;; Query time: 68 msec
EWMjgRJNhW6Y5iQ3.google.com: ;; Query time: 56 msec
R82HCxwe8g5HuDLu.google.com: ;; Query time: 53 msec
RIQhVfibNAIthJnS.google.com: ;; Query time: 49 msec
b545euWuJpQsyCfa.google.com: ;; Query time: 87 msec
RvMSggJ7XQk3DIhb.google.com: ;; Query time: 43 msec
HsDiqPYRwzutiLFY.google.com: ;; Query time: 54 msec
RgFCPKpfcIJwh3r3.google.com: ;; Query time: 72 msec
Neymztumcwf1OrgK.google.com: ;; Query time: 46 msec
o4cpQr3s0HzZYHEt.google.com: ;; Query time: 55 msec
69BhTydWk4LZ2V98.google.com: ;; Query time: 45 msec

Stats 1.1.1.1
ZNh0VbLf1iRInRRj.google.com: ;; Query time: 47 msec
UJC6gLbGFffxitDE.google.com: ;; Query time: 52 msec
adI15qFtrHhlHVdO.google.com: ;; Query time: 40 msec
h2lQYc8ZjY5Kevin.google.com: ;; Query time: 47 msec
bR8xHfFPJFANpG54.google.com: ;; Query time: 43 msec
JN4Hm16cBoZdTZxX.google.com: ;; Query time: 44 msec
Abt0Iglrgc0wmZEg.google.com: ;; Query time: 40 msec
uBBHk3dpMdPohVz6.google.com: ;; Query time: 42 msec
35KVTd99vnONHELe.google.com: ;; Query time: 39 msec
EZRCvJJMRzs1ePHM.google.com: ;; Query time: 46 msec
NX10J39cDcrXWKZ1.google.com: ;; Query time: 45 msec
pvdC4SHaiIOhyswB.google.com: ;; Query time: 58 msec
TqdMj7ZvnVwjhALE.google.com: ;; Query time: 37 msec
vYXJtHd7kuUQ3V5F.google.com: ;; Query time: 39 msec
415l9yVHoNMEBqZ3.google.com: ;; Query time: 51 msec
jVdE0NYGBquAmeEk.google.com: ;; Query time: 46 msec
fDfo2p0S01TeNvBy.google.com: ;; Query time: 44 msec
YfzX3rY803XOCoYH.google.com: ;; Query time: 55 msec
h1LQymZs1fFpLbKg.google.com: ;; Query time: 41 msec
b9mSuoOFnIrQKQ5O.google.com: ;; Query time: 40 msec
MbDGBvuvUzCuyEEq.google.com: ;; Query time: 39 msec
eEqoqBkwr24pUfsP.google.com: ;; Query time: 36 msec
m9alQRnaS2p1KQne.google.com: ;; Query time: 37 msec
wvNsxR0P8wmD3ufP.google.com: ;; Query time: 43 msec
uqgYVeXkU04Y83OL.google.com: ;; Query time: 37 msec

Homas commented 4 years ago

> Thanks for the reports, really helpful! I'm considering restructuring the dnsmasq versions of the lists so they have the same behavior as the dnscrypt-proxy one (treat everything as a domain filter). This would mean that hostnames.txt becomes obsolete.

I would not deprecate the hostnames. The domain itself can be fine, but a few hostnames/subdomains can be used for tracking. For example, how would you handle cloudfront.net?

> I will also reconsider how to deal with the IPv6 addresses, potentially halving the size of the list. I expect that alone to have a big performance impact.

It would be better to have separate lists for IP based indicators (IPv4 and IPv6).

BTW, today I noticed that the feed size had increased by 30%, but I can't find any changelog. Do you know what was added?

notracking commented 4 years ago

> I would not deprecate the hostnames. The domain itself can be fine, but a few hostnames/subdomains can be used for tracking. For example, how would you handle cloudfront.net?

The only difference when merging everything into a single domain filter list would be that, instead of blocking only uniqueevilhost.cloudfront.net, a domain-only variant would block both uniqueevilhost.cloudfront.net and *.uniqueevilhost.cloudfront.net. This behavior is already the default for a lot of adblockers (including dnscrypt-proxy), and even though it's not strictly correct usage, in practice there will be no difference (I have yet to see an edge case where it causes problems).

> It would be better to have separate lists for IP based indicators (IPv4 and IPv6).

I like this solution because it would be effective and it's very simple. Still, I would really like to re-evaluate the "new" v2.80 dnsmasq address syntax with the # command, as discussed here. I will need to do some more testing...!

> BTW, today I noticed that the feed size had increased by 30%, but I can't find any changelog. Do you know what was added?

That was because of this. The number will drop again after a few days, once the dead-domain/hostname wait times have passed (~40%, based on sample testing).

notracking commented 4 years ago

@leucos

> No noticeable difference, although the CPU is a quad Intel(R) Core(TM) i3-5005U CPU @ 2.00GHz.

What version are you using? And would you mind sharing your config?

notracking commented 4 years ago

@joel-bourquard

> So I really think that your great list would deserve a "light CPU load" version, which would contain only hosts. That alternate version would of course be larger and less clever, but it could be faster, even on fast CPUs. What do you think?

That would be a list with significant compromises on coverage. I do want to find an optimal setup for use with dnsmasq, but the concept of this list will not change. Judging from the other comments, I might end up advising a different solution for the filtering, since there seem to be implementations that are far more resource-efficient than dnsmasq.

leucos commented 4 years ago

Sure @notracking, I use v2.79:

domain-needed
bogus-priv
filterwin2k

localise-queries
log-queries

local=/zone/
domain=zone
expand-hosts
no-negcache
resolv-file=/etc/resolv.conf

address=/gw.zone/192.168.0.254

conf-file=/etc/dnsmasq_domains.txt
addn-hosts=/etc/dnsmasq_hostnames.txt

dhcp-authoritative
dhcp-leasefile=/tmp/dhcp.leases

interface=enp0s0
listen-address=192.168.0.254

# DHCP

## DHCP range
dhcp-range=192.168.0.100,192.168.0.240,12h

## Netmask
dhcp-option=1,255.255.255.0

## Route
dhcp-option=3,192.168.0.254

## excluded Iface
no-dhcp-interface=enp3s0

dhcp-host=a:b:c:d:e:f,192.168.0.224,somehost

notracking commented 4 years ago

@joel-bourquard I noticed something during testing: it takes a while to load all of the domains.txt filters into dnsmasq after starting up, especially on slower devices. While they were loading I saw very high deviations (even though the filters were already working as expected). Try waiting a good few minutes before doing any testing.

I'm working through some of my results, thanks to @leucos's improved test script :)

notracking commented 4 years ago

And some more test results (query times in ms; x = filtered through dnsmasq, + = direct to 1.1.1.1):

Normal dnsmasq notracking blocklists

|   | N | Min | Max | Median | Avg | Stddev |
|---|---|-----|-----|--------|-----|--------|
| x | 50 | 27 | 92 | 31 | 42.8 | 21.403795 |
| + | 50 | 40 | 72 | 46 | 52.32 | 9.9497513 |

|   | N | Min | Max | Median | Avg | Stddev |
|---|---|-----|-----|--------|-----|--------|
| x | 50 | 27 | 135 | 42 | 52.28 | 28.452639 |
| + | 50 | 40 | 75 | 46 | 50.74 | 10.275412 |

|   | N | Min | Max | Median | Avg | Stddev |
|---|---|-----|-----|--------|-----|--------|
| x | 50 | 27 | 103 | 31 | 40.98 | 19.043977 |
| + | 50 | 39 | 73 | 46 | 50.96 | 10.380437 |

Without IPv6 hosts (both lists)

|   | N | Min | Max | Median | Avg | Stddev |
|---|---|-----|-----|--------|-----|--------|
| x | 50 | 23 | 99 | 27 | 39.86 | 20.896255 |
| + | 50 | 40 | 89 | 46 | 53.7 | 12.53607 |

|   | N | Min | Max | Median | Avg | Stddev |
|---|---|-----|-----|--------|-----|--------|
| x | 50 | 23 | 99 | 27 | 38.2 | 20.120048 |
| + | 50 | 38 | 72 | 45 | 48.64 | 8.4557914 |

|   | N | Min | Max | Median | Avg | Stddev |
|---|---|-----|-----|--------|-----|--------|
| x | 50 | 23 | 119 | 28 | 37.46 | 19.714503 |
| + | 50 | 40 | 73 | 46 | 52.92 | 10.758651 |

Without IPv6 filters (both lists), with hostnames merged into the domain list and all filters using the address=/asdf.com/# format (wait a good few minutes before launching a test!!)

|   | N | Min | Max | Median | Avg | Stddev |
|---|---|-----|-----|--------|-----|--------|
| x | 50 | 26 | 120 | 42 | 52.74 | 27.291929 |
| + | 50 | 40 | 73 | 46 | 50.86 | 9.7248052 |

|   | N | Min | Max | Median | Avg | Stddev |
|---|---|-----|-----|--------|-----|--------|
| x | 50 | 26 | 89 | 30 | 42.66 | 20.517399 |
| + | 50 | 39 | 71 | 45 | 47.9 | 8.3793965 |

|   | N | Min | Max | Median | Avg | Stddev |
|---|---|-----|-----|--------|-----|--------|
| x | 50 | 26 | 239 | 30 | 46.3 | 35.362987 |
| + | 50 | 40 | 73 | 45 | 47.22 | 7.2429839 |

notracking commented 4 years ago

I've released a new list for dnsmasq 2.80+ that uses the # redirector (address=/domain/#), which answers with the null addresses 0.0.0.0 and :: from a single rule, replacing the duplicate IPv4 and IPv6 host entries. This cuts the size of the list in half.
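A sketch of that deduplication step, assuming the source is a hosts-style list with one `0.0.0.0` line and one `::` line per name: each name is emitted once as a dnsmasq 2.80+ `address=/name/#` rule, so two lines collapse into one. `hosts_to_dnsmasq` is a hypothetical helper, not the list's actual build script.

```shell
#!/bin/bash
# Hypothetical helper: read hosts-style lines ("<ip> <name>") on stdin and
# print one "address=/<name>/#" rule per unique name.
hosts_to_dnsmasq() {
    # Skip blank/short lines and comments; print each hostname only the
    # first time it is seen, so IPv4/IPv6 duplicates collapse into one rule.
    awk 'NF >= 2 && $1 !~ /^#/ && !seen[$2]++ { print "address=/" $2 "/#" }'
}

# Example input (hypothetical entries)
printf '%s\n' \
    '0.0.0.0 ads.example.com' \
    ':: ads.example.com' \
    '0.0.0.0 tracker.example.net' \
    ':: tracker.example.net' | hosts_to_dnsmasq
```

Four input lines become two rules, which is where the roughly 50% size reduction comes from.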

Please try it out! (And still wait a couple of minutes before running any speed tests on very slow devices.)

https://raw.githubusercontent.com/notracking/hosts-blocklists/master/dnsmasq/dnsmasq.blacklist.txt

notracking commented 4 years ago

I will close this one up.

Any blocklist will consume extra CPU cycles, some more than others. Stripping down our list is not possible without gambling on whether it still includes the most important items.

Now that we also support the new dnsmasq 2.80 syntax, there is not much left to be done. Further improvements would have to be made within dnsmasq itself, though in practice you will not notice even a 100 ms added delay, as long as you enable caching in your DNS services.

On balance, you will have faster and far more secure connectivity, because of all the blocked junk.