negan07 / ancistrus

Netgear's D7000 Nighthawk Router Experience Distributed Project
https://negan07.github.io/ancistrus/
GNU General Public License v2.0

dnrd 'fails' regularly #43

Closed jonwaland closed 5 years ago

jonwaland commented 5 years ago

Not your problem, but a core issue with the D7000 - the DNS auto-forwarder craps out after a large number of lookups - currently after about a week here in my house.

So I want to set up a restart process, say every night - do I just do a personal cronjob "rc_dnrd autostart"? Is there anything that gets kicked off by default on the hour (say 3am)? Do you know if there are any other flags to the rc_dnrd command I need?

negan07 commented 5 years ago

rc_dnrd autostart doesn't exist

usage is: rc_dnrd [start|stop|restart|brs_hijack]

(brs_hijack should stand for "browsers hijack" , used by some internal ng services like parental control..)

remember also that crontab has a different PATH var, not including /usr/sbin/rc_app/, so include the absolute path of the file to run, e.g.

0 3 * * * root /usr/sbin/rc_app/rc dnrd restart

note that:

rc_dnrd -> foreground, output readable
rc dnrd -> forked background, no output readable

Anyway, this is a workaround, not a solution. dnrd should not crash like this: better to investigate where the problem with dnrd is and then patch/upgrade it, or replace it with something similar (dnsmasq?)

any suggestion about another dns cache retainer/forwarder?

how did you become aware of the failures?

jonwaland commented 5 years ago

> how did you become aware of the failures?

"dad - the internet is broken" - checked a bunch of devices and yes, it appeared to be dead. Rebooted the router.

couple weeks later, it happens again. This time I did some investigation - ping to ip address works, ping to a locally cached address works. ping to a new address fails.... nslookup fails - but I have the router directing DNS look ups to an internal dnsmasq instance - I'm still using the DHCP on the router though as this is just easier.

So, investigating the DNS lookups - DNS via the router is dead. DNS direct via the dnsmasq server is fine. Don't know if there are any log messages on the router, but restarting dnrd fixes the issue.

(and yes - I meant restart not autostart - was badly multitasking)

Reading above, I'm assuming that the router initiated instance doesn't log - not knowing enough about the filesystem on this box, any suggestion on how and where to kick it off and have it log - then we can at least see if it reports any errors.

negan07 commented 5 years ago

upgrade busybox:

opkg update && opkg install busybox

then logout/login and run:

ps | grep dnrd

copy the entire command you see (the cmd may be different depending on configuration), then run:

rc dnrd stop

then run it in foreground, pasting the same cmd and adding -d 9 at the end. Look at the output msgs and paste them here.

to log in background, run killall -9 dnrd, then replace -d 9 above with -l, but I'm not sure it will log something somewhere without code modification

jonwaland commented 5 years ago

debug level 9 will kill filespace I suspect!!

admin@D7000:~$ /var/dnrd_1 -a 192.168.1.254 -m hosts -c off -r 0 -s 192.168.1.128 -d 9
Notice: caching turned off
Debug: initialising master DNS database
Debug: no blacklist: /etc/dnrd/blacklist
Warning: Using /etc/hosts will be removed in a future version. Please use only the /etc/dnrd/master file or use -m off.
Debug: initialising from /etc/hosts, domain= <none>
Debug: /etc/hosts: 7 records
Debug: added authority for 0.0.127.in-addr.arpa
Debug: added authority for 1.168.192.in-addr.arpa
Debug: 11 records in master DNS database
Debug: Received DNS query for "local"

- -- query
000 - B8 32 01 00 00 01 00 00 00 00 00 00 05 6C 6F 63  .2...........loc
010 - 61 6C 00 00 06 00 01                             al.....

id= 47154, q= 0, opc= 0, aa= 0, wr/ra= 1/0, trunc= 0, rcode= 0 [0100]
qd= 1
  name= local., type= 6, class= 1
ans= 0
ns= 0
ar= 0

Debug: Forwarding the query to DNS server 192.168.1.128
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 1, try_count: 0, sock 3, msg1dce8 ,len: 23
Debug: OK, let's wait for the response
Debug: Open sockets: 1, active: 1, count: 0, timeouts: 0
Debug: srv=192.168.1.128, myqid=37268, client_qid=12984
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 2, try_count: 1, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 1, try_count: 2, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 2, try_count: 3, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 0, try_count: 4, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: query_timeout: try_count: 5
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 0, try_count: 5, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: Received DNS query for "local"

- -- query
000 - F7 48 01 00 00 01 00 00 00 00 00 00 05 6C 6F 63  .H...........loc
010 - 61 6C 00 00 06 00 01                             al.....

id= 63304, q= 0, opc= 0, aa= 0, wr/ra= 1/0, trunc= 0, rcode= 0 [0100]
qd= 1
  name= local., type= 6, class= 1
ans= 0
ns= 0
ar= 0

Debug: Forwarding the query to DNS server 192.168.1.128
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 1, try_count: 0, sock 3, msg1dce8 ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 2, try_count: 1, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 1, try_count: 2, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 2, try_count: 3, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 0, try_count: 4, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: query_timeout: try_count: 5
Debug: sending to: srv=192.168.1.128, client_time: 1544749597, ttl: 0, try_count: 5, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: Received DNS query for "gllto.glpals.com"

- -- query
000 - AF DB 01 00 00 01 00 00 00 00 00 00 05 67 6C 6C  .............gll
010 - 74 6F 06 67 6C 70 61 6C 73 03 63 6F 6D 00 00 01  to.glpals.com...
020 - 00 01                                            ..

id= 45019, q= 0, opc= 0, aa= 0, wr/ra= 1/0, trunc= 0, rcode= 0 [0100]
qd= 1
  name= gllto.glpals.com., type= 1, class= 1
ans= 0
ns= 0
ar= 0

Debug: Forwarding the query to DNS server 192.168.1.128
Debug: sending to: srv=192.168.1.128, client_time: 1544749598, ttl: 1, try_count: 0, sock 3, msg1dce8 ,len: 34
Debug: OK, let's wait for the response
Debug: handling socket 3
Debug: Received DNS reply for "gllto.glpals.com"

- -- reply
000 - 77 16 81 80 00 01 00 05 00 00 00 00 05 67 6C 6C  w............gll
010 - 74 6F 06 67 6C 70 61 6C 73 03 63 6F 6D 00 00 01  to.glpals.com...
020 - 00 01 C0 0C 00 05 00 01 00 00 00 2C 00 19 09 61  ...........,...a
030 - 7A 72 6C 74 6F 76 7A 73 09 61 7A 75 72 65 65 64  zrltovzs.azureed
040 - 67 65 03 6E 65 74 00 C0 2E 00 05 00 01 00 00 06  ge.net..........
050 - 13 00 0F 09 61 7A 72 6C 74 6F 76 7A 73 02 65 63  ....azrltovzs.ec
060 - C0 38 C0 53 00 05 00 01 00 00 0A 0F 00 19 05 73  .8.S...........s
070 - 63 64 6E 32 03 77 70 63 05 34 64 66 35 39 06 63  cdn2.wpc.4df59.c
080 - 68 69 63 64 6E C0 42 C0 6E 00 05 00 01 00 00 0A  hicdn.B.n.......
090 - 0F 00 0C 05 73 61 34 67 6C 03 77 70 63 C0 7E C0  ....sa4gl.wpc.~.
0A0 - 93 00 01 00 01 00 00 0A 3D 00 04 98 C3 23 C7     ........=....#.

id= 30486, q= 1, opc= 16, aa= 0, wr/ra= 1/1, trunc= 0, rcode= 0 [8180]
qd= 1
  name= gllto.glpals.com., type= 1, class= 1
ans= 5
  name= gllto.glpals.com., type= 5, class= 1, ttl= 44
  name= azrltovzs.azureedge.net., type= 5, class= 1, ttl= 1555
  name= azrltovzs.ec.azureedge.net., type= 5, class= 1, ttl= 2575
  name= scdn2.wpc.4df59.chicdn.net., type= 5, class= 1, ttl= 2575
  name= sa4gl.wpc.chicdn.net., type= 1, class= 1, ttl= 2621
ns= 0
ar= 0

Debug: Forwarding the reply to the host 192.168.1.6
Debug: Received DNS query for "local"

- -- query
000 - F7 81 01 00 00 01 00 00 00 00 00 00 05 6C 6F 63  .............loc
010 - 61 6C 00 00 06 00 01                             al.....

id= 63361, q= 0, opc= 0, aa= 0, wr/ra= 1/0, trunc= 0, rcode= 0 [0100]
qd= 1
  name= local., type= 6, class= 1
ans= 0
ns= 0
ar= 0

Debug: Forwarding the query to DNS server 192.168.1.128
Debug: sending to: srv=192.168.1.128, client_time: 1544749599, ttl: 1, try_count: 0, sock 3, msg1dce8 ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749599, ttl: 2, try_count: 1, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749599, ttl: 1, try_count: 2, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749599, ttl: 2, try_count: 3, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: sending to: srv=192.168.1.128, client_time: 1544749599, ttl: 0, try_count: 4, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: query_timeout: try_count: 5
Debug: sending to: srv=192.168.1.128, client_time: 1544749599, ttl: 0, try_count: 5, sock 3, msg106f65c ,len: 23
Debug: OK, let's wait for the response
Debug: handling socket 3
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Debug: Received DNS query for "lastpass.com"

- -- query
000 - C8 02 01 00 00 01 00 00 00 00 00 00 08 6C 61 73  .............las
010 - 74 70 61 73 73 03 63 6F 6D 00 00 01 00 01        tpass.com.....

id= 51202, q= 0, opc= 0, aa= 0, wr/ra= 1/0, trunc= 0, rcode= 0 [0100]
qd= 1
  name= lastpass.com., type= 1, class= 1
ans= 0
ns= 0
ar= 0

Debug: Forwarding the query to DNS server 192.168.1.128
Debug: sending to: srv=192.168.1.128, client_time: 1544749603, ttl: 1, try_count: 0, sock 3, msg1dce8 ,len: 30
Debug: OK, let's wait for the response
Debug: handling socket 3
Debug: Received DNS reply for "lastpass.com"

- -- reply
000 - BC 47 81 80 00 01 00 01 00 00 00 00 08 6C 61 73  .G...........las
010 - 74 70 61 73 73 03 63 6F 6D 00 00 01 00 01 C0 0C  tpass.com.......
020 - 00 01 00 01 00 00 00 07 00 04 68 7A 17 B6        ..........hz..

id= 48199, q= 1, opc= 16, aa= 0, wr/ra= 1/1, trunc= 0, rcode= 0 [8180]
qd= 1
  name= lastpass.com., type= 1, class= 1
ans= 1
  name= lastpass.com., type= 1, class= 1, ttl= 7
ns= 0
ar= 0

debug =1 isn't great either

Notice: caching turned off
Debug: initialising master DNS database
Debug: no blacklist: /etc/dnrd/blacklist
Warning: Using /etc/hosts will be removed in a future version. Please use only the /etc/dnrd/master file or use -m off.
Debug: initialising from /etc/hosts, domain= <none>
Debug: /etc/hosts: 7 records
Debug: added authority for 0.0.127.in-addr.arpa
Debug: added authority for 1.168.192.in-addr.arpa
Debug: 11 records in master DNS database
Debug: Open sockets: 1, active: 1, count: 0, timeouts: 0
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed
Warning: RCODE was set. Ignoring reply from 192.168.1.128
Debug: check_reply failed

I will check why I'm getting requests for 'local' though
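For reading the dumps above: the bracketed hex value at the end of each header line ([0100] on queries, [8180] on replies) matches bytes 2-3 of the corresponding hex dump, i.e. the 16-bit flags word of the DNS header. The following is a minimal Python sketch for decoding it, assuming the standard RFC 1035 bit layout. It is not taken from dnrd, and `decode_flags` and `RCODE_NAMES` are names made up for illustration.

```python
# Decode the 16-bit DNS header flags word (RFC 1035, section 4.1.1).
# Illustrative helper only; not part of dnrd.

RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def decode_flags(flags: int) -> dict:
    """Split the flags word into its individual header fields."""
    return {
        "qr": (flags >> 15) & 0x1,      # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "aa": (flags >> 10) & 0x1,      # authoritative answer
        "tc": (flags >> 9) & 0x1,       # truncated
        "rd": (flags >> 8) & 0x1,       # recursion desired
        "ra": (flags >> 7) & 0x1,       # recursion available
        "rcode": flags & 0xF,           # response code
    }


query = decode_flags(0x0100)   # as logged for the queries
reply = decode_flags(0x8180)   # as logged for the successful replies
print(query)
print(reply, RCODE_NAMES.get(reply["rcode"], "other"))
```

The "RCODE was set" warnings suggest that the replies for "local" came back with a non-zero response code (the low four bits), which this dnrd build ignores before retrying; the dumps do not show which code it was. Under this layout 0x8180 has opcode 0, so the `opc= 16` printed for replies is presumably derived differently by dnrd's debug printer.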

negan07 commented 5 years ago

are you forwarding dns to dnsmasq at 192.168.1.128 ? which dns servers did you setup ? manual or assigned from wan dhcp ? does the local dns server log something strange on dnsmasq ?

jonwaland commented 5 years ago

yes - I have an Intel NUC at 192.168.1.128 on which I have dnsmasq running as a local cache, forwarding requests to Google (8.8.8.8) with my ISP as a backup.

but I'm using DHCP to allocate IP addresses around the network from the router (192.168.1.254), and if you use the router DHCP it offers itself as the only DNS server. I could move to using the NUC as DHCP, but it was easier to leave as is (40+ devices). (There is no way in the router GUI to force it to offer DHCP but with 192.168.1.128 as the default DNS server - this would be ideal. I should look at the DHCP server options on the CLI I guess.)

no errors show in the dnsmasq log. When dnrd fails, dns look ups to the router timeout - they never get to the NUC.
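The failure mode described above (lookups via the router time out while the dnsmasq host still answers) can be checked from any LAN machine without touching the router. This is a minimal sketch, assuming Python 3 is available on a host such as the NUC. `build_query` and `probe` are hypothetical helper names, and the packet layout follows the standard DNS wire format rather than anything dnrd-specific.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet with the 'recursion desired' flag set."""
    # Header: id, flags (0x0100 = RD), 1 question, no other records.
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
        if part
    )
    # qtype (1 = A, 6 = SOA) and class 1 (IN).
    question = labels + b"\x00" + struct.pack(">HH", qtype, 1)
    return header + question


def probe(server: str, name: str = "example.com", timeout: float = 3.0) -> bool:
    """Return True if `server` sends back a reply with a matching ID in time."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable hosts.
            return False
    return reply[:2] == query[:2]
```

Calling `probe("192.168.1.254")` and `probe("192.168.1.128")` in turn would show whether only the router's forwarder has stopped answering. Run periodically with a timestamped log, it would also give a rough time of failure to line up against the syslog.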

negan07 commented 5 years ago

dhcp is heavily customized and should be managed manually to forward something on internal lan group instead of wan side

but if I remember correctly, dnsmasq also acts as a dhcp server: if you don't need the parental control genie app or xcloud or similar, you can try to disable the router's one and use dnsmasq dhcp directly

> no errors show in the dnsmasq log. When dnrd fails, dns look ups to the router timeout - they never get to the NUC.

it would be useful to find out when/where/why dnrd hangs or bails, to try to fix it

anyway newer versions of the tools are available: looking to patch it to give a solution

jonwaland commented 5 years ago

yes - dnsmasq does dhcp - I've just not had the will/time/effort to switch over - the D7000 is increasingly becoming an expensive adsl modem :-)

negan07 commented 5 years ago

I have compiled a package with the updated version of dnrd called 2.21beta (2.20.4)

there are many code modifications I had to apply over because 2.19 from ng is very customized and the new version includes some new stuff and options: some of them caused some issues I had to signal to the author.

If you want to try it, install the related package

to debug some stuff, add -f -d [1-5] like this:

dnrd -a 192.168.1.254 -m /etc/hosts -c off -r 0 -s 192.168.1.128 -u admin -f -d 5

jonwaland commented 5 years ago

admin@D7000:/tmp$ curl -k -O https://github.com/negan07/ancistrus/files/2688147/dnrd.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   559    0   559    0     0    535      0 --:--:--  0:00:01 --:--:--   535
admin@D7000:/tmp$ unzip dnrd.zip
Archive:  dnrd.zip
unzip: invalid zip magic 6D74683C
admin@D7000:/tmp$ cat dnrd.zip
<html><body>You are being <a href="https://github-production-repository-file-5c1aeb.s3.amazonaws.com/77776987/2688147?X-Amz-Algorithm=AWS4-HMAC-SHA256&amp;X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20181218%2Fus-east-1%2Fs3%2Faws4_request&amp;X-Amz-Date=20181218T051551Z&amp;X-Amz-Expires=300&amp;X-Amz-Signature=578da8464d4c9174ca418bc4921fbb736a96b399b5a7fd8bb89fe98716453b96&amp;X-Amz-SignedHeaders=host&amp;actor_id=0&amp;response-content-disposition=attachment%3Bfilename%3Ddnrd.zip&amp;response-content-type=application%2Fzip">redirected</a>.</body></html>admin@D7000:/tmp$

managed to get file onto a local server then curl it.

admin@D7000:/tmp$ dnrd -a 192.168.1.254 -m /etc/hosts -c off -r 0 -s 192.168.1.128 -u admin -f -d 5
ERROR: tcpsock: Couldn't bind local address
Debug: Shutting down...

hmm. retrying.

OK - got it running

So - it works. But if I want to get some useful debug, there is going to be a massive amount of data to trawl through

negan07 commented 5 years ago

yes, the direct link may be unavailable on issue threads, so dl it locally then upload it onto the router with scp or wget or curl

I would suggest trying it for a while in daemon mode with:

dnrd -a 192.168.1.254 -m /etc/hosts -c off -r 0 -s 192.168.1.128 -u admin

and see if this version simply solves the bailing issue

otherwise, if you want both to test & investigate it, you could try to launch it with a -f -d 3 debug level

jonwaland commented 5 years ago

It's running now - has been for 18 hours thus far.

I'll leave it running and see if it fails - usually takes 1-2 weeks

negan07 commented 5 years ago

hoping this solves the issue

in the meantime I will patch the program, removing the mandatory -u param and the full /etc/hosts path so that it fully matches rc_dnrd. Then I'll try to send events to the log (the -S option does it, but it needs a tag to write to the gui messages), so that the log can be sent to an ip syslog server without bailing

jonwaland commented 5 years ago

interesting point - I should set up syslog on another box.

negan07 commented 5 years ago

I made some modifications: I've added the syslog debug tags, so when calling the dnrd daemon manually with the debug -d 3 option it should write messages to the router's syslog register on the web gui, without writing to the foreground console

better not to use more than -d 3, to avoid a flood; hoping this is enough

then syslog can be sent through the lan or to a local syslog server with the register page option flags

hoping this could help identifying the bailing causes

jonwaland commented 5 years ago

well, router just hard crashed.

have just rebooted, and had to reinstall ancistrus-core, iproute2, and qos-sqm (although having installed qos-sqm, it correctly got the settings from nvram, which is good). I'm not running the monolithic image - is this correct? Also no longer seeing the package manager in the GUI?

For now I've reverted to the firmware dnrd - wife acceptance factor etc.

I need to get a remote syslog running elsewhere before I can restart this

negan07 commented 5 years ago

did dnrd run in debug mode ?

jonwaland commented 5 years ago

when it crashed it was running the beta dnrd. But with no logging, as I didn't have a window open to it.

negan07 commented 5 years ago

I don't know why sercomm insists with this dns forwarder: all the embedded projects have ignored it because there are many other like dnsmasq

I think the combination of your own config with another dns forwarder in the chain has probably generated those bailings: the 2 versions are not so different and none of the new features are in use

I suggest you report the recurrent bailing problem you encountered as an issue in the dnrd issue section: https://github.com/benjaminpetrin/dnrd/issues

I suggest you include all the log output you have posted here and all the useful information you have, to help the author find the problem.

In the meanwhile my idea is to test dnrd with a similar situation redirecting dns queries from dhcp clients to the router dnrd and then logging it through its event register

negan07 commented 5 years ago

actually still running on

dnsmasq.zip

I have also prepared a dnsmasq 2.80 including openwrt/lede patches. It doesn't include ng specifics and dns hacks but can be tried as an alternative

note that dnrd & dnsmasq cannot be run at the same time, because both have to open a socket on port 53

jonwaland commented 5 years ago

cool - alas, will all have to wait a week or two for now.

negan07 commented 5 years ago

any news about it ?

jonwaland commented 5 years ago

OK - back on line.

I've just set up a syslog server on my box that does everything (including dnsmasq) - so next step is to run either dnrd with debug logging enabled, or dnsmasq?

negan07 commented 5 years ago

one or the other, not both: they can't bind the same port

I'd try dnsmasq, because dnrd can easily give the same old results (consider that dnsmasq doesn't include the hacks made by sercomm related to opendns, update url redir, netgear specs on recursion, and more, but it does have all the fixes present in the openwrt package)

As a second option, dnrd can be tried out

jonwaland commented 5 years ago

sorry - off work so busy with "home stuff" - plus the DSL modem part of the D7000 has crapped out - now using an old billion 7800N as a DSL modem bridged into the D7000 as router/wifi.

still plan to get the newer dnrd running against my syslog server

jonwaland commented 5 years ago

just run dnrd version 2.21_beta1

dnrd -a 192.168.1.254 -m /etc/hosts -c off -r 0 -s 192.168.1.128 -u admin -f -d 3
Notice: caching turned off
Debug: initialising master DNS database
Debug: no blacklist: blacklist
Debug: /etc/hosts: 7 records
Debug: added authority for 0.0.127.in-addr.arpa
Debug: added authority for 1.168.192.in-addr.arpa
Debug: 11 records in master DNS database
getpwnam: No such file or directory
ERROR: Could not become "admin" user. Please create the user account or specify a valid user with  the -u option.
Debug: Shutting down...

restarted without the -u bit, and with -d 3

Running as -f

Notice: caching turned off
Debug: initialising master DNS database
Debug: no blacklist: blacklist
Debug: /etc/hosts: 7 records
Debug: added authority for 0.0.127.in-addr.arpa
Debug: added authority for 1.168.192.in-addr.arpa
Debug: 11 records in master DNS database
Debug: setting uid to 0
Debug: Received DNS query for "spclient.wg.spotify.com"
Debug: Forwarding the query to DNS server 192.168.1.128
Debug: sending to: srv=192.168.1.128, client_time: 1548911030, ttl: 1, try_count: 0, sock 3, msg1dc0c ,len: 41
Debug: OK, let's wait for the response
Debug: Received DNS query for "graph.facebook.com"
Debug: Forwarding the query to DNS server 192.168.1.128
Debug: sending to: srv=192.168.1.128, client_time: 1548911030, ttl: 1, try_count: 0, sock 6, msg1dc0c ,len: 36
Debug: OK, let's wait for the response
Debug: handling socket 3

running without -f, from my syslog:

Jan 31 16:07:53 192.168.1.254 [DNRD] Forwarding the query to DNS server %s
Jan 31 16:07:53 192.168.1.254 [DNRD] sending to: srv=%s, client_time: %lu, ttl: %d, try_count: %d, sock %d, msg%x ,l
Jan 31 16:07:53 192.168.1.254 [DNRD] OK, let's wait for the response
Jan 31 16:07:54 192.168.1.254 [DNRD] handling socket %i
Jan 31 16:07:54 192.168.1.254 [DNRD] Received DNS reply for "%s"
Jan 31 16:07:54 192.168.1.254 [DNRD] Forwarding the reply to the host %s
Jan 31 16:07:54 192.168.1.254 [DNRD] Received DNS query for "%s"
Jan 31 16:07:54 192.168.1.254 [DNRD] Forwarding the query to DNS server %s
Jan 31 16:07:54 192.168.1.254 [DNRD] sending to: srv=%s, client_time: %lu, ttl: %d, try_count: %d, sock %d, msg%x ,l
Jan 31 16:07:54 192.168.1.254 [DNRD] OK, let's wait for the response
Jan 31 16:07:54 192.168.1.254 [DNRD] handling socket %i
Jan 31 16:07:54 192.168.1.254 [DNRD] Received DNS reply for "%s"
Jan 31 16:07:54 192.168.1.254 [DNRD] Forwarding the reply to the host %s
Jan 31 16:08:01 192.168.1.254 [DNRD] Received DNS query for "%s"
Jan 31 16:08:01 192.168.1.254 [DNRD] Forwarding the query to DNS server %s
Jan 31 16:08:01 192.168.1.254 [DNRD] sending to: srv=%s, client_time: %lu, ttl: %d, try_count: %d, sock %d, msg%x ,l
Jan 31 16:08:01 192.168.1.254 [DNRD] OK, let's wait for the response

regardless I'll leave it running and see if it dies.

negan07 commented 5 years ago

with the package version you can let it run transparently with rc_dnrd restart, without any cmd line modifications apart from the debug -d 3 option, as you did in the last try above

you should be able to read the dbg msgs in the log now. The package version is 2.20.4 (same as 2.21b)

let's see if there are any issues

negan07 commented 5 years ago

dnsblast.zip

I've found an easy dns forwarder stress test that can speed up and strengthen the dnrd/dnsmasq stability test: https://github.com/jedisct1/dnsblast

running from a linux machine host, with cmd like for ex:

dnsblast dns_forwarder_ip 50000 100

it sends 50000 queries with 100 queries/sec rate

at the same time, on the router console, with:

top

you can see dynamically the memory usage & cpu load (dnrd will run at top usage easily)

other examples are:

To send a shitload of queries to 127.0.0.1:

dnsblast 127.0.0.1

To send 50,000 queries to 127.0.0.1:

dnsblast 127.0.0.1 50000

To send 50,000 queries at a rate of 100 queries per second:

dnsblast 127.0.0.1 50000 100

To send 50,000 queries at a rate of 100 qps to a non standard-port, like 5353:

dnsblast 127.0.0.1 50000 100 5353

To send malformed packets, prepend "fuzz":

dnsblast fuzz 127.0.0.1
dnsblast fuzz 127.0.0.1 50000
dnsblast fuzz 127.0.0.1 50000 100
dnsblast fuzz 127.0.0.1 50000 100 5353

testing dnrd 2.20.4 with the cmd above (50000 queries at 100 q/sec), the ratio is ~58% and cpu load is ~3%

with dnsblast routerip, load goes to about 50% but the ratio is less than 0.5%

with fuzz (malformed) queries all the queries are unresolved, matching expectations

dnrd seems to remain stable, at least for brief usage: cpu doesn't go into saturation, remaining globally under 50% (75% considering the brcm internal buffer)

jonwaland commented 5 years ago

installed the package dnrd just to be sure:

admin@D7000:~$ opkg install dnrd
Installing dnrd (2.20.4) on root.
Downloading https://raw.githubusercontent.com/negan07/ancistrus/gh-pages/ancistrus-arm-D7000/dnrd_2.20.4_armD7000.ipk.
Configuring dnrd.

restarted with -d 3

syslog is still faulty:

Feb  4 10:52:31 192.168.1.254 [DNRD] Received DNS query for "%s"
Feb  4 10:52:31 192.168.1.254 [DNRD] Forwarding the query to DNS server %s
Feb  4 10:52:31 192.168.1.254 [DNRD] sending to: srv=%s, client_time: %lu, ttl: %d, try_count: %d, sock %d, msg%x ,l
Feb  4 10:52:31 192.168.1.254 [DNRD] OK, let's wait for the response
Feb  4 10:52:31 192.168.1.254 [DNRD] handling socket %i
Feb  4 10:52:31 192.168.1.254 [DNRD] Received DNS reply for "%s"
Feb  4 10:52:31 192.168.1.254 [DNRD] Forwarding the reply to the host %s

will try see if I can hurt it though...

jonwaland commented 5 years ago

lol

walaj@JonLabNUC:~/dnsblast/dnsblast-master$ ./dnsblast  192.168.1.254 50000 100
^Cnt: [5995] - Received: [2932] - Reply rate: [48 pps] - Ratio: [48.91%]
Feb  4 10:59:17 192.168.1.254 [DNRD] Socket limit reached. Dropping new queries
Feb  4 10:59:17 192.168.1.254 [DNRD] Received DNS query for "%s"
Feb  4 10:59:17 192.168.1.254 [DNRD] Forwarding the query to DNS server %s
Feb  4 10:59:17 192.168.1.254 [DNRD] Socket limit reached. Dropping new queries
Feb  4 10:59:17 192.168.1.254 [DNRD] Received DNS query for "%s"
Feb  4 10:59:17 192.168.1.254 [DNRD] Forwarding the query to DNS server %s
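
The figures in the run above can be sanity-checked with a little arithmetic. This is a minimal Python sketch, assuming the first two counters on the summary line (5995 and 2932) are queries sent and replies received. `run_seconds` and `reply_ratio` are hypothetical helper names, not part of dnsblast.

```python
def run_seconds(count: int, qps: int) -> float:
    """Time needed to send `count` queries at a steady `qps` rate."""
    return count / qps


def reply_ratio(sent: int, received: int) -> float:
    """Percentage of sent queries that received a reply."""
    return 100.0 * received / sent


# Full test as requested on the command line: 50000 queries at 100 q/s.
print(run_seconds(50000, 100))

# Counters from the interrupted run above.
print(round(reply_ratio(5995, 2932), 2))
```

A full 50000-query run at 100 q/s takes about 500 seconds, so a run interrupted at 5995 queries was stopped after roughly one minute, provided the send rate held. The ratio computed from the two counters agrees with the 48.91% that dnsblast printed.
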
negan07 commented 5 years ago

the behavior seems reliable for now

negan07 commented 5 years ago

anything new ?

jonwaland commented 5 years ago

nothing new - not helped by various power outages, and my syslog server dying.

give me another couple weeks. or close - I've got enough to work with I think.

negan07 commented 5 years ago

all the time you need

does the server still crash sometimes ?

jonwaland commented 5 years ago

I've had the router lock up - solid lights etc. Hard reset required. Not sure if it's dnrd though

negan07 commented 5 years ago

at this point, disable the dnrd service and try with dnsmasq

https://github.com/negan07/ancistrus/files/2706339/dnsmasq.zip

jonwaland commented 5 years ago

ok - given up for now - has been up 15 days without an issue, on your dnrd

I don't have time right now to migrate between my server dnsmasq and the embedded one.

Happy if you want to close this.