openwrt / packages

Community maintained packages for OpenWrt. Documentation for submitting pull requests is in CONTRIBUTING.md
GNU General Public License v2.0

bind-server: dumps core #8537

Open bodop opened 5 years ago

bodop commented 5 years ago

Maintainer: @nmeyerhans
Environment: lantiq, xway, arcadyan, arv752dpw22, OpenWrt SNAPSHOT, r9742-2892033 (upgraded after having the same problem with 18.06.02)

Description:

named dumps core every few hours. I don't know how to inspect the dump. May I upload it here? (Would I have to replace the rndc keys once they are published via the dump?) Do you need any other information?

named -v => "BIND 9.12.3-P4 "

Btw. after a few days the device runs out of memory. I fixed this by

sysctl -w "kernel.core_pattern=/tmp/%e.core"

How would you disable all dumps?

cotequeiroz commented 5 years ago

ping @nmeyerhans

nmeyerhans commented 5 years ago

> named dumps core every few hours. I don't know how to inspect the dump. May I upload it here? (Would I have to replace the rndc keys once they are published via the dump?) Do you need any other information?

The core file probably wouldn't be useful to me without the binaries and associated shared libraries (which I assume you built yourself). It may be easiest for you to generate a stack trace yourself. You'll need to build gdb for your device, at which point you should be able to use it to load the named binary and the core file and generate a useful stack trace. If you can do that, please post the stack trace here.
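The steps described above can be sketched as a small shell helper that runs gdb in batch mode against the binary and the core file. This is only an illustration under assumptions: the function name `core_backtrace` and every path passed to it are hypothetical, and the gdb executable must be one that can read MIPS cores (built for the device, or a cross gdb on the build host). `set sysroot` points gdb at the directory holding the shared libraries the binary was linked against.

```shell
# Sketch: print a backtrace from a core file using gdb batch mode.
# Arguments (all hypothetical, adjust to your setup):
#   $1  gdb executable that understands MIPS cores
#   $2  directory with the shared libraries matching the crashed binary
#   $3  the named binary that produced the core (ideally unstripped)
#   $4  the core file
core_backtrace() {
    gdb_bin=$1
    sysroot=$2
    binary=$3
    core=$4
    # -iex runs before the binary and core are loaded, so the library
    # search root is already set when gdb resolves shared objects.
    "$gdb_bin" -batch \
        -iex "set sysroot $sysroot" \
        -ex "bt" \
        "$binary" "$core"
}
```

A call might look like `core_backtrace gdb / /usr/sbin/named /tmp/named.core` on the device itself, with the paths replaced by wherever the unstripped binary and the core actually live.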

> Btw. after a few days the device runs out of memory. I fixed this by
>
> sysctl -w "kernel.core_pattern=/tmp/%e.core"
>
> How would you disable all dumps?

You can disable core dumps system-wide by setting kernel.core_pattern to an empty string and setting kernel.core_uses_pid=0.
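A minimal sketch of applying those two settings by writing to the files under /proc/sys/kernel. It relies on the behaviour documented in core(5): no core file is produced when core_pattern is empty and core_uses_pid is 0. The `KERNEL_SYSCTL_DIR` variable and the function name are assumptions of this sketch, added only so the function can be tried against a scratch directory first; on the device it needs root.

```shell
# Directory holding the kernel sysctl files. The variable is hypothetical
# and exists only so the sketch can be pointed at a test directory.
KERNEL_SYSCTL_DIR="${KERNEL_SYSCTL_DIR:-/proc/sys/kernel}"

disable_core_dumps() {
    # echo "" writes just a newline, which leaves the pattern empty.
    echo "" > "$KERNEL_SYSCTL_DIR/core_pattern" || return 1
    # With an empty pattern and core_uses_pid=0 no core file is written.
    echo 0 > "$KERNEL_SYSCTL_DIR/core_uses_pid"
}
```

Settings written this way do not survive a reboot, so they would have to be reapplied at boot.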

bodop commented 5 years ago

It was an "official" snapshot binary installed with the image builder. I thought it would be possible to read such a dump, since it would not make sense to dump cores if they are not readable.

"Unfortunately" no core was dumped during the last four days. I will continue by compiling it myself.

bodop commented 5 years ago

OK, now I have a backtrace that corresponds to openwrt-18.06.2 (bind-9.11.2-P1)

(gdb) bt
#0  0x772f42b4 in NODENAME (node=node@entry=0x76e9ffd0, name=name@entry=0x109b55ab) at rbt.c:273
#1  0x772f9494 in dns_rbt_findnode (rbt=0x76ea3010, name=0x76d224f0, foundname=0x76fabeb0, node=0x7ff118b8, chain=0x7ff11c80, options=1, callback=0x77300308 <cache_zonecut_callback>, callback_arg=0x7ff11c70) at rbt.c:1609
#2  0x77306c14 in cache_find (db=<optimized out>, name=0x76d224f0, version=<optimized out>, type=<optimized out>, options=8, now=<optimized out>, nodep=0x7ff1234c, foundname=0x76fabeb0, rdataset=0x76d27250, sigrdataset=0x76d15958)
    at rbtdb.c:5045
#3  0x0043d64c in query_find (client=<optimized out>, event=<optimized out>, qtype=1) at query.c:7028
#4  0x004451d0 in ns_query_start (client=0xb1bdb0) at query.c:9290
#5  0x00425418 in client_request (task=<optimized out>, event=<optimized out>) at client.c:2802
#6  0x771c9fc0 in dispatch (manager=0x76fa4010) at task.c:1140
#7  isc__taskmgr_dispatch (manager0=<optimized out>) at task.c:1653
#8  0x771ceb1c in evloop (ctx=0x77202f88 <isc_g_appctx>) at app.c:508
#9  0x771cf014 in isc__app_ctxrun (ctx0=0x77202f88 <isc_g_appctx>) at app.c:624
#10 0x771d0030 in isc_app_run () at ../app_api.c:198
#11 0x0041e230 in main (argc=<optimized out>, argv=<optimized out>) at ./main.c:1390
(gdb) print node->namelen
$6 = 4
(gdb) print name->length
Cannot access memory at address 0x109b55b3

bodop commented 5 years ago

Looks like an optimization problem. No core was dumped after using "-O0" instead of "-Os". I suspect that something around rbt.c:1608 goes wrong. In fact, the disassembled code looks totally different in the two variants. But that is far beyond my skills.

dengqf6 commented 5 years ago

Please check if bind 9.14 fixes your issue. #8641

diizzyy commented 5 years ago

There are a few packages that occasionally break when built with -Os on MIPS.

bodop commented 5 years ago

> Please check if bind 9.14 fixes your issue. #8641

I observed no core dump for three days. bind-9.14.1 seems not to be affected by the "-Os" problem.

named -V

BIND 9.14.1 (Stable Release) <id:d4c1008>
running on Linux mips 4.9.152 #0 Mon Jan 28 08:54:32 2019
built by make with '--target=mips-openwrt-linux' '--host=mips-openwrt-linux' '--build=x86_64-pc-linux-gnu' '--program-prefix=' '--program-suffix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--libexecdir=/usr/lib' '--sysconfdir=/etc' '--datadir=/usr/share' '--localstatedir=/var' '--mandir=/usr/man' '--infodir=/usr/info' '--disable-nls' '--enable-shared' '--enable-static' '--with-randomdev=/dev/urandom' '--disable-threads' '--disable-linux-caps' '--with-openssl=/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr' '--with-libtool' '--without-lmdb' '--enable-epoll=yes' '--with-gost=no' '--with-gssapi=no' '--with-ecdsa=yes' '--with-readline=no' '--sysconfdir=/etc/bind' '--enable-filter-aaaa' '--with-libjson=no' '--with-libxml2=no' 'build_alias=x86_64-pc-linux-gnu' 'host_alias=mips-openwrt-linux' 'target_alias=mips-openwrt-linux' 'CC=mips-openwrt-linux-musl-gcc' 'CFLAGS=-Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -g3 -iremap/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/build_dir/target-mips_24kc_musl/bind-9.14.1:bind-9.14.1 -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro ' 'LDFLAGS=-L/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/lib -L/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/lib -L/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/usr/lib -L/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/lib -znow -zrelro ' 
'CPPFLAGS=-I/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/include -I/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/include -I/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/usr/include -I/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/include/fortify -I/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/include ' 'PKG_CONFIG=/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/host/bin/pkg-config' 'PKG_CONFIG_PATH=/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/lib/pkgconfig:/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/share/pkgconfig' 'PKG_CONFIG_LIBDIR=/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/lib/pkgconfig:/home/bodop/temp/openwrt-sdk-18.06.2-lantiq-xway_gcc-7.3.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/share/pkgconfig'
compiled by GCC 7.3.0
compiled with OpenSSL version: OpenSSL 1.0.2q  20 Nov 2018
linked to OpenSSL version: OpenSSL 1.0.2q  20 Nov 2018
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled

default paths:
  named configuration:  /etc/bind/named.conf
  rndc configuration:   /etc/bind/rndc.conf
  DNSSEC root key:      /etc/bind/bind.keys
  nsupdate session key: /var/run/named/session.key
  named PID file:       /var/run/named/named.pid
  named lock file:      /var/run/named/named.lock

diizzyy commented 5 years ago

Thanks for following up on this. Perhaps a workaround is to force -O2, although the size will probably increase a bit.

dengqf6 commented 5 years ago

The PR is merged. Is your issue fixed?

bodop commented 5 years ago

Installed "BIND 9.14.2" yesterday (via opkg install on "Linux LEDE 4.14.121 #0 Sun Jun 2 09:08:38 2019 mips GNU/Linux") and got a core dump "isc-worker0000.1559716915.2143.11.core" this morning. I will try again with a debugging version. But a different problem is that the dump is truncated:

BFD: warning: /tmp/isc-worker0000.1559716915.2143.11.core is truncated: expected core file size >= 24989696, found: 11259904

Is this an indication of low memory? Currently the output of "free" is

              total        used        free      shared  buff/cache   available
Mem:          58680       32188       19828         800        6664       10872
Swap:             0           0           0
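
A rough comparison of the numbers above, offered as a sketch rather than a diagnosis. It assumes that /tmp is a RAM-backed tmpfs (the OpenWrt default) and that `free` reports KiB; note also that the `free` output was taken at a different time than the crash. The variable names are made up for this example.

```shell
expected_bytes=24989696   # size gdb expected, from the BFD warning
found_bytes=11259904      # size of the core file actually written
free_kib=19828            # "free" column of the free output
available_kib=10872       # "available" column of the free output

expected_kib=$((expected_bytes / 1024))
found_kib=$((found_bytes / 1024))

echo "expected core: ${expected_kib} KiB"
echo "written core:  ${found_kib} KiB"
echo "available:     ${available_kib} KiB"
if [ "$expected_kib" -gt "$free_kib" ]; then
    echo "a complete core would not fit into free memory (${free_kib} KiB)"
fi
```

Under these assumptions a complete core is larger than the free memory shown, which would be consistent with the dump being cut short for lack of space.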

bodop commented 5 years ago

That is today's core dump:

Core was generated by `/usr/sbin/named -u bind -f -c /etc/bind/named.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x77d9ac24 in NODENAME (node=node@entry=0x774646d0, name=name@entry=0x109b55ab) at rbt.c:277
277 rbt.c: No such file or directory.
[Current thread is 1 (LWP 5575)]
(gdb) bt
#0  0x77d9ac24 in NODENAME (node=node@entry=0x774646d0, name=name@entry=0x109b55ab) at rbt.c:277
#1  0x77d9fda4 in dns_rbt_findnode (rbt=0x7743e010, name=0x771396d0, foundname=0x0, node=0x779ca698, chain=0x779c9e90, 
    options=5, callback=0x0, callback_arg=0x0) at rbt.c:1599
#2  0x77da8110 in findnodeintree (rbtdb=0x7743b010, tree=0x7743e010, name=0x771396d0, create=<optimized out>, 
    nodep=0x779ca7b0) at rbtdb.c:2728
#3  0x77da8454 in findnode (db=<optimized out>, name=<optimized out>, create=<optimized out>, nodep=<optimized out>)
    at rbtdb.c:2789
#4  0x77dfec40 in cache_name (now=1559758488, addrinfo=0x773bf9e8, name=0x771396d0, fctx=0x7712c010) at resolver.c:5988
#5  cache_message (now=1559758488, addrinfo=0x773bf9e8, fctx=0x7712c010) at resolver.c:6408
#6  resquery_response (task=<optimized out>, event=<optimized out>) at resolver.c:7646
#7  0x77c64890 in dispatch (threadid=0, manager=0x779d0010) at task.c:1130
#8  run (queuep=<optimized out>) at task.c:1297
#9  0x77f96a6c in start (p=0x779cad44) at src/thread/pthread_create.c:195
#10 0x77f2b80c in __clone () at src/thread/mips/clone.s:33
Backtrace stopped: frame did not save the PC

Looks very similar to the first one. I wonder why bind-9.14.1 did not crash. I will go back to "-O0" again.

bodop commented 5 years ago

Everything is fine again with -O0. named -V outputs:

BIND 9.14.2 (Stable Release) <id:7a62b30>
running on Linux mips 4.14.121 #0 Sun Jun 2 09:08:38 2019
built by make with '--target=mips-openwrt-linux' '--host=mips-openwrt-linux' '--build=x86_64-pc-linux-gnu' '--program-prefix=' '--program-suffix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--libexecdir=/usr/lib' '--sysconfdir=/etc' '--datadir=/usr/share' '--localstatedir=/var' '--mandir=/usr/man' '--infodir=/usr/info' '--disable-nls' '--disable-linux-caps' '--with-openssl=/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr' '--with-libtool' '--without-lmdb' '--enable-epoll' '--without-gssapi' '--without-readline' '--without-python' '--sysconfdir=/etc/bind' '--without-libjson' '--without-libxml2' 'build_alias=x86_64-pc-linux-gnu' 'host_alias=mips-openwrt-linux' 'target_alias=mips-openwrt-linux' 'CC=mips-openwrt-linux-musl-gcc' 'CFLAGS=-Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -g3 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/build_dir/target-mips_24kc_musl/bind-9.14.2:bind-9.14.2 -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O0 ' 'LDFLAGS=-L/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/lib -L/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/lib -L/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/toolchain-mips_24kc_gcc-7.4.0_musl/usr/lib -L/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/toolchain-mips_24kc_gcc-7.4.0_musl/lib -znow -zrelro -Wl,--gc-sections,--as-needed ' 'CPPFLAGS=-I/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/include 
-I/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/include -I/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/toolchain-mips_24kc_gcc-7.4.0_musl/usr/include -I/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/toolchain-mips_24kc_gcc-7.4.0_musl/include/fortify -I/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/toolchain-mips_24kc_gcc-7.4.0_musl/include ' 'PKG_CONFIG=/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/host/bin/pkg-config' 'PKG_CONFIG_PATH=/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/lib/pkgconfig:/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/share/pkgconfig' 'PKG_CONFIG_LIBDIR=/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/lib/pkgconfig:/home/bodop/temp/openwrt-sdk-lantiq-xway_gcc-7.4.0_musl.Linux-x86_64/staging_dir/target-mips_24kc_musl/usr/share/pkgconfig'
compiled by GCC 7.4.0
compiled with OpenSSL version: OpenSSL 1.1.1c  28 May 2019
linked to OpenSSL version: OpenSSL 1.1.1c  28 May 2019
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
threads support is enabled
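
The CFLAGS in the output above contain both -Os and a trailing -O0. GCC documents that when several -O options are given, the last one is the one that takes effect, so this build is compiled at -O0. A small sketch of how to check that for any flags string; the helper name `effective_opt_level` is hypothetical.

```shell
# Print the last -O... flag found in a whitespace-separated flags string.
# GCC applies the last optimization option on the command line.
effective_opt_level() {
    level=""
    for flag in $1; do
        case "$flag" in
            -O*) level="$flag" ;;   # matches -O0, -O2, -Os, -Og, ...
        esac
    done
    printf '%s\n' "$level"
}

effective_opt_level "-Os -pipe -mips32r2 -g3 -fstack-protector -O0"
```

An empty result means the string contains no -O option at all.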

dengqf6 commented 5 years ago

GCC 9.1 support has been pushed to openwrt/master, you can now test it out

diizzyy commented 5 years ago

@dengqf6 That doesn't really solve anything, as 9.1 will most likely never be the default compiler.

bodop commented 5 years ago

I tried bind-server and bind-libs compiled with gcc-8.3.0 and gcc-9.1.0, installed on a gcc-7 based OpenWrt image. Both "gcc-8.3.0 -Os" and "gcc-9.1.0 -Os" crash, whereas "gcc-9.1.0 -O0" runs fine. I was not able to print a backtrace of the crashes. I assume gdb was confused because the core involved shared libraries that did not match the state of the SDK.

aon3ko commented 4 years ago

So it seems this problem is related to the GCC optimization level? I will try -O2 now.

bodop commented 4 years ago

That's also my impression.

aon3ko commented 4 years ago

> That's also my impression.

-O2 still crashes and generates an isc-worker*.core file under /tmp.

neheb commented 4 years ago

"No such file or directory" is a big hint.

I'd try strace instead of gdb.