https-dns-proxy: dmesg page fault

systemcrash commented 2 years ago

Maintainer: @stangri

--	Value
Model	TP-Link Archer C7 v2
Architecture	Qualcomm Atheros QCA9558 ver 1 rev 0
Target Platform	ath79/generic
Firmware Version	OpenWrt 22.03.0 r19685-512e76967f / LuCI openwrt-22.03 branch git-22.245.77528-487e58a
Kernel Version	5.10.138

Description:

added in https-dns-proxy into imagebuilder here. At boot, the kernel log shows:

[   78.258834] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 63222510
[   78.268317] epc = 77e72df4 in libc.so[77df4000+a9000]
[   78.273488] ra  = 77e74b04 in libc.so[77df4000+a9000]
...
[   89.156259] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 6322251a
[   89.165861] epc = 77dc2dc0 in libc.so[77d44000+a9000]
[   89.171129] ra  = 77dc4b04 in libc.so[77d44000+a9000]
...
[   94.439188] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 63222520
[   94.448917] epc = 77e2cdf4 in libc.so[77dae000+a9000]
[   94.454103] ra  = 77e2eb04 in libc.so[77dae000+a9000]
[   99.763575] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 63222525
[   99.773390] epc = 77e66dc0 in libc.so[77de8000+a9000]
[   99.778544] ra  = 77e68b04 in libc.so[77de8000+a9000]
[  104.985969] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 6322284c
[  104.995583] epc = 77e4adf4 in libc.so[77dcc000+a9000]
[  105.000758] ra  = 77e4cb04 in libc.so[77dcc000+a9000]
[  129.174846] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 63222865
[  129.184505] epc = 77e78dc0 in libc.so[77dfa000+a9000]
[  129.189687] ra  = 77e7ab04 in libc.so[77dfa000+a9000]
[  134.335382] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 6322286a
[  134.344791] epc = 77e4adc0 in libc.so[77dcc000+a9000]
[  134.350131] ra  = 77e4cb04 in libc.so[77dcc000+a9000]
[  139.404103] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 6322286f
[  139.413861] epc = 77e28dc0 in libc.so[77daa000+a9000]
[  139.419098] ra  = 77e2ab04 in libc.so[77daa000+a9000]
[  144.494286] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 63222874
[  144.503690] epc = 77de2df4 in libc.so[77d64000+a9000]
[  144.508995] ra  = 77de4b04 in libc.so[77d64000+a9000]
[  149.580321] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 63222879
[  149.589720] epc = 77eb2dc0 in libc.so[77e34000+a9000]
[  149.595016] ra  = 77eb4b04 in libc.so[77e34000+a9000]
[  154.672733] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 6322287e
[  154.682129] epc = 77eb4dc0 in libc.so[77e36000+a9000]
[  154.687367] ra  = 77eb6b04 in libc.so[77e36000+a9000]

stangri commented 2 years ago

Can you please post the output of opkg list-installed.

systemcrash commented 2 years ago

Here, sorry about that.

---trim---

stangri commented 2 years ago

Here, sorry about that.

I've had a wild theory that maybe the image was somehow built with some dependencies missing, but everything seems to be present.

I don't have a technical expertise to troubleshoot it further, I'd like to escalate this to the upstream to seek guidance, it could be beneficial to be on the latest version of libcurl for that.

I've created a PR for the 7.85.0 version of curl in master, I'll cherry-pick it for the release branches within a few days unless there are issues reported for the updated version in a master branch.

Ohfalderal commented 2 years ago

The segmentation fault (22.03) only occurs when logging is enabled and verbosity > 0.

N.B. - setting logging / verbosity for 21.02 does not page fault

stangri commented 2 years ago

The segmentation fault (22.03) only occurs when logging is enabled and verbosity > 0.

N.B. - setting logging / verbosity for 21.02 does not page fault

Thanks for adding this. Just so I have the complete information for upstream, can you please update curl/ibcurl to 7.85.0 and include the output for: curl --version?

Ohfalderal commented 2 years ago

opkg list-installed curl libcurl4 - 7.83.1-3

opkg install libcurl Upgrading libcurl4 on root from 7.83.1-3 to 7.85.0-1...

vi /etc/config/https-dns-proxy

service https-dns-proxy restart

Tue Sep 27 18:00:31 2022 user.notice https-dns-proxy: Starting service ✓✓✓✓ Tue Sep 27 18:00:31 2022 daemon.info dnsmasq[1]: read /tmp/hosts/dhcp.cfg01411c - 0 addresses Tue Sep 27 18:00:31 2022 kern.info kernel: [52874.728769] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 63331e1f Tue Sep 27 18:00:31 2022 kern.info kernel: [52874.738717] epc = 77e0cdc0 in libc.so[77d8e000+a9000] Tue Sep 27 18:00:31 2022 kern.info kernel: [52874.743959] ra = 77e0eb04 in libc.so[77d8e000+a9000]

stangri commented 2 years ago

I can't reproduce this issue with setting verbosity to 2.

Ohfalderal commented 2 years ago

Verbosity setting altered from '1' to '2' with no change to the observed result (SIGSEGV).

Kindly post your /etc/config/https-dns-proxy configuration.

As seen in the syslog snippet, 4X DOH servers are configured.

FYI - the device is TP-Link Archer C6 v2

Ohfalderal commented 2 years ago

So the trail ends here?

systemcrash commented 2 years ago

Well, it doesn't seem critical, since it works normally after bootup, when the system settles down. Maybe before config is loaded, the internal default settings trigger the crash?

stangri commented 2 years ago

I need a config from anyone experiencing this issue to try to reproduce this.

systemcrash commented 2 years ago

output of cat /etc/config/https-dns-proxy


config main 'config'
    option update_dnsmasq_config '*'
    option force_dns '1'
    list force_dns_port '53'
    list force_dns_port '853'

config https-dns-proxy
    option listen_addr '127.0.0.1'
    option user 'nobody'
    option group 'nogroup'
    option bootstrap_dns '74.82.42.42,2001:470:20::2'
    option resolver_url 'https://ordns.he.net/dns-query'
    option listen_port '5054'

config https-dns-proxy
    option bootstrap_dns '116.202.176.26,1.1.1.1'
    option resolver_url 'https://doh.libredns.gr/dns-query'
    option listen_port '5055'

config https-dns-proxy
    option resolver_url 'https://cloudflare-dns.com/dns-query'
    option bootstrap_dns '1.1.1.1,1.0.0.1,2606:4700:4700::1111,2606:4700:4700::1001'

Ohfalderal commented 2 years ago

Uncomment both 'logfile' and 'verbosity' options of any one of the four sections -

config main 'config'
    option update_dnsmasq_config '*'
    option force_dns '1'
    list force_dns_port '53'
    list force_dns_port '853'
# ports listed below are used by some
# of the dnscrypt-proxy v1 resolvers
#   list force_dns_port '553'
#   list force_dns_port '1443'
#   list force_dns_port '4343'
#   list force_dns_port '4434'
#   list force_dns_port '5443'
#   list force_dns_port '8443'

config https-dns-proxy
    option bootstrap_dns '2a01:3a0:53:53::'
    option resolver_url 'https://unicast.censurfridns.dk/dns-query'
    option use_ipv6_resolvers_only 1
#   option logfile '/var/log/DoH_unicast.censurfridns.dk.log' 
#   option verbosity 1 

config https-dns-proxy
    option bootstrap_dns '2a07:a8c0::84:5e57,2a07:a8c1::84:5e57'
    option resolver_url 'https://dns.nextdns.io/845e57'
    option use_ipv6_resolvers_only 1
#   option logfile '/var/log/DoH_dns.nextdns.io.log' 
#   option verbosity 1

config https-dns-proxy
    option bootstrap_dns '2001:67c:28a4::'
    option resolver_url 'https://anycast.uncensoreddns.org/dns-query'
    option use_ipv6_resolvers_only 1
#   option logfile '/var/log/DoH_anycast.censurfridns.dk.log' 
#   option verbosity 1

config https-dns-proxy
    option bootstrap_dns '2a07:a8c0::84:5e57,2a07:a8c1::84:5e57'
    option resolver_url 'https://secure.avastdns.com/dns-query'
    option use_ipv6_resolvers_only 1
#   option logfile '/var/log/DoH_secure.avastdns.com.log' 
#   option verbosity 1

Ohfalderal commented 2 years ago

@systemcrash

output of cat /etc/config/https-dns-proxy


config main 'config'
  option update_dnsmasq_config '*'
  option force_dns '1'
  list force_dns_port '53'
  list force_dns_port '853'

config https-dns-proxy
  option listen_addr '127.0.0.1'
  option user 'nobody'
  option group 'nogroup'
  option bootstrap_dns '74.82.42.42,2001:470:20::2'
  option resolver_url 'https://ordns.he.net/dns-query'
  option listen_port '5054'

config https-dns-proxy
  option bootstrap_dns '116.202.176.26,1.1.1.1'
  option resolver_url 'https://doh.libredns.gr/dns-query'
  option listen_port '5055'

config https-dns-proxy
  option resolver_url 'https://cloudflare-dns.com/dns-query'
  option bootstrap_dns '1.1.1.1,1.0.0.1,2606:4700:4700::1111,2606:4700:4700::1001'

Review the arguments of the spawned processes. Without the 'use_ipv6_resolvers_only' option set, only the IPV4 addresses are passed and using the option, only IPV6 addresses are passed.

stangri commented 2 years ago

option logfile '/var/log/DoH_unicast.censurfridns.dk.log'

I believe /var/log is owned by root and other users don't have the write permissions there, so nobody/nogroup can't write there. Could the crash be because of that?

Ohfalderal commented 2 years ago

No, the process(es) is spawned under 'root' and during initialisation, updates its credentials, as does DNSmasq.

root@pWPW10-2:~# ls -l /var/log/
-rw-r-----    1 root     root        251521 Oct  5 10:33 DoH_anycast.censurfridns.dk_IPV6.log
-rw-r-----    1 root     root         36170 Oct  5 10:33 DoH_dns.digitale-gesellschaft.ch_IPV6.log
-rw-r-----    1 root     root         49522 Oct  5 10:33 DoH_dns.nextdns.io_IPV6.log
-rw-r-----    1 root     root        251378 Oct  5 10:33 DoH_secure.avastdns.com_IPV6.log
-rw-r-----    1 root     root        249875 Oct  5 10:33 DoH_unicast.censurfridns.dk_IPV6.log

 4614 nobody    4180 S    /usr/sbin/https-dns-proxy -r https://unicast.censurfridns.dk/dns-query -a 127.0.0.1 -p 5053 -b 2a01:3a0:53
 4615 nobody    5352 S    /usr/sbin/https-dns-proxy -r https://dns.nextdns.io/845e57 -a 127.0.0.1 -p 5054 -b 2a07:a8c0::84:5e57,2a07
 4616 nobody    4180 S    /usr/sbin/https-dns-proxy -r https://anycast.uncensoreddns.org/dns-query -a 127.0.0.1 -p 5055 -b 2001:67c:
 4617 nobody    4676 S    /usr/sbin/https-dns-proxy -r https://dns.digitale-gesellschaft.ch/dns-query -a 127.0.0.1 -p 5056 -b 2606:4
 4618 nobody    4180 S    /usr/sbin/https-dns-proxy -r https://secure.avastdns.com/dns-query -a 127.0.0.1 -p 5057 -b 2001:67c:28a4::

Under 21.03, this config with logging does not SIGSEGV.

systemcrash commented 2 years ago

# ps | grep -i https-dns
 5059 nobody    3304 S    /usr/sbin/https-dns-proxy -r https://ordns.he.net/dns-query -a 127.0.0.1 -p 5054 -b 74.82.42.42 -4 -u nobody -g nogroup
 5060 nobody    3336 S    /usr/sbin/https-dns-proxy -r https://doh.libredns.gr/dns-query -a 127.0.0.1 -p 5055 -b 116.202.176.26,1.1.1.1 -4 -u nobody -g nogroup

Review the arguments of the spawned processes. Without the 'use_ipv6_resolvers_only' option set, only the IPV4 addresses are passed and using the option, only IPV6 addresses are passed.

I see. This config is generated by using luci GUI. I would have expected that both v4/6 are used from the config without specific intervention. @stangri can there be middle ground option for both v4/6? (don't wish to derail this bug report tho)

stangri commented 2 years ago

No, the process(es) is spawned under 'root' and during initialisation, updates its credentials, as does DNSmasq.

root@pWPW10-2:~# ls -l /var/log/
-rw-r-----    1 root     root        251521 Oct  5 10:33 DoH_anycast.censurfridns.dk_IPV6.log
-rw-r-----    1 root     root         36170 Oct  5 10:33 DoH_dns.digitale-gesellschaft.ch_IPV6.log
-rw-r-----    1 root     root         49522 Oct  5 10:33 DoH_dns.nextdns.io_IPV6.log
-rw-r-----    1 root     root        251378 Oct  5 10:33 DoH_secure.avastdns.com_IPV6.log
-rw-r-----    1 root     root        249875 Oct  5 10:33 DoH_unicast.censurfridns.dk_IPV6.log

 4614 nobody    4180 S    /usr/sbin/https-dns-proxy -r https://unicast.censurfridns.dk/dns-query -a 127.0.0.1 -p 5053 -b 2a01:3a0:53
 4615 nobody    5352 S    /usr/sbin/https-dns-proxy -r https://dns.nextdns.io/845e57 -a 127.0.0.1 -p 5054 -b 2a07:a8c0::84:5e57,2a07
 4616 nobody    4180 S    /usr/sbin/https-dns-proxy -r https://anycast.uncensoreddns.org/dns-query -a 127.0.0.1 -p 5055 -b 2001:67c:
 4617 nobody    4676 S    /usr/sbin/https-dns-proxy -r https://dns.digitale-gesellschaft.ch/dns-query -a 127.0.0.1 -p 5056 -b 2606:4
 4618 nobody    4180 S    /usr/sbin/https-dns-proxy -r https://secure.avastdns.com/dns-query -a 127.0.0.1 -p 5057 -b 2001:67c:28a4::

Under 21.03, this config with logging does not SIGSEGV.

Sorry, I'm confused, I thought you said that SIGSEGV happens at boot, yet there are non-zero log files. Could you please clarify the conditions it crashes and conditions it doesn't crash and fills up logs?

stangri commented 2 years ago

I see. This config is generated by using luci GUI. I would have expected that both v4/6 are used from the config without specific intervention. @stangri can there be middle ground option for both v4/6? (don't wish to derail this bug report tho)

Yes, please open a separate issue.

Ohfalderal commented 2 years ago

@stangri

Sorry, I'm confused, I thought you said that SIGSEGV happens at boot, yet there are non-zero log files. Could you please clarify the conditions it crashes and conditions it doesn't crash and fills up logs?

Those are logging files under 21.02 to illustrate that 'nobody' is successfully writing to them.

stangri commented 2 years ago

I see. I'll need to test this myself, I'll update this issue on the progress.

systemcrash commented 2 years ago

Well, I don't think it's dangerous, although fixing it is necessary. After the system is up and stable, no more of those entries.

Maybe something isn't ready during startup.

I'll test when I can, if it comes to that.

stangri commented 2 years ago

I can't reproduce the https-dns-proxy crash with version 2022-08-12 and the following config:

config https-dns-proxy
    option bootstrap_dns '149.112.121.10,149.112.122.10,2620:10A:80BB::10,2620:10A:80BC::10'
    option resolver_url 'https://private.canadianshield.cira.ca/dns-query'
    option verbosity  '2'
    option logfile '/var/log/DoH_canadianshield.log'

@Ohfalderal can you re-test with the new version and shortened file names for the logfile? If we can narrow down the source of the problem before reporting upstream, that'd be helpful.

Ohfalderal commented 2 years ago

@stangri I notice the update is available in the mirror repository that is configured in the AP and will pull and test it this coming weekend.

Ohfalderal commented 2 years ago

@stangri Uhh, no -

root@pWPW10-0:~# vi /etc/config/https-dns-proxy
root@pWPW10-0:~# service https-dns-proxy restart
Stopping https-dns-proxy 2022-08-12-1 ✓
Starting https-dns-proxy 2022-08-12-1 ✓✓✓
root@pWPW10-0:~# logread -l20
Sat Oct 15 09:32:55 2022 daemon.info dnsmasq[1]: using only locally-known addresses for local
Sat Oct 15 09:32:55 2022 daemon.info dnsmasq[1]: using only locally-known addresses for invalid
Sat Oct 15 09:32:55 2022 daemon.info dnsmasq[1]: using only locally-known addresses for bind
Sat Oct 15 09:32:55 2022 daemon.info dnsmasq[1]: using only locally-known addresses for mask-h2.icloud.com
Sat Oct 15 09:32:55 2022 daemon.info dnsmasq[1]: using only locally-known addresses for mask.icloud.com
Sat Oct 15 09:32:55 2022 daemon.info dnsmasq[1]: using 1 more local addresses
Sat Oct 15 09:32:55 2022 daemon.info dnsmasq[1]: read /etc/hosts - 4 addresses
Sat Oct 15 09:32:55 2022 daemon.info dnsmasq[1]: read /tmp/hosts/dhcp.cfg01411c - 0 addresses
Sat Oct 15 09:32:55 2022 kern.info kernel: [22419.664161] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 634a6227
Sat Oct 15 09:32:55 2022 kern.info kernel: [22419.673903] epc = 77eb0dc0 in libc.so[77e32000+a9000]
Sat Oct 15 09:32:55 2022 kern.info kernel: [22419.679387] ra  = 77eb2b04 in libc.so[77e32000+a9000]
Sat Oct 15 09:33:00 2022 kern.info kernel: [22424.768945] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 634a622c
Sat Oct 15 09:33:00 2022 kern.info kernel: [22424.778701] epc = 77e7adf4 in libc.so[77dfc000+a9000]
Sat Oct 15 09:33:00 2022 kern.info kernel: [22424.784060] ra  = 77e7cb04 in libc.so[77dfc000+a9000]
Sat Oct 15 09:33:05 2022 kern.info kernel: [22429.810433] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 634a6231
Sat Oct 15 09:33:05 2022 kern.info kernel: [22429.820151] epc = 77e16dc0 in libc.so[77d98000+a9000]
Sat Oct 15 09:33:05 2022 kern.info kernel: [22429.825650] ra  = 77e18b04 in libc.so[77d98000+a9000]
Sat Oct 15 09:33:11 2022 kern.info kernel: [22434.848670] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 634a6236
Sat Oct 15 09:33:11 2022 kern.info kernel: [22434.858382] epc = 77eb6dc0 in libc.so[77e38000+a9000]
Sat Oct 15 09:33:11 2022 kern.info kernel: [22434.863713] ra  = 77eb8b04 in libc.so[77e38000+a9000]
root@pWPW10-0:~# vi /etc/config/https-dns-proxy
root@pWPW10-0:~# service https-dns-proxy restart
Stopping https-dns-proxy 2022-08-12-1 ✓
Starting https-dns-proxy 2022-08-12-1 ✓✓✓
root@pWPW10-0:~# logread -l20
Sat Oct 15 09:34:13 2022 user.notice https-dns-proxy: Starting service ✓✓✓
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: started, version 2.86 cachesize 150
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: DNS service limited to local subnets
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: compile time options: IPv6 GNU-getopt no-DBus UBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP no-conntrack no-ipset no-auth no-cryptohash no-DNSSEC no-ID loop-detect inotify dumpfile
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: UBus support enabled: connected to system bus
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using nameserver 127.0.0.1#5055
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using nameserver 127.0.0.1#5053
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using nameserver 127.0.0.1#5054
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using only locally-known addresses for test
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using only locally-known addresses for onion
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using only locally-known addresses for localhost
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using only locally-known addresses for local
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using only locally-known addresses for invalid
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using only locally-known addresses for bind
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using only locally-known addresses for mask-h2.icloud.com
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using only locally-known addresses for mask.icloud.com
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: using 1 more local addresses
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: read /etc/hosts - 4 addresses
Sat Oct 15 09:34:13 2022 daemon.info dnsmasq[1]: read /tmp/hosts/dhcp.cfg01411c - 0 addresses
Sat Oct 15 09:34:27 2022 user.info : luci: accepted login on / for root from 192.168.179.198
root@pWPW10-0:~#

N.B. - 'vi' commands are:

reduce length of logfile name, increase verbosity level '2' and uncomment
comment logfile and verbosity

systemcrash commented 2 years ago

Problem remains in 22.03.2

```kernel logs [ 86.418247] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 634ebf2b [ 86.427647] epc = 77de2dc0 in libc.so[77d64000+a9000] [ 86.432809] ra = 77de4b04 in libc.so[77d64000+a9000] ... [ 88.751136] br-lan: port 3(wlan0) entered blocking state [ 88.756655] br-lan: port 3(wlan0) entered disabled state [ 88.762531] device wlan0 entered promiscuous mode [ 89.926855] IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready [ 89.933560] br-lan: port 2(wlan1) entered blocking state [ 89.938957] br-lan: port 2(wlan1) entered forwarding state [ 91.949680] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 634ebf31 [ 91.959038] epc = 77e4cdc0 in libc.so[77dce000+a9000] [ 91.964227] ra = 77e4eb04 in libc.so[77dce000+a9000] [ 97.198637] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 634ebf36 [ 97.208262] epc = 77eb4dc0 in libc.so[77e36000+a9000] [ 97.213451] ra = 77eb6b04 in libc.so[77e36000+a9000] [ 102.401394] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 634ebfa4 [ 102.410796] epc = 77dcedf4 in libc.so[77d50000+a9000] [ 102.415934] ra = 77dd0b04 in libc.so[77d50000+a9000] [ 107.647482] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 634ebfa9 [ 107.656919] epc = 77dc6dc0 in libc.so[77d48000+a9000] [ 107.662082] ra = 77dc8b04 in libc.so[77d48000+a9000] [ 112.748180] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 634ebfae [ 112.757920] epc = 77e0edc0 in libc.so[77d90000+a9000] [ 112.763103] ra = 77e10b04 in libc.so[77d90000+a9000] ```

systemcrash commented 2 years ago

Here is the build log for the firmware:

---trim---

stangri commented 1 year ago

Sorry for neglecting this. Can you still reproduce with 2022-10-15-10?

systemcrash commented 1 year ago

No worries. You've a lot on your plate.

It crashes, but now not only at boot, but also later during uptime.

stangri commented 1 year ago

It may be the mips_24kc exclusive problem, I can't (for the life of me) reproduce on x86_64. I don't even know who I can ask to have a look at it, as I don't think upstream devs use the same platform and I don't know enough about libc to troubleshoot.

systemcrash commented 1 year ago

Something interesting:

when I restart the daemon via the GUI restart button, I consistently get a crash:

[604017.161423] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 63c73e68
[604017.170910] epc = 77dfedf4 in libc.so[77d80000+a9000]
[604017.176307] ra  = 77e00b04 in libc.so[77d80000+a9000]
[604373.787103] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 63c73fcc
[604373.796612] epc = 77e92df4 in libc.so[77e14000+a9000]
[604373.801841] ra  = 77e94b04 in libc.so[77e14000+a9000]
[604378.877891] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 63c73fd1
[604378.887685] epc = 77e58dc0 in libc.so[77dda000+a9000]
[604378.892931] ra  = 77e5ab04 in libc.so[77dda000+a9000]

The nature of the crash looks a bit better from previous openwrt builds.

When I run an equivalent command line that I see via ps, e.g. /usr/sbin/https-dns-proxy -r https://ordns.he.net/dns-query -a 127.0.0.1 -p 5054 -b 74.82.42.42 -4 -u nobody -g nogroup no crash......

My google fu shows these kinds of crashes are not exclusive to my mips platform, but possibly the platform is more sensitive to things that run OK on x86_64.

systemcrash commented 1 year ago

If there's an equivalent version built with debug symbols and/or it runs with strace, we may get somewhere further.

systemcrash commented 1 year ago

Just a thought: could it be the /etc/init.d/https-dns-proxy script which triggers this?

stangri commented 1 year ago

I haven't looked into details on how PROCD daemonizes processes, actually out of all PROCD-using packages I maintain this is the only one with the true daemon. I'll double check the code.

stangri commented 1 year ago

Is this still happening with the new code to kill/respawn instances on interface restarts?

DianP commented 1 year ago

I didn't mean to bump this, but I'm also getting this error on the same devices & platform. Even after some fresh installation.


Model	TP-Link Archer C7 v2
Architecture	Qualcomm Atheros QCA9558 ver 1 rev 0
Target Platform	ath79/generic
Firmware Version	OpenWrt 22.03.3 r20028-43d71ad93e / LuCI openwrt-22.03 branch git-22.361.69894-438c598
Kernel Version	5.10.161
https-dns-proxy Version	2022-10-15-11

stangri commented 1 year ago

@aarond10 @baranyaib90 any insight on what may be happening on this specific platform? There have not been any reports of this on other platforms.

aarond10 commented 1 year ago

That's really weird. Invalid read access and libc.so (if not a red herring) makes me wonder if it's a failed malloc() that isn't being handled properly (i.e. null pointer returned and then dereferenced) but the location looks valid.

I found a similar bug for another package, also Archer C7v2: https://github.com/openwrt/packages/issues/19352 Could it be a platform-wide issue?

systemcrash commented 1 year ago

It's not inconceivable, although I cannot say I've noticed any ill effects. That said it's only on more recent major releases. 18 and 19 were unaffected. Guess: compiler related.

aarond10 commented 1 year ago

That would be my guess too -- From the data I can find on QCA9558, it's a single core SoC so memory races are likely not an issue but it seems quite a coincidence that it's the same sort of problem on the same hardware.

systemcrash commented 1 year ago

One possibility I've considered is that the memory or flash is dying. But not having experienced any other issues, I'm doubtful that's it.

A memtest would be nice but it's an involved process.

I'm fairly sure my flash has been to hell and back tho.

Ohfalderal commented 1 year ago

@aarond10 @systemcrash the C7 v5 is also afflicted.

systemcrash commented 1 year ago

@aarond10 @systemcrash the C7 v5 is also afflicted.

Probably the same underlying hardware platform.

In any case, the symptom continues on 22.03.4. But everything is otherwise working fine.

[   78.696834] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 643561b1
[   78.706235] epc = 77e00dc0 in libc.so[77d82000+a9000]
[   78.711408] ra  = 77e02b04 in libc.so[77d82000+a9000]
...
[   84.318139] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 643561b7
[   84.327535] epc = 77e34dc0 in libc.so[77db6000+a9000]
[   84.332698] ra  = 77e36b04 in libc.so[77db6000+a9000]
...
[   89.725811] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 643561bc
[   89.735310] epc = 77e38df4 in libc.so[77dba000+a9000]
[   89.740491] ra  = 77e3ab04 in libc.so[77dba000+a9000]
...
[   98.284561] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 643561c5
[   98.293977] epc = 77dc4dc0 in libc.so[77d46000+a9000]
[   98.299378] ra  = 77dc6b04 in libc.so[77d46000+a9000]
...
[  103.500905] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 643561ca
[  103.510325] epc = 77e22dc0 in libc.so[77da4000+a9000]
[  103.515466] ra  = 77e24b04 in libc.so[77da4000+a9000]
[  108.684551] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 643b3fd9
[  108.694295] epc = 77e7adc0 in libc.so[77dfc000+a9000]
[  108.699531] ra  = 77e7cb04 in libc.so[77dfc000+a9000]
[  113.766460] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 643b3fde
[  113.776216] epc = 77e5cdc0 in libc.so[77dde000+a9000]
[  113.781404] ra  = 77e5eb04 in libc.so[77dde000+a9000]
[  118.846343] do_page_fault(): sending SIGSEGV to https-dns-proxy for invalid read access from 643b3fe3
[  118.855804] epc = 77e4edc0 in libc.so[77dd0000+a9000]
[  118.861817] ra  = 77e50b04 in libc.so[77dd0000+a9000]

systemcrash commented 1 year ago

So I looked at the other linked issue, and apparently:

It was indeed related to the change in time_t (32 bits to 64 bits on a 32 bit system) with musl.

@stangri - perhaps there is some useful info in the commit? Hint: change time types e.g. from long int to time_t (altho I don't immediately see those in the source).

Ohfalderal commented 1 year ago

@systemcrash - logging generally includes timestamps. Doh!

stangri commented 1 year ago

So I looked at the other linked issue, and apparently:

It was indeed related to the change in time_t (32 bits to 64 bits on a 32 bit system) with musl.

@stangri - perhaps there is some useful info in the commit? Hint: change time types e.g. from long int to time_t (altho I don't immediately see those in the source).

@aarond10 I'm out of my depth here, anything useful in the link above?

aarond10 commented 1 year ago

I'm still quite confused. There is no use of time_t in https_dns_proxy. I did a bit of a cursory search on github for libev and libcares but can't find direct references to time_t in either of those as well.

On Tue, 18 Apr 2023 at 14:00, Stan Grishin @.***> wrote:

So I looked at the other linked issue https://github.com/openwrt/packages/issues/19510#issuecomment-1283148510, and apparently:

It was indeed related to the change in time_t (32 bits to 64 bits on a 32 bit system) with musl.

@stangri https://github.com/stangri - perhaps there is some useful info in the commit https://github.com/openwrt/packages/commit/f7d592607f876a64d4e641e57d6300ab99d1e9e9? Hint: change time types e.g. from long int to time_t (altho I don't immediately see those in the source).

@aarond10 https://github.com/aarond10 I'm out of my depth here, anything useful in the link above?

— Reply to this email directly, view it on GitHub https://github.com/openwrt/packages/issues/19366#issuecomment-1512400166, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABTOXTUI2FCOPBAEGXOXCLXBYGW5ANCNFSM6AAAAAAQMXDX4A . You are receiving this because you were mentioned.Message ID: @.***>

schmaller commented 1 year ago

After compiling the package with warning output one can see the relevant fprintf calls. Fixing this warnings by switching to %lld and casting the parameter to (long long) solved the issue for me.

/usr/src/OpenWRT/openwrt/build_dir/target-mips_24kc_musl/https-dns-proxy-2022-10-15/src/logging.c: In function '_log':
/usr/src/OpenWRT/openwrt/build_dir/target-mips_24kc_musl/https-dns-proxy-2022-10-15/src/logging.c:80:24: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'time_t' {aka 'long long int'} [-Wformat=]
   80 |   fprintf(logf, "%s %8ld.%06ld %s:%d ", SeverityStr[severity], tv.tv_sec,
      |                     ~~~^                                       ~~~~~~~~~
      |                        |                                         |
      |                        long int                                  time_t {aka long long int}
      |                     %8lld
/usr/src/OpenWRT/openwrt/build_dir/target-mips_24kc_musl/https-dns-proxy-2022-10-15/src/logging.c:80:30: warning: format '%ld' expects argument of type 'long int', but argument 5 has type 'suseconds_t' {aka 'long long int'} [-Wformat=]
   80 |   fprintf(logf, "%s %8ld.%06ld %s:%d ", SeverityStr[severity], tv.tv_sec,
      |                          ~~~~^
      |                              |
      |                              long int
      |                          %06lld
   81 |           tv.tv_usec, file, line);
      |           ~~~~~~~~~~
      |             |
      |             suseconds_t {aka long long int}

schmaller commented 1 year ago

@aarond10 Is my fix also suitable for upstream patching of your (great :wink:)) project ? If so, do you want to patch it yourself to be conform with all the guidelines, or should I try to provide a PR ?

aarond10 commented 1 year ago

Ah! Of course! It's the logging code... Thanks for the fix. I think it'll need a little more conditional handling though for cases where time_t is still 32-bit.

openwrt / packages

https-dns-proxy: dmesg page fault #19366

option logfile '/var/log/DoH_unicast.censurfridns.dk.log'