openwrt / openwrt

This repository is a mirror of https://git.openwrt.org/openwrt/openwrt.git. It is for reference only and is not active for check-ins. We will continue to accept Pull Requests here. They will be merged via staging trees and then into openwrt.git.

ramips: mt7621: kernel crash/reboot #12057

Open wberrier opened 1 year ago

wberrier commented 1 year ago

Describe the bug

Under certain conditions, the router reboots, sometimes as often as every 10 minutes. It seemed to happen more often once I had two of these routers in a mesh setup.

OpenWrt version

21.02.5, r16688-fa9a932fdb

OpenWrt target/subtarget

ramips/mt7621

Device

D-Link DIR-882 A1

Image kind

Official downloaded image

Steps to reproduce

Set up 2 routers as such:

router 1:

- wired WAN
- 2.4 GHz and 5 GHz APs
- 5 GHz mesh node
- some settings enabled on both the 5 GHz and 2.4 GHz radios:

       option ieee80211r '1'
       option mobility_domain '4f57'
       option ft_over_ds '1'
       option ft_psk_generate_local '1'
       option ieee80211w '1'

router 2:

- 5 GHz mesh node
- wired LAN clients generating lots of traffic

Note: only router 1 reboots. Router 2 doesn't reboot.

Actual behaviour

Router 1 reboots with some of these variations (sorry, no serial port, these come over ssh before the connection is closed):

kern.alert kernel: [ 1912.963040] CPU 2 Unable to handle kernel paging request at virtual address 00000001, epc == 86529270, ra == 865359d4

kern.alert kernel: [ 1912.859490] CPU 2 Unable to handle kernel paging request at virtual address 00000001, epc == 86489270, ra == 8647d9d4
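
To check whether two one-line crash reports like these point at the same code, one option is to compare the low 12 bits of `epc` and `ra`. This is only a sketch: it assumes kernel modules are loaded at 4 KiB page-aligned addresses, so the in-page offset stays the same across boots of the same build even when the load address moves. The function name `crash_signature` is made up for illustration.

```python
import re

# Matches the one-line "Unable to handle kernel paging request" report.
CRASH_RE = re.compile(
    r"virtual address (?P<badva>[0-9a-f]+), "
    r"epc == (?P<epc>[0-9a-f]+), ra == (?P<ra>[0-9a-f]+)"
)

def crash_signature(line):
    """Return (bad address, epc page offset, ra page offset), or None."""
    m = CRASH_RE.search(line)
    if m is None:
        return None
    epc = int(m.group("epc"), 16)
    ra = int(m.group("ra"), 16)
    # Assumption: 4 KiB pages, so the low 12 bits survive a module reload.
    return (m.group("badva"), epc & 0xFFF, ra & 0xFFF)

lines = [
    "kern.alert kernel: [ 1912.963040] CPU 2 Unable to handle kernel paging "
    "request at virtual address 00000001, epc == 86529270, ra == 865359d4",
    "kern.alert kernel: [ 1912.859490] CPU 2 Unable to handle kernel paging "
    "request at virtual address 00000001, epc == 86489270, ra == 8647d9d4",
]

signatures = {crash_signature(line) for line in lines}
print(len(signatures))  # 1 means both reports share the same signature
```

For the two lines above the page offsets are identical (`0x270` for `epc`, `0x9d4` for `ra`), which suggests, but does not prove, that both reboots hit the same instruction.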

Expected behaviour

It shouldn't reboot

Additional info

I used the image builder to build these images.

Diffconfig

No response

zenerdyod commented 1 year ago

Will need the crash logs to check this. There is not enough info.

Spudz76 commented 1 year ago

Try snapshot/master and see if the issue is already fixed, but not backported.

wberrier commented 1 year ago

> Will need the crash logs to check this. There is not enough info.

How do I collect these? Do I need to solder a serial port? Any other options?

> Try snapshot/master and see if the issue is already fixed, but not backported.

I can try that too, although my setup relies pretty heavily on iptables, so it may take me a bit to convert things over.

Djfe commented 1 year ago

Have you had time to test a newer release or master yet? :)

wberrier commented 1 year ago

Apologies, it took me a bit to convert my setup to fw4/nftables...

I've been running 22.03.5 since this morning and have already seen a reboot:

Sat Jun 3 02:04:05 2023 kern.alert kernel: [ 9310.516313] CPU 2 Unable to handle kernel paging request at virtual address 00000001, epc == 82cc8f08, ra == 82a35860

I'll try to get some logread logs over ssh to see if they match, and then can try a snapshot build.
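
Without a serial port, one way to keep the crash output is to stream `logread -f` over ssh to a file on another machine, so the last lines survive the reboot. The sketch below assumes key-based ssh access as `root`; the host address and file name in the usage comment are placeholders, and the helper names are hypothetical. The tail of the Oops may still be cut off when the connection drops.

```python
import subprocess

def build_logread_command(host, user="root"):
    """Build the ssh command that follows the router's log buffer."""
    # `logread -f` keeps printing new log lines as they arrive.
    return ["ssh", f"{user}@{host}", "logread", "-f"]

def capture(host, path):
    """Append the router's log stream to a local file until ssh exits."""
    # buffering=1 selects line buffering, so each line reaches disk promptly.
    with open(path, "a", buffering=1) as out:
        proc = subprocess.Popen(
            build_logread_command(host),
            stdout=subprocess.PIPE,
            text=True,
        )
        for line in proc.stdout:
            out.write(line)
        proc.wait()

# Example call (placeholder address and file name):
# capture("192.168.1.1", "router1-crash.log")
```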

wberrier commented 1 year ago

I updated to snapshot, and am still seeing this:

OpenWrt SNAPSHOT, r23213-edb3a4162c

Sat Jun  3 19:14:22 2023 kern.alert kernel: [ 1470.877024] CPU 3 Unable to handle kernel paging request at virtual address 00000001, epc == 831c93a8, ra == 831d951c
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1470.898211] Oops[#1]:
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1470.902740] CPU: 3 PID: 28 Comm: ksoftirqd/3 Not tainted 5.15.114 #0
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1470.915388] $ 0   : 00000000 00000001 824c7f80 00000000
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1470.925822] $ 4   : 82e0c648 8356d46c 853bfda0 8356d700
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1470.936246] $ 8   : 00000100 00000040 00000004 817a4c48
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1470.946670] $12   : 00000fff 0000000c 00000001 00000000
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1470.957093] $16   : 82e097a0 81d38380 814b97ac 8356d46c
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1470.967518] $20   : 82e0c648 00000002 00000000 0000007f
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1470.977942] $24   : 00000002 803337e8
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1470.988365] $28   : 814b8000 814b96c0 853bfd80 831d951c
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1470.998792] Hi    : 00000000
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.004514] Lo    : 00000004
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.010233] epc   : 831c93a8 mt7615_mac_set_rates+0x40/0xde0 [mt7615_common]
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.024300] ra    : 831d951c mt7615_tx_prepare_skb+0x194/0x268 [mt7615e]
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.037648] Status: 1100fc03      KERNEL EXL IE
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.045987] Cause : 40800008 (ExcCode 02)
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.053958] BadVA : 00000001
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.059679] PrId  : 0001992f (MIPS 1004Kc)
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.067820] Modules linked in: pppoe ppp_async wireguard pppox ppp_generic nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet mt7615e mt7615_common mt76_connac_lib mt76 mac80211 libchacha20poly1305 ipt_REJECT ftdi_sio cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY usbserial slhc poly1305_mips nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_reject_ipv6 nf_reject_ipv4 nf_nat nf_log_syslog nf_flow_table nf_conncount libcurve25519_generic libcrc32c iptable_mangle iptable_filter ipt_ECN ip_tables hwmon crc_ccitt compat chacha_mips act_connmark nf_conntrack nf_defrag_ipv6
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.068548]  nf_defrag_ipv4 sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact ledtrig_usbport xt_set x_tables ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ipmac ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ifb ip6_udp_tunnel udp_tunnel sha512_generic seqiv jitterentropy_rng drbg kpp hmac cmac leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.366661] Process ksoftirqd/3 (pid: 28, threadinfo=8f49d49a, task=c35ecad9, tls=00000000)
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.383297] Stack : 00000048 00000fba 85fc8000 8001fbb8 97e60000 60c13eb2 e411a30e 814b9824
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.399982]         00000000 80a80000 82e097a0 81d38380 814b97ac 
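
The `epc` and `ra` lines of the Oops above already name the faulting function and its caller. As a rough sketch, they can be pulled out of a saved log with a regular expression. The pattern assumes the `symbol+0xoffset/0xsize [module]` layout seen in this log, and `faulting_symbols` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   epc   : 831c93a8 mt7615_mac_set_rates+0x40/0xde0 [mt7615_common]
SYMBOL_RE = re.compile(
    r"\b(?P<reg>epc|ra)\s*:\s*(?P<addr>[0-9a-f]{8})\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def faulting_symbols(log_text):
    """Map 'epc' and 'ra' to (symbol, offset, module) found in an Oops."""
    found = {}
    for m in SYMBOL_RE.finditer(log_text):
        found[m.group("reg")] = (
            m.group("symbol"),
            int(m.group("offset"), 16),
            m.group("module"),  # None for symbols built into the kernel
        )
    return found

sample = """\
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.010233] epc   : 831c93a8 mt7615_mac_set_rates+0x40/0xde0 [mt7615_common]
Sat Jun  3 19:14:22 2023 kern.warn kernel: [ 1471.024300] ra    : 831d951c mt7615_tx_prepare_skb+0x194/0x268 [mt7615e]
"""

for reg, (symbol, offset, module) in faulting_symbols(sample).items():
    print(f"{reg}: {symbol}+{offset:#x} in {module}")
```

For this log the fault is in `mt7615_mac_set_rates` (module `mt7615_common`), called from `mt7615_tx_prepare_skb` (module `mt7615e`).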

wberrier commented 1 year ago

I have 3 identical routers, and I had some luck trying different configurations to narrow down when I hit this.

I thought it had to do solely with mesh networking (and it definitely shows up more often when the router is in a mesh configuration), but I ended up getting a reboot on one of the routers that was not in the mesh. It turned out to be the roaming settings:

       option ieee80211r '1'
       option mobility_domain '4f57'
       option ft_over_ds '1'
       option ft_psk_generate_local '1'
       option ieee80211w '1'

Since removing those, I haven't seen any reboots yet :crossed_fingers:
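
For anyone wanting to try the same workaround on a copy of `/etc/config/wireless`, the sketch below drops the five options listed above from the config text. It is a plain line filter, not a full UCI parser, and the section and SSID in the sample are invented for illustration.

```python
# The five roaming-related options listed in this comment.
ROAMING_OPTIONS = {
    "ieee80211r",
    "mobility_domain",
    "ft_over_ds",
    "ft_psk_generate_local",
    "ieee80211w",
}

def strip_options(config_text, names=ROAMING_OPTIONS):
    """Return config_text without `option <name> ...` lines for the given names."""
    kept = []
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "option" and parts[1] in names:
            continue  # drop this option line
        kept.append(line)
    return "\n".join(kept) + "\n"

# Hypothetical section; only the five roaming options come from the thread.
sample = """\
config wifi-iface 'example_iface'
       option ssid 'example'
       option ieee80211r '1'
       option mobility_domain '4f57'
       option ft_over_ds '1'
       option ft_psk_generate_local '1'
       option ieee80211w '1'
"""

print(strip_options(sample), end="")
```

After editing the real file, the wireless configuration still has to be reloaded on the router for the change to take effect.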

yulinbanshi commented 1 year ago

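This is an automatic out-of-office reply from Hongxia. Hello, your message has been received, but I cannot reply to it right away. I will reply as soon as possible. Thank you!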