[kernel] Unable to handle kernel paging request at virtual address

pdp7 commented 3 years ago

Roman Shaposhnik (@rvs) reported in Slack:

All of a sudden I started getting a lot of:

[root@fedora-starfive gpio]# [38179.087918] Unable to handle kernel paging request at virtual address 00000061fed09482
[38179.096079] Oops [#1]
[38179.098421] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink rfkill ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc ip_tables
[38179.140298] CPU: 0 PID: 1842 Comm: kworker/0:1H Tainted: G        W         5.10.6+ #26
[38179.148550] Workqueue:  0x0 (mmc_complete)
[38179.152768] epc: ffffffe00022affe ra : ffffffe00022afea sp : ffffffe086d3fe30
[38179.160087]  gp : ffffffe0018416a8 tp : ffffffe080735040 t0 : ffffffe086d3fcc0
[38179.167490]  t1 : ffffffe1fed7d000 t2 : 00000000001b8d01 s0 : ffffffe086d3fea0
[38179.174909]  s1 : 0000000000000001 a0 : 0000000000000001 a1 : ffffffe093ff4180
[38179.182308]  a2 : 0000000000000402 a3 : 0000000000000004 a4 : fdffffe1fed09480
[38179.189740]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000735049
[38179.197141]  s2 : ffffffe1fed17640 s3 : ffffffe08d2b5868 s4 : ffffffe00022ae54
[38179.204544]  s5 : ffffffe1fed17660 s6 : ffffffe001608940 s7 : ffffffe0018ab6d8
[38179.211944]  s8 : ffffffe0018ab6d8 s9 : fdffffe00004b480 s10: ffffffe00004b480
[38179.219343]  s11: ffffffe08d2b5840 t3 : 000000000000007f t4 : 0000000000209bd3
[38179.226741]  t5 : 00000000021fb4f6 t6 : ffffffe000e028ec
[38179.232212] status: 0000000200000100 badaddr: 00000061fed09482 cause: 000000000000000d
[38179.245800] ---[ end trace eb7e55445fd7397c ]---

This is with the stock Fedora image, FWIW. Just wondering if anyone else is seeing something similar. The board basically runs for a little while, then gets slow, then this happens.

pdp7 commented 3 years ago

@rvs could you add more information about the circumstances under which you see this occur?

@MichaelZhuxx @tekkamanninja please take a look

rvs commented 3 years ago

I can provide any information if somebody tells me where to look ;-) For now all I can say is that this is a pretty high end eMMC card https://www.amazon.com/gp/product/B07G5Q2TRL/ref=ppx_yo_dt_b_asin_title_o00_s02?ie=UTF8&psc=1 and the issue seems to be happening quite frequently.

It does indeed seem to be related to heavy use of the block layer (like when upgrading the system via dnf, etc.)

That said, it seems to appear in other ways too. See below:

[  128.685014] Unable to handle kernel paging request at virtual address 0000005f826e406c
[  128.719049] Oops [#1]
[  128.747105] Modules linked in: ip_set nfnetlink ebtable_filter rfkill ebtables ip6table_filter ip6_tables iptable_filter sunrpc ip_tables
[  128.785899] CPU: 0 PID: 253 Comm: kworker/0:3 Tainted: G        W         5.10.6+ #26
[  128.820288] Workqueue: ipv6_addrconf addrconf_dad_work
[  128.852125] epc: ffffffdf826e406c ra : ffffffe000b49366 sp : ffffffe085e1fb50
[  128.886014]  gp : ffffffe0018416a8 tp : ffffffe085f30000 t0 : ffffffe086cdcfe8
[  128.919985]  t1 : 0000000000010000 t2 : 0000000000000000 s0 : ffffffe085e1fba0
[  128.954015]  s1 : 0000000000000000 a0 : 0000000000000000 a1 : ffffffe086c66f00
[  128.988047]  a2 : ffffffe085e1fbb0 a3 : 0000000000000000 a4 : ffffffdf826e406c
[  129.022083]  a5 : ffffffe084ad69c0 a6 : 0000000020000000 a7 : 0000000000000000
[  129.056073]  s2 : ffffffe086c66f00 s3 : ffffffe084ad69c0 s4 : ffffffe086c66f00
[  129.090156]  s5 : ffffffe085e1fbb0 s6 : 0000000000000001 s7 : 0000000000000003
[  129.124203]  s8 : ffffffe0843a1c00 s9 : 0000000000002000 s10: 0000000000000060
[  129.124211]  s11: 00000000000000ff t3 : 6facdd6262ddaedf t4 : 0000000000000000
[  129.124221] status: 0000000200000120 badaddr: 0000005f826e406c cause: 000000000000000c

tommythorn commented 3 years ago

I should stop kibitzing as I don't have time to dive in, but to my untrained eye, this 2nd Oops looks like an unrelated issue. EDIT: Or perhaps this isn't related to mmc at all.
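
For anyone comparing the traces: the `cause` field on each `status:` line is the RISC-V `scause` value, and the first Oops reports `0xd` while the second reports `0xc`. A minimal sketch for decoding it from a pasted log line is below. The table follows the exception codes in the RISC-V privileged specification; the helper name `decode_status_line` and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Exception codes for scause when the interrupt bit (bit 63) is clear,
# as listed in the RISC-V privileged specification.
SCAUSE_EXCEPTIONS = {
    0x0: "instruction address misaligned",
    0x1: "instruction access fault",
    0x2: "illegal instruction",
    0x3: "breakpoint",
    0x4: "load address misaligned",
    0x5: "load access fault",
    0x6: "store/AMO address misaligned",
    0x7: "store/AMO access fault",
    0x8: "environment call from U-mode",
    0x9: "environment call from S-mode",
    0xC: "instruction page fault",
    0xD: "load page fault",
    0xF: "store/AMO page fault",
}

# Matches the "status: ... badaddr: ... cause: ..." line of the Oops output
# as it appears in the logs in this thread (hypothetical pattern).
STATUS_RE = re.compile(
    r"status: ([0-9a-f]+) badaddr: ([0-9a-f]+) cause: ([0-9a-f]+)"
)


def decode_status_line(line):
    """Return badaddr, cause and a readable cause name, or None if no match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    _status, badaddr, cause = (int(g, 16) for g in m.groups())
    if cause >> 63:
        meaning = "interrupt"
    else:
        meaning = SCAUSE_EXCEPTIONS.get(cause, "unknown/reserved")
    return {"badaddr": badaddr, "cause": cause, "meaning": meaning}


if __name__ == "__main__":
    first = "[38179.232212] status: 0000000200000100 badaddr: 00000061fed09482 cause: 000000000000000d"
    second = "[  129.124221] status: 0000000200000120 badaddr: 0000005f826e406c cause: 000000000000000c"
    for line in (first, second):
        print(decode_status_line(line)["meaning"])
```

Decoded this way, the first trace is a load page fault and the second is an instruction page fault (the faulting address equals the low bits of `epc`), so the two do fail in different ways even if they share a root cause.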

rvs commented 3 years ago

I am now convinced that if mmc is involved, it is only as a trigger mechanism -- I can now reliably get that same Oops with a variety of things just doing random I/O -- like this wget downloading a file for a long time into nothing:

[ 1934.256949] Oops [#1]
[ 1934.259298] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink rfkill ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc ip_tables
[ 1934.301176] CPU: 0 PID: 635 Comm: wget Tainted: G        W         5.10.6+ #26
[ 1934.308592] epc: ffffffe000ab5a00 ra : ffffffe000ab5f80 sp : ffffffe0811a7280
[ 1934.315901]  gp : ffffffe0018416a8 tp : ffffffe080708000 t0 : ffffffe1fed9b9e4
[ 1934.323300]  t1 : ffffffe0008e2e9a t2 : 49d0449df433ff0c s0 : ffffffe0811a72a0
[ 1934.330698]  s1 : 0010000000000000 a0 : ffffffe000ab5f80 a1 : 0000000000000300
[ 1934.338092]  a2 : 0000000000000000 a3 : 0000000000ffff00 a4 : 0000000000000000
[ 1934.345489]  a5 : ffffffe1fed09480 a6 : 0000000000000001 a7 : 0000000000000042
[ 1934.352893]  s2 : ffffffe0018ab6d8 s3 : 0010000000000000 s4 : ffffffe0822ff164
[ 1934.360298]  s5 : ffffffe0017d18c0 s6 : 0000000000000001 s7 : 0000000000000001
[ 1934.367706]  s8 : ffffffe0017d18c0 s9 : 0000000000000002 s10: ffffffe082b61300
[ 1934.375139]  s11: ffffffe082292300 t3 : 0000000200000022 t4 : 0000000000000004
[ 1934.382545]  t5 : 0000000000910000 t6 : ffffffe01fb3c042
[ 1934.387986] status: 0000000200000120 badaddr: ffffff8000000058 cause: 000000000000000d
[ 1934.396319] ---[ end trace caa71343f7d35051 ]---
[ 1934.401201] Kernel panic - not syncing: Fatal exception in interrupt
[ 1934.407748] SMP: stopping secondary CPUs
[ 1934.411836] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

rvs commented 3 years ago

Ouch! I just realized I was commenting on the wrong issue -- please take a look at https://github.com/starfive-tech/Fedora_on_StarFive/issues/27#issuecomment-830351296

pdp7 commented 3 years ago

Is anyone still seeing this issue occur?

Current latest kernel would be 5.13-rc3: https://github.com/starfive-tech/linux/tree/esmil_starlight

mcd500 commented 3 years ago

I did not experience this page fault exception (a failure to allocate the page) during development on 5.13-rc3. I did encounter the same error while debugging this patch, https://github.com/mcd500/linux-jh7100/commit/dfe8b665829d1c4989bbb616f99a6775e0c24675, which required adding proper page fault handling when accessing virtual memory, but I have not seen it from other places in the kernel. So I think it is fine to close this issue.

rvs commented 3 years ago

I agree @mcd500 -- this issue is no longer applicable (since we have all moved completely away from that kernel).
