Closed pdp7 closed 3 years ago
@rvs could you add more information on under what circumstances you see this occur?
@MichaelZhuxx @tekkamanninja please take a look
I can provide any information if somebody tells me where to look ;-) For now all I can say is that this is a pretty high-end eMMC card https://www.amazon.com/gp/product/B07G5Q2TRL/ref=ppx_yo_dt_b_asin_title_o00_s02?ie=UTF8&psc=1 and the issue seems to be happening quite frequently.
It does indeed seem to be related to heavy use of the block layer (for example, when upgrading the system via dnf).
That said, it seems to appear in other ways too. See below:
[ 128.685014] Unable to handle kernel paging request at virtual address 0000005f826e406c
[ 128.719049] Oops [#1]
[ 128.747105] Modules linked in: ip_set nfnetlink ebtable_filter rfkill ebtables ip6table_filter ip6_tables iptable_filter sunrpc ip_tables
[ 128.785899] CPU: 0 PID: 253 Comm: kworker/0:3 Tainted: G W 5.10.6+ #26
[ 128.820288] Workqueue: ipv6_addrconf addrconf_dad_work
[ 128.852125] epc: ffffffdf826e406c ra : ffffffe000b49366 sp : ffffffe085e1fb50
[ 128.886014] gp : ffffffe0018416a8 tp : ffffffe085f30000 t0 : ffffffe086cdcfe8
[ 128.919985] t1 : 0000000000010000 t2 : 0000000000000000 s0 : ffffffe085e1fba0
[ 128.954015] s1 : 0000000000000000 a0 : 0000000000000000 a1 : ffffffe086c66f00
[ 128.988047] a2 : ffffffe085e1fbb0 a3 : 0000000000000000 a4 : ffffffdf826e406c
[ 129.022083] a5 : ffffffe084ad69c0 a6 : 0000000020000000 a7 : 0000000000000000
[ 129.056073] s2 : ffffffe086c66f00 s3 : ffffffe084ad69c0 s4 : ffffffe086c66f00
[ 129.090156] s5 : ffffffe085e1fbb0 s6 : 0000000000000001 s7 : 0000000000000003
[ 129.124203] s8 : ffffffe0843a1c00 s9 : 0000000000002000 s10: 0000000000000060
[ 129.124211] s11: 00000000000000ff t3 : 6facdd6262ddaedf t4 : 0000000000000000
[ 129.124221] status: 0000000200000120 badaddr: 0000005f826e406c cause: 000000000000000c
I should stop kibitzing as I don't have time to dive in, but to my untrained eye, this 2nd Oops looks like an unrelated issue. EDIT: Or perhaps this isn't related to mmc at all.
I am now convinced that if mmc is involved at all, it is only as a trigger mechanism. I can now reliably get that same Oops with a variety of things that just do random I/O, like this wget downloading a file for a long time into nothing:
[ 1934.256949] Oops [#1]
[ 1934.259298] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink rfkill ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc ip_tables
[ 1934.301176] CPU: 0 PID: 635 Comm: wget Tainted: G W 5.10.6+ #26
[ 1934.308592] epc: ffffffe000ab5a00 ra : ffffffe000ab5f80 sp : ffffffe0811a7280
[ 1934.315901] gp : ffffffe0018416a8 tp : ffffffe080708000 t0 : ffffffe1fed9b9e4
[ 1934.323300] t1 : ffffffe0008e2e9a t2 : 49d0449df433ff0c s0 : ffffffe0811a72a0
[ 1934.330698] s1 : 0010000000000000 a0 : ffffffe000ab5f80 a1 : 0000000000000300
[ 1934.338092] a2 : 0000000000000000 a3 : 0000000000ffff00 a4 : 0000000000000000
[ 1934.345489] a5 : ffffffe1fed09480 a6 : 0000000000000001 a7 : 0000000000000042
[ 1934.352893] s2 : ffffffe0018ab6d8 s3 : 0010000000000000 s4 : ffffffe0822ff164
[ 1934.360298] s5 : ffffffe0017d18c0 s6 : 0000000000000001 s7 : 0000000000000001
[ 1934.367706] s8 : ffffffe0017d18c0 s9 : 0000000000000002 s10: ffffffe082b61300
[ 1934.375139] s11: ffffffe082292300 t3 : 0000000200000022 t4 : 0000000000000004
[ 1934.382545] t5 : 0000000000910000 t6 : ffffffe01fb3c042
[ 1934.387986] status: 0000000200000120 badaddr: ffffff8000000058 cause: 000000000000000d
[ 1934.396319] ---[ end trace caa71343f7d35051 ]---
[ 1934.401201] Kernel panic - not syncing: Fatal exception in interrupt
[ 1934.407748] SMP: stopping secondary CPUs
[ 1934.411836] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
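For anyone comparing the two register dumps above: the `cause` field on the `status:` line is the RISC-V `scause` value, and it can be decoded with the exception codes from the RISC-V privileged specification. The snippet below is only a rough sketch and not something from the original report. The helper name `decode_status_line` is hypothetical, and it assumes the line has the `badaddr: <hex> cause: <hex>` layout seen in these logs.

```python
import re

# Exception codes from the RISC-V privileged specification
# (scause with the interrupt bit clear).
SCAUSE_EXCEPTIONS = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}

# Assumed layout, as seen in the logs in this thread:
#   "... badaddr: <hex> cause: <hex>"
STATUS_LINE = re.compile(r"badaddr:\s*([0-9a-fA-F]+)\s+cause:\s*([0-9a-fA-F]+)")


def decode_status_line(line):
    """Hypothetical helper: return (badaddr, description) or None if no match."""
    match = STATUS_LINE.search(line)
    if match is None:
        return None
    badaddr = int(match.group(1), 16)
    cause = int(match.group(2), 16)
    # On RV64 the top bit of scause distinguishes interrupts from exceptions.
    is_interrupt = bool(cause >> 63)
    code = cause & ~(1 << 63)
    if is_interrupt:
        description = f"interrupt {code}"
    else:
        description = SCAUSE_EXCEPTIONS.get(code, f"unknown exception {code}")
    return badaddr, description


if __name__ == "__main__":
    for line in (
        "status: 0000000200000120 badaddr: 0000005f826e406c cause: 000000000000000c",
        "status: 0000000200000120 badaddr: ffffff8000000058 cause: 000000000000000d",
    ):
        badaddr, description = decode_status_line(line)
        print(f"badaddr={badaddr:#018x} cause={description}")
```

Read this way, the first Oops is an instruction page fault (the CPU tried to fetch code from the bad address) and the second is a load page fault, which would fit them having different immediate triggers.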
Ouch! I just realized I was commenting on the wrong issue -- please take a look at https://github.com/starfive-tech/Fedora_on_StarFive/issues/27#issuecomment-830351296
Has anyone seen this issue still occurring?
Current latest kernel would be 5.13-rc3: https://github.com/starfive-tech/linux/tree/esmil_starlight
I did not experience the page fault exception error (could not allocate the page) while developing on 5.13-rc3. I encountered the same error only when I was debugging this patch https://github.com/mcd500/linux-jh7100/commit/dfe8b665829d1c4989bbb616f99a6775e0c24675 which requires adding proper page fault handling when accessing virtual memory, but not from other places in the kernel. So I think it is fine to close this issue.
I agree @mcd500 -- this issue is no longer applicable (since we all moved completely away from that kernel)
Roman Shaposhnik (@rvs) reported in Slack: