MichaelZhuxx closed this issue 3 years ago
This bug leads to fatal kernel crashes:
[ 118.270973] stmmaceth 10020000.gmac eth0: PHY [stmmac-0:00] driver [Generic PHY] (irq=POLL)
[ 118.309890] dwmac1000: Master AXI performs fixed burst length
[ 118.341452] stmmaceth 10020000.gmac eth0: No Safety Features support found
[ 118.374196] stmmaceth 10020000.gmac eth0: No MAC Management Counters available
[ 118.407037] stmmaceth 10020000.gmac eth0: IEEE 1588-2008 Advanced Timestamp supported
[ 118.450948] stmmaceth 10020000.gmac eth0: registered PTP clock
[ 118.493427] stmmaceth 10020000.gmac eth0: configuring for phy/rgmii-txid link mode
[ 118.929631] dm9601 1-1.1:1.0 eth1: link up, 100Mbps, full-duplex, lpa 0xFFFF
[ 119.028528] L2CACHE: flush64 out of range: 8000000000000000(8), skip flush
[ 119.061134] L2CACHE: flush64 out of range: 8000000000000000(8), skip flush
[ 119.094827] L2CACHE: flush64 out of range: 8000000080000000(8), skip flush
[ 119.127225] L2CACHE: flush64 out of range: 8000000080000000(8), skip flush
[ 119.602284] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 119.634025] L2CACHE: flush64 out of range: c000000080000000(8), skip flush
[ 119.666339] L2CACHE: flush64 out of range: c000000080000000(8), skip flush
[ *** ] (1 of 2) A start job is [ 120.245271] Unable to handle kernel paging request at virtual address 0000005f81981894
[ 120.245287] Oops [#1]
[ 120.245292] Modules linked in: nfnetlink ebtable_filter rfkill ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sr9700 dm9601 usbnet ip_tables
[ 120.245356] CPU: 0 PID: 196 Comm: plymouthd Tainted: G W 5.10.6+ #26
[ 120.245363] epc: ffffffdf81981894 ra : ffffffe00090bf54 sp : ffffffe085e533d0
[ 120.245368] gp : ffffffe0018416a8 tp : ffffffe084b40000 t0 : 0000000000000040
[ 120.245374] t1 : ffffffe1fed7d000 t2 : ffffffe000ea9d88 s0 : ffffffe085e53400
[ 120.245380] s1 : ffffffe085efc6c0 a0 : ffffffe085efc6c0 a1 : ffffffe001843228
[ 120.245385] a2 : ffffffe001843228 a3 : 0000000000000000 a4 : ffffffe08019fce0
[ 120.245391] a5 : ffffffdf81981894 a6 : 0000000000000000 a7 : 000000000000000c
[ 120.245396] s2 : ffffffe080706000 s3 : 0000000000000000 s4 : 0000000000000000
[ 120.245401] s5 : ffffffe0018ab6d8 s6 : ffffffe00004b480 s7 : ffffffe080706164
[ 120.245407] s8 : ffffffe00004b480 s9 : ffffffe00004b330 s10: ffffffe001844208
[ 120.245412] s11: 0000000000000010 t3 : 000000000000007f t4 : 0000000000000000
[ 120.245417] t5 : 00000001fecbe000 t6 : ffffffe0835ddeb0
[ 120.245424] status: 0000000200000120 badaddr: 0000005f81981894 cause: 000000000000000c
[ 120.245436] ---[ end trace 36a7b0e5940bba54 ]---
[ 120.245445] Kernel panic - not syncing: Fatal exception in interrupt
[ 120.245453] SMP: stopping secondary CPUs
[ 120.881047] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
running for… Wait Online (1min 20s / no limit)
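A note on reading the oops above, as a rough sketch rather than a diagnosis: the `cause` value can be looked up in the RISC-V privileged spec's exception table, and the register dump can be checked for simple relationships. The snippet below hard-codes the values from the trace. The assumption that the kernel uses Sv39 (39-bit virtual addresses) is mine, inferred from the `ffffffe0...` addresses, and the helper name `decode_scause` is made up for illustration.

```python
# Synchronous exception codes for scause (interrupt bit clear),
# as listed in the RISC-V privileged specification.
EXCEPTIONS = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}

# Assumption: Sv39 paging, so only the low 39 bits are translated.
SV39_MASK = (1 << 39) - 1


def decode_scause(cause, xlen=64):
    """Hypothetical helper: turn a raw scause value into a readable name."""
    is_interrupt = bool(cause >> (xlen - 1))
    code = cause & ((1 << (xlen - 1)) - 1)
    if is_interrupt:
        return f"interrupt {code}"
    return EXCEPTIONS.get(code, f"reserved/unknown exception {code}")


# Values copied from the oops in this thread.
epc = 0xFFFFFFDF81981894
badaddr = 0x0000005F81981894
a5 = 0xFFFFFFDF81981894
cause = 0x000000000000000C

print(decode_scause(cause))
print("a5 == epc:", a5 == epc)
print("badaddr == epc & SV39_MASK:", badaddr == (epc & SV39_MASK))
```

If the Sv39 assumption holds, the faulting address is just the low 39 bits of `epc`, and `a5` holds the same value as `epc`, which would be consistent with a jump through a bad function pointer. That does not by itself say where the bad pointer came from.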
Hey @xypron,
I just found there is another v5.10 kernel repo at starfive-tech: https://github.com/starfive-tech/sft-riscv-linux-5.10. It has some more commits, including at least one that, according to the patch description, might help with your issue:
starfive-tech/sft-riscv-linux-5.10@e6aa768
@marckleinebudde Thanks for noting that.
Yes, there are currently two StarFive Linux 5.10 repos:
Fu Wei (@tekkamanninja) is planning to bring these together into one common repo that both Freelight-U-SDK and Fedora can point to.
It looks like https://github.com/starfive-tech/beagle_kernel_5.10 is older. Can we delete it to avoid confusion?
https://github.com/starfive-tech/beagle_kernel_5.10/tree/Fedora has a commit that is only 12 hours old. It would make sense to make Fedora the main branch.
@xypron right! The last time I checked was before I went to bed :)
@marckleinebudde @xypron there are new Fedora branches just pushed by @tekkamanninja.
We agreed today in the StarFive meeting that we need to have just one kernel repo and just one U-Boot repo that both https://github.com/starfive-tech/freelight-u-sdk/tree/starfive and https://github.com/starfive-tech/beaglev_fedora use. @tekkamanninja is working on this consolidation.
@MichaelZhuxx This is a Fedora issue; please assign it to tekkamanninja.
I understand this is only expected on the VIC7100, but it seems to happen pretty consistently with this address: 2080200000.
On my board, when plugging in a USB drive, I also see something like the below.
[riscv@fedora-starfive ~]$ sudo vgs
L2CACHE: flush64 out of range: 2080200000(20000), skip flush
L2CACHE: flush64 out of range: 2080200000(20000), skip flush
WARNING: PV /dev/sda1 in VG data2 is using an old PV header, modify the VG to update.
VG #PV #LV #SN Attr VSize VFree
data2 1 1 0 wz--n- <1024.00g <774.00g
This seems related to this kernel patch: https://github.com/esmil/linux/commit/1ab9c4c0f8e3d3ad2199e6f43b11fc25d3080919
I am guessing we are calling flush somewhere we shouldn't. Do we know which call to flush is actually causing the noise above with USB? It seems like the above patch, which flushes the dcache on every write, is there to work around the issue of memory writes not working correctly on the VIC7100.
Some links:
So I suspect that if we fix up the dcache flush workaround, USB may be more stable. Should this issue be moved to linux instead of fedora?
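To help narrow down which flush calls produce the warnings, one option is to tally the `L2CACHE: flush64 out of range` lines by address and length from a saved `dmesg` or console capture. The following is only a sketch: the function name `tally_flush_warnings` is made up, and it assumes both the address and the value in parentheses are printed in hexadecimal, which the thread does not confirm. The sample lines are copied from the logs above.

```python
import re
from collections import Counter

# Matches lines such as:
#   L2CACHE: flush64 out of range: 2080200000(20000), skip flush
FLUSH_RE = re.compile(
    r"L2CACHE: flush64 out of range: ([0-9a-fA-F]+)\(([0-9a-fA-F]+)\), skip flush"
)


def tally_flush_warnings(log_text):
    """Count warnings per (address, length) pair.

    Assumption: both fields are hexadecimal.
    """
    counts = Counter()
    for match in FLUSH_RE.finditer(log_text):
        addr = int(match.group(1), 16)
        length = int(match.group(2), 16)
        counts[(addr, length)] += 1
    return counts


sample = """\
[ 119.028528] L2CACHE: flush64 out of range: 8000000000000000(8), skip flush
[ 119.061134] L2CACHE: flush64 out of range: 8000000000000000(8), skip flush
[ 119.094827] L2CACHE: flush64 out of range: 8000000080000000(8), skip flush
L2CACHE: flush64 out of range: 2080200000(20000), skip flush
L2CACHE: flush64 out of range: 2080200000(20000), skip flush
"""

for (addr, length), n in tally_flush_warnings(sample).most_common():
    print(f"addr={addr:#x} len={length:#x} count={n}")
```

Grouping by address this way shows whether the warnings cluster on a few addresses (as with 2080200000 here) or are spread out, which may hint at whether a single caller is responsible.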
Closing this issue in favor of https://github.com/starfive-tech/linux/issues/1
Based on Fedora image: Fedora-riscv64-vic7100-dev-raw-image-Rawhide-20210419121453.n.0-sda.raw
After the system starts, plugging in a SuperSpeed USB device from thinkplus causes several flush messages to pop up.
This USB flash disk cannot work normally.