starfive-tech / Fedora_on_StarFive

[USB] L2Cache flush issue for some USB device #3

Closed MichaelZhuxx closed 3 years ago

MichaelZhuxx commented 3 years ago

Based on fedora image: Fedora-riscv64-vic7100-dev-raw-image-Rawhide-20210419121453.n.0-sda.raw

After the system starts, plugging in a SuperSpeed USB device from thinkplus produces a stream of L2 cache flush messages on the console.

This USB flash disk does not work normally.

[root@fedora-starfive ~]# [ 2578.848806] usb 2-1.3: new SuperSpeed Gen 1 USB device number 3 using xhci-hcd
[ 2578.880241] usb 2-1.3: New USB device found, idVendor=17ef, idProduct=3899, bcdDevice= 1.00
[ 2578.888941] usb 2-1.3: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[ 2578.896452] usb 2-1.3: Product: 512GB thinkplus
[ 2578.901247] usb 2-1.3: Manufacturer: TU100Pro
[ 2578.905776] usb 2-1.3: SerialNumber: 00000000047F
[ 2579.528885] usbcore: registered new interface driver usb-storage
[ 2579.705961] scsi host0: uas
[ 2579.710337] L2CACHE: flush64 out of range: 2080200000(24), skip flush
[ 2579.717330] L2CACHE: flush64 out of range: 2080200000(4c), skip flush
[ 2579.724757] usbcore: registered new interface driver uas
[ 2579.734019] scsi 0:0:0:0: Direct-Access     TU100Pro 512GB thinkplus  0    PQ: 0 ANSI: 6
[ 2579.743928] L2CACHE: flush64 out of range: 2080200000(ff), skip flush
[ 2579.759144] L2CACHE: flush64 out of range: 2080200000(ff), skip flush
[ 2579.766136] L2CACHE: flush64 out of range: 2080200000(ff), skip flush
[ 2579.773212] L2CACHE: flush64 out of range: 2080200000(ff), skip flush
[ 2579.780272] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2579.790380] L2CACHE: flush64 out of range: 2080200000(20), skip flush
[ 2579.794421] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 2579.802943] L2CACHE: flush64 out of range: 2080200000(40), skip flush
[ 2579.814720] L2CACHE: flush64 out of range: 2080200000(40), skip flush
[ 2579.821790] L2CACHE: flush64 out of range: 2080200000(40), skip flush
[ 2579.830000] L2CACHE: flush64 out of range: 2080200000(40), skip flush
[ 2579.836958] sd 0:0:0:0: [sda] 1000215216 512-byte logical blocks: (512 GB/477 GiB)
[ 2579.845137] L2CACHE: flush64 out of range: 2080200000(4), skip flush
[ 2579.852028] sd 0:0:0:0: [sda] Write Protect is off
[ 2579.857124] L2CACHE: flush64 out of range: 2080200000(4), skip flush
[ 2579.864105] L2CACHE: flush64 out of range: 2080200000(18), skip flush
[ 2579.871121] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2579.907861] L2CACHE: flush64 out of range: 2080200000(200), skip flush
[ 2579.958853] L2CACHE: flush64 out of range: 2080200000(40), skip flush
[ 2579.991985] sd 0:0:0:0: [sda] Optimal transfer size 33553920 bytes
[ 2580.050243] L2CACHE: flush64 out of range: 2080200000(200), skip flush
[ 2580.083613] L2CACHE: flush64 out of range: 2080200000(e00), skip flush
[ 2580.117032]  sda: sda1
[ 2580.160296] L2CACHE: flush64 out of range: 2080200000(20), skip flush
[ 2580.194052] L2CACHE: flush64 out of range: 2080200000(40), skip flush
[ 2580.227422] L2CACHE: flush64 out of range: 2080200000(40), skip flush
[ 2580.261259] L2CACHE: flush64 out of range: 2080200000(40), skip flush
[ 2580.294622] L2CACHE: flush64 out of range: 2080200000(40), skip flush
[ 2580.327987] L2CACHE: flush64 out of range: 2080200000(4), skip flush
[ 2580.360460] L2CACHE: flush64 out of range: 2080200000(24), skip flush
[ 2580.393307] L2CACHE: flush64 out of range: 2080200000(4), skip flush
[ 2580.425517] L2CACHE: flush64 out of range: 2080200000(200), skip flush
[ 2580.458472] L2CACHE: flush64 out of range: 2080200000(18), skip flush
[ 2580.491104] L2CACHE: flush64 out of range: 2080200000(40), skip flush
[ 2580.523291] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.555873] sd 0:0:0:0: [sda] Attached SCSI disk
[ 2580.562617] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.619062] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.651793] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.684545] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.717775] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.750457] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.783082] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.815730] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.848361] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.880629] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.912572] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.944350] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.975772] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.007099] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.038250] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.069311] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.100436] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.131311] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.162099] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.192679] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.223164] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.253527] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.283795] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.313884] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.344068] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.374279] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.404231] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.434017] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.464536] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.493989] L2CACHE: flush64 out of range: 2080200000(2000), skip flush
[ 2581.523263] L2CACHE: flush64 out of range: 2080200000(6000), skip flush
[ 2581.552563] L2CACHE: flush64 out of range: 2080200000(f000), skip flush
[ 2581.581935] L2CACHE: flush64 out of range: 2080200000(1f000), skip flush
[ 2581.614828] L2CACHE: flush64 out of range: 2080200000(3f000), skip flush
[ 2581.650022] L2CACHE: flush64 out of range: 2080200000(1a000), skip flush
[ 2581.679364] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.708604] L2CACHE: flush64 out of range: 2080200000(2000), skip flush
[ 2581.737738] L2CACHE: flush64 out of range: 2080200000(4000), skip flush
[ 2581.766907] L2CACHE: flush64 out of range: 2080200000(15000), skip flush
[ 2581.796019] L2CACHE: flush64 out of range: 2080200000(5000), skip flush
[ 2581.828052] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2581.856974] L2CACHE: flush64 out of range: 2080200000(a000), skip flush
[ 2581.886153] L2CACHE: flush64 out of range: 2080200000(17000), skip flush
[ 2581.915130] L2CACHE: flush64 out of range: 2080200000(10000), skip flush
[ 2581.944055] L2CACHE: flush64 out of range: 2080200000(9000), skip flush
[ 2582.000130] L2CACHE: flush64 out of range: 2080200000(200), skip flush
[ 2582.053287] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.083757] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.118637] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.146571] L2CACHE: flush64 out of range: 2080200000(200), skip flush
[ 2582.174924] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.203347] L2CACHE: flush64 out of range: 2080200000(200), skip flush
[ 2582.231596] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.259746] L2CACHE: flush64 out of range: 2080200000(200), skip flush
[ 2582.287323] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.316006] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.344557] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.372968] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.401811] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.430552] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.459423] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.488226] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.516836] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.545618] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.574224] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.602733] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.631223] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.660471] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.688617] L2CACHE: flush64 out of range: 2080200000(2000), skip flush
[ 2582.716613] L2CACHE: flush64 out of range: 2080200000(6000), skip flush
[ 2582.745178] L2CACHE: flush64 out of range: 2080200000(f000), skip flush
[ 2582.774220] L2CACHE: flush64 out of range: 2080200000(1f000), skip flush
[ 2582.807665] L2CACHE: flush64 out of range: 2080200000(3f000), skip flush
[ 2582.842533] L2CACHE: flush64 out of range: 2080200000(40000), skip flush
[ 2582.877036] L2CACHE: flush64 out of range: 2080200000(1f000), skip flush
[ 2582.906682] L2CACHE: flush64 out of range: 2080200000(10000), skip flush
[ 2582.935548] L2CACHE: flush64 out of range: 2080200000(d000), skip flush
[ 2582.967378] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2582.997070] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
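To make a log like the one above easier to read, the skipped flushes can be tallied per address and size. The following is a minimal sketch, not part of the original report: it assumes the message format shown above, and it assumes the value in parentheses is a length printed in hexadecimal. The function name `summarize_skipped_flushes` is hypothetical.

```python
import re
from collections import Counter

# Matches the warning as it appears in the console output above:
# "L2CACHE: flush64 out of range: <hex address>(<hex value>), skip flush"
# Assumption: the value in parentheses is the flush length in hex.
SKIP_FLUSH = re.compile(
    r"L2CACHE: flush64 out of range: ([0-9a-f]+)\(([0-9a-f]+)\), skip flush"
)


def summarize_skipped_flushes(log_text):
    """Count skipped flushes per (address, length) pair, both parsed as hex."""
    counts = Counter()
    for match in SKIP_FLUSH.finditer(log_text):
        address = int(match.group(1), 16)
        length = int(match.group(2), 16)
        counts[(address, length)] += 1
    return counts


# A few lines copied from the log above.
sample = """\
[ 2579.710337] L2CACHE: flush64 out of range: 2080200000(24), skip flush
[ 2579.717330] L2CACHE: flush64 out of range: 2080200000(4c), skip flush
[ 2579.724757] usbcore: registered new interface driver uas
[ 2579.780272] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
[ 2580.523291] L2CACHE: flush64 out of range: 2080200000(1000), skip flush
"""

summary = summarize_skipped_flushes(sample)
for (address, length), count in sorted(summary.items()):
    print(f"address=0x{address:x} length=0x{length:x} skipped={count}")
```

Feeding the full `dmesg` output to the same function would show whether every skipped flush uses the same address, which is what the log above suggests.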

xypron commented 3 years ago

This bug leads to fatal kernel crashes:

[  118.270973] stmmaceth 10020000.gmac eth0: PHY [stmmac-0:00] driver [Generic PHY] (irq=POLL)
[  118.309890] dwmac1000: Master AXI performs fixed burst length
[  118.341452] stmmaceth 10020000.gmac eth0: No Safety Features support found
[  118.374196] stmmaceth 10020000.gmac eth0: No MAC Management Counters available
[  118.407037] stmmaceth 10020000.gmac eth0: IEEE 1588-2008 Advanced Timestamp supported
[  118.450948] stmmaceth 10020000.gmac eth0: registered PTP clock
[  118.493427] stmmaceth 10020000.gmac eth0: configuring for phy/rgmii-txid link mode
[  118.929631] dm9601 1-1.1:1.0 eth1: link up, 100Mbps, full-duplex, lpa 0xFFFF
[  119.028528] L2CACHE: flush64 out of range: 8000000000000000(8), skip flush
[  119.061134] L2CACHE: flush64 out of range: 8000000000000000(8), skip flush
[  119.094827] L2CACHE: flush64 out of range: 8000000080000000(8), skip flush
[  119.127225] L2CACHE: flush64 out of range: 8000000080000000(8), skip flush
[  119.602284] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[  119.634025] L2CACHE: flush64 out of range: c000000080000000(8), skip flush
[  119.666339] L2CACHE: flush64 out of range: c000000080000000(8), skip flush
[ ***  ] (1 of 2) A start job is [  120.245271] Unable to handle kernel paging request at virtual address 0000005f81981894
[  120.245287] Oops [#1]
[  120.245292] Modules linked in: nfnetlink ebtable_filter rfkill ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sr9700 dm9601 usbnet ip_tables
[  120.245356] CPU: 0 PID: 196 Comm: plymouthd Tainted: G        W         5.10.6+ #26
[  120.245363] epc: ffffffdf81981894 ra : ffffffe00090bf54 sp : ffffffe085e533d0
[  120.245368]  gp : ffffffe0018416a8 tp : ffffffe084b40000 t0 : 0000000000000040
[  120.245374]  t1 : ffffffe1fed7d000 t2 : ffffffe000ea9d88 s0 : ffffffe085e53400
[  120.245380]  s1 : ffffffe085efc6c0 a0 : ffffffe085efc6c0 a1 : ffffffe001843228
[  120.245385]  a2 : ffffffe001843228 a3 : 0000000000000000 a4 : ffffffe08019fce0
[  120.245391]  a5 : ffffffdf81981894 a6 : 0000000000000000 a7 : 000000000000000c
[  120.245396]  s2 : ffffffe080706000 s3 : 0000000000000000 s4 : 0000000000000000
[  120.245401]  s5 : ffffffe0018ab6d8 s6 : ffffffe00004b480 s7 : ffffffe080706164
[  120.245407]  s8 : ffffffe00004b480 s9 : ffffffe00004b330 s10: ffffffe001844208
[  120.245412]  s11: 0000000000000010 t3 : 000000000000007f t4 : 0000000000000000
[  120.245417]  t5 : 00000001fecbe000 t6 : ffffffe0835ddeb0
[  120.245424] status: 0000000200000120 badaddr: 0000005f81981894 cause: 000000000000000c
[  120.245436] ---[ end trace 36a7b0e5940bba54 ]---
[  120.245445] Kernel panic - not syncing: Fatal exception in interrupt
[  120.245453] SMP: stopping secondary CPUs
[  120.881047] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
running for… Wait Online (1min 20s / no limit)

marckleinebudde commented 3 years ago

Hey @xypron,

I just found another v5.10 kernel repo at starfive-tech: https://github.com/starfive-tech/sft-riscv-linux-5.10. It has some more commits, including at least one that might help with your issue, judging by the patch description:

starfive-tech/sft-riscv-linux-5.10@e6aa768

pdp7 commented 3 years ago

@marckleinebudde Thanks for noting that.

Yes, there are currently two StarFive Linux 5.10 repos:

Fu Wei (@tekkamanninja) is planning to bring these together into one common repo that both Freelight-U-SDK and Fedora can point to.

marckleinebudde commented 3 years ago

It looks like https://github.com/starfive-tech/beagle_kernel_5.10 is older. Can we delete it to avoid confusion?

xypron commented 3 years ago

https://github.com/starfive-tech/beagle_kernel_5.10/tree/Fedora has a commit that is only 12 hours old. It would make sense to make Fedora the main branch.

marckleinebudde commented 3 years ago

@xypron right! The last time I checked was before I went to bed :)

pdp7 commented 3 years ago

@marckleinebudde @xypron there are new Fedora branches, just pushed by @tekkamanninja.

We agreed in today's StarFive meeting that we need just one kernel repo and just one U-Boot repo that both https://github.com/starfive-tech/freelight-u-sdk/tree/starfive and https://github.com/starfive-tech/beaglev_fedora use. @tekkamanninja is working on this consolidation.

yimingyiming commented 3 years ago

@MichaelZhuxx This is a Fedora issue; please assign it to tekkamanninja.

stffrdhrn commented 3 years ago

I understand this is only expected on the VIC7100, but it happens quite consistently with the address 2080200000.

On my board, after plugging in a USB drive, I also see something like the following.

[riscv@fedora-starfive ~]$ sudo vgs
L2CACHE: flush64 out of range: 2080200000(20000), skip flush
L2CACHE: flush64 out of range: 2080200000(20000), skip flush
  WARNING: PV /dev/sda1 in VG data2 is using an old PV header, modify the VG to update.
  VG    #PV #LV #SN Attr   VSize     VFree
  data2   1   1   0 wz--n- <1024.00g <774.00g

This seems related to this kernel patch: https://github.com/esmil/linux/commit/1ab9c4c0f8e3d3ad2199e6f43b11fc25d3080919

I am guessing that somewhere we are calling flush where we shouldn't. Do we know which call to flush is actually causing the noise above with USB? It seems the above patch, which flushes the dcache for every write, is a workaround for memory writes not working correctly on the VIC7100.
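To illustrate the kind of check that would print this warning, here is a minimal model in Python. It is only a guess at the shape of the check, based on the message text; the kernel code itself is not quoted in this thread. `FLUSH_START`, `FLUSH_SIZE`, and `check_flush_range` are hypothetical names, and the window values are assumptions chosen for illustration.

```python
# Hypothetical window of addresses that the flush routine accepts.
# Both values are assumptions for illustration, not taken from the kernel config.
FLUSH_START = 0x80000000
FLUSH_SIZE = 0x200000000  # 8 GiB


def check_flush_range(start, length):
    """Return None if the range lies inside the window, else the warning text."""
    end = FLUSH_START + FLUSH_SIZE
    if start < FLUSH_START or start + length > end:
        return f"L2CACHE: flush64 out of range: {start:x}({length:x}), skip flush"
    return None


# Address and size as seen in the `sudo vgs` output above.
usb_warning = check_flush_range(0x2080200000, 0x20000)
# One of the addresses from the crash log earlier in this thread.
net_warning = check_flush_range(0x8000000000000000, 0x8)
# A range inside the assumed window is flushed without a warning.
in_range = check_flush_range(0x80200000, 0x40)

print(usb_warning)
print(net_warning)
print(in_range)
```

Under these assumptions, any caller that passes an address outside the window gets the warning and no flush, so finding the caller that passes 2080200000 is the interesting part.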

Some links:

So I suspect that if we fix up the dcache flush workaround, USB may become more stable. Should this issue be moved to linux instead of fedora?

pdp7 commented 3 years ago

Closing this issue in favor of https://github.com/starfive-tech/linux/issues/1