polarfire-soc / meta-polarfire-soc-yocto-bsp

PolarFire SoC yocto Board Support Package

page allocation failure for PCI client driver (ath10k/ath9k): order:0, mode:0xcc4(GFP_KERNEL|GFP_DMA32), nodemask=(null) #46

Closed govindsi closed 1 year ago

govindsi commented 1 year ago

Baseline: 22/11 Board: polarfire icicle kit + QCA9880 wifi card

ISSUE: DMA page allocation failures seen during bootup as CMA is OOM.

[ 4.795291] swapper/0: page allocation failure: order:0, mode:0xcc4(GFP_KERNEL|GFP_DMA32), nodemask=(null)

[ 4.805112] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.68-linux4microchip+fpga-2022.09 #1
[ 4.813684] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
[ 4.819886] Call Trace:
[ 4.822355] [] dump_backtrace+0x1c/0x24
[ 4.827801] [] show_stack+0x2c/0x38
[ 4.832896] [] dump_stack_lvl+0x40/0x58
[ 4.838339] [] dump_stack+0x14/0x1c
[ 4.843424] [] warn_alloc+0xc6/0x138
[ 4.848617] [] alloc_pages_slowpath.constprop.0+0x688/0x8a8
[ 4.855974] [] alloc_pages+0x11e/0x170
[ 4.861495] [] __dma_direct_alloc_pages.constprop.0+0x14e/0x280
[ 4.869030] [] dma_direct_alloc+0x40/0x13e
[ 4.874725] [] dma_alloc_attrs+0x70/0x7e
[ 4.880244] [] ath10k_ce_alloc_src_ring+0x86/0x172
[ 4.886643] [] ath10k_ce_alloc_pipe+0x9e/0x142
[ 4.892684] [] ath10k_pci_alloc_pipes+0x88/0xe8
[ 4.898821] [] ath10k_pci_setup_resource+0x126/0x1ba
[ 4.905392] [] ath10k_pci_probe+0x148/0x80c
[ 4.911172] [] pci_device_probe+0x7e/0xce
[ 4.916797] [] really_probe.part.0+0x5c/0x224
[ 4.922768] [] driver_probe_device+0x98/0xbe
[ 4.928811] [] driver_probe_device+0x2e/0xf6
[ 4.934676] [] driver_attach+0x58/0x154
[ 4.940284] [] bus_for_each_dev+0x4a/0x84
[ 4.945891] [] driver_attach+0x1a/0x22
[ 4.951239] [] bus_add_driver+0xd8/0x192
[ 4.956759] [] driver_register+0x48/0xd8
[ 4.962279] [] __pci_register_driver+0x58/0x62
[ 4.968323] [] ath10k_pci_init+0x22/0x40
[ 4.973841] [] do_one_initcall+0x36/0x15a
[ 4.979450] [] kernel_init_freeable+0x1a6/0x20a
[ 4.985586] [] kernel_init+0x1e/0x104
[ 4.990846] [] ret_from_exception+0x0/0xc
[ 4.996940] Mem-Info:
[ 4.999262] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:0 inactive_file:0 isolated_file:0 unevictable:0 dirty:0 writeback:0 slab_reclaimable:206 slab_unreclaimable:1327 mapped:0 shmem:0 pagetables:1 bounce:0 kernel_misc_reclaimable:0 free:466380 free_pcp:1279 free_cma:0
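
For readers who want to see what the mode value in the failure message is made of, here is a minimal Python sketch that splits 0xcc4 into individual flag bits. The bit values are an assumption taken from include/linux/gfp.h in 5.15-era kernels; other kernel versions may assign them differently. The helper name decode_gfp is hypothetical.

```python
# Bit values as found in include/linux/gfp.h for 5.15-era kernels.
# Assumption: other kernel versions may use different assignments.
GFP_BITS = {
    0x001: "__GFP_DMA",
    0x002: "__GFP_HIGHMEM",
    0x004: "__GFP_DMA32",
    0x008: "__GFP_MOVABLE",
    0x010: "__GFP_RECLAIMABLE",
    0x020: "__GFP_HIGH",
    0x040: "__GFP_IO",
    0x080: "__GFP_FS",
    0x100: "__GFP_ZERO",
    0x200: "__GFP_ATOMIC",
    0x400: "__GFP_DIRECT_RECLAIM",
    0x800: "__GFP_KSWAPD_RECLAIM",
}


def decode_gfp(mode):
    """Return (list of flag names set in mode, leftover bits not in the table)."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    known_mask = 0
    for bit in GFP_BITS:
        known_mask |= bit
    return names, mode & ~known_mask


names, leftover = decode_gfp(0xCC4)
print(" | ".join(names))
print(f"unrecognised bits: {leftover:#x}")
```

Under these assumptions, the IO, FS and two reclaim bits together make up GFP_KERNEL, and the remaining 0x4 bit is the DMA32 zone request, which matches the GFP_KERNEL|GFP_DMA32 text the kernel printed.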

govindsi commented 1 year ago

Is there any change already raised or available that increases the reserved memory region or CMA size? More boot-up logs:

[ 0.000000] OF: fdt: Ignoring memory range 0x1000000000 - 0x1000200000
[ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit
[ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[ 0.000000] printk: bootconsole [sbi0] enabled
[ 0.000000] efi: UEFI not found.
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000080000000, size 32 MiB
[ 0.000000] OF: reserved mem: initialized node buffer@80000000, compatible id shared-dma-pool
[ 0.000000] Reserved memory: created DMA memory pool at 0x00000000c0000000, size 128 MiB
[ 0.000000] OF: reserved mem: initialized node buffer@c0000000, compatible id shared-dma-pool
[ 0.000000] Reserved memory: created DMA memory pool at 0x00000000d0000000, size 128 MiB
[ 0.000000] OF: reserved mem: initialized node buffer@d0000000, compatible id shared-dma-pool
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 empty
[ 0.000000] Normal [mem 0x0000001000200000-0x0000001075ffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000001000200000-0x0000001075ffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000001000200000-0x0000001075ffffff]
[ 0.000000] SBI specification v0.3 detected
[ 0.000000] SBI implementation ID=0x1 Version=0x10000
[ 0.000000] SBI TIME extension detected
[ 0.000000] SBI IPI extension detected
[ 0.000000] SBI RFENCE extension detected
[ 0.000000] SBI SRST extension detected
[ 0.000000] SBI v0.2 HSM extension detected
[ 0.000000] CPU with hartid=0 is not available
[ 0.000000] CPU with hartid=0 is not available
[ 0.000000] riscv: ISA extensions acdfim
[ 0.000000] riscv: ELF capabilities acdfim
[ 0.000000] percpu: Embedded 17 pages/cpu s30440 r8192 d31000 u69632
[ 0.000000] pcpu-alloc: s30440 r8192 d31000 u69632 alloc=17*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[ 0.000000] CPU node for /cpus/cpu@0 exist but the possible cpu range is :0-3
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 476215
[ 0.000000] Kernel command line: earlycon=sbi root=/dev/mmcblk0p3 rootwait uio_pdrv_genirq.of_id=generic-uio
[ 0.000000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] software IO TLB: Cannot allocate buffer
[ 0.000000] Memory: 1880884K/1931264K available (6786K kernel code, 4872K rwdata, 4096K rodata, 2133K init, 301K bss, 50380K reserved, 0K cma-reserved)
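
The "DMA32 empty" line can be cross-checked against the node 0 range with a little arithmetic. The Python sketch below assumes that the DMA32 zone covers only physical addresses below 4 GiB, as is usual on 64-bit kernels. It parses the range string copied from the log, computes the RAM size, and counts how many bytes fall below that boundary. The helper names are hypothetical.

```python
import re

FOUR_GIB = 1 << 32  # assumed upper limit of the DMA32 zone

# Range copied from the "node 0" line of the boot log.
LOG_LINE = "node 0: [mem 0x0000001000200000-0x0000001075ffffff]"


def parse_mem_range(line):
    """Return (start, end_inclusive) from a '[mem 0x...-0x...]' log fragment."""
    match = re.search(r"\[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)\]", line)
    if match is None:
        raise ValueError("no memory range found in line")
    return int(match.group(1), 16), int(match.group(2), 16)


def bytes_below_4gib(start, end_inclusive):
    """Number of bytes of the range that lie below the 4 GiB boundary."""
    return max(0, min(end_inclusive + 1, FOUR_GIB) - start)


start, end = parse_mem_range(LOG_LINE)
size_kib = (end + 1 - start) // 1024
low_bytes = bytes_below_4gib(start, end)

print(f"RAM size: {size_kib}K")
print(f"bytes below 4 GiB: {low_bytes}")
```

The computed size, 1931264K, matches the total in the "Memory:" line. None of the range lies below 4 GiB, which is consistent with the empty DMA32 zone and the failing GFP_DMA32 allocation.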

lhanlyu commented 1 year ago

Can you confirm the version/release you were using? Can you please retest with the 2023.02 release? The reference design will also have to be updated.

govindsi commented 1 year ago

Thanks, it's fixed with 2023.02. I was looking for the RISC-V PMU counters [https://lore.kernel.org/lkml/mhng-d6e408db-0c83-4358-9cef-831a694d582e@palmer-ri-x1c9/T/] to profile cache misses and TLB misses while running uplink and downlink traffic, as CPU utilization gets very high at low throughput. Is there a plan to support this in a newer release, or is there any way to profile those parameters?