Open ncopa opened 5 months ago
Please keep the commit below:
Revert "riscv: Fix wrong usage of lm_alias() when splitting a huge linear mapping"
Revert "riscv: Fix set_direct_map_default_noflush() to reset _PAGE_EXEC"
Revert "riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings"
Do you mean we should re-apply the reverted commits?
Should we also disable HIGHMEM in that case?
You only need one of the two.
So there are two solutions: 1) re-apply the three previously reverted commits, or 2) disable CONFIG_HIGHMEM.
Disabling HIGHMEM did not help:
[ 0.000000] Linux version 6.1.89-0-sophgo (buildozer@build-edge-riscv64) (gcc (Alpine 13.2.1_git20240309) 13.2.1 20240309, GNU ld (GNU Binutils) 2.42) #1-Alpine SMP PREEMPT Thu, 02 May 2024 07:45:04 +0000
[ 0.000000] OF: fdt: Ignoring memory range 0x0 - 0x2200000
[ 0.000000] Machine model: Sophgo Mango
[ 0.000000] earlycon: uart0 at MMIO32 0x0000007040000000 (options '')
[ 0.000000] printk: bootconsole [uart0] enabled
[ 0.000000] Hide vector 0.7 extension
[ 0.000000] efi: UEFI not found.
[ 0.000000] Unable to handle kernel paging request at virtual address fffffff73ff3afa0
[ 0.000000] Oops [#1]
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.89-0-sophgo #1-Alpine
[ 0.000000] Hardware name: Sophgo Mango (DT)
[ 0.000000] epc : __memset+0xb4/0xfc
[ 0.000000] ra : memblock_alloc_try_nid+0x74/0x84
[ 0.000000] epc : ffffffff806eac48 ra : ffffffff808132c2 sp : ffffffff81003e60
[ 0.000000] gp : ffffffff8114d348 tp : ffffffff81012dc0 t0 : fffffff73ff3aef8
[ 0.000000] t1 : 0000001f42183000 t2 : 49464555203a6966 s0 : ffffffff81003ea0
[ 0.000000] s1 : 000000000004805c a0 : fffffff73ff3afa0 a1 : 0000000000000000
[ 0.000000] a2 : 000000000004805c a3 : fffffff73ff82ff8 a4 : 0000000000000054
[ 0.000000] a5 : ffffffff806eac48 a6 : 000000000004805c a7 : 0000000000000080
[ 0.000000] s2 : fffffff73ff3afa0 s3 : ffffffffffffffff s4 : ffffffc6fecfb000
[ 0.000000] s5 : ffffffff80828cc8 s6 : 0000000000000000 s7 : 0000000000000000
[ 0.000000] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
[ 0.000000] s11: 0000000000000000 t3 : ffffffff80a09458 t4 : ffffffff80a09458
[ 0.000000] t5 : ffffffff80a09458 t6 : ffffffff80a09470
[ 0.000000] status: 0000000200000100 badaddr: fffffff73ff3afa0 cause: 000000000000000f
[ 0.000000] [<ffffffff806eac48>] __memset+0xb4/0xfc
[ 0.000000] [<ffffffff80828ce6>] early_init_dt_alloc_memory_arch+0x1e/0x48
[ 0.000000] [<ffffffff80525ab0>] __unflatten_device_tree+0x52/0x114
[ 0.000000] [<ffffffff80829e9c>] unflatten_device_tree+0x2c/0x44
[ 0.000000] [<ffffffff80803966>] setup_arch+0xd4/0x580
[ 0.000000] [<ffffffff80800970>] start_kernel+0x96/0xa90
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
(I also merged in v6.1.89)
To fix this error, you can reduce physical memory to less than 124G, which is the maximum size of the direct mapping of all physical memory in Sv39 (https://www.kernel.org/doc/html/next/riscv/vm-layout.html).
The cause is as follows: 1) memblock.current_limit is MEMBLOCK_ALLOC_ANYWHERE when CONFIG_HIGHMEM is disabled. 2) phys_to_virt(), called from memblock_alloc_internal(), returns a wrong virtual address because the value of the alloc variable is above 124G.
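As a rough cross-check of that explanation, the faulting address from the log can be compared against the Sv39 direct-mapping window. This is only a sketch: the window bounds below are taken from the vm-layout document linked above and are assumed to match this kernel build, and the helper names are made up for illustration.

```python
# Sv39 direct-mapping window as listed in the vm-layout document.
# Assumption: this kernel build uses the same layout.
DIRECT_MAP_START = 0xFFFFFFD800000000
DIRECT_MAP_END = 0xFFFFFFF6FFFFFFFF  # inclusive
GIB = 1 << 30


def direct_map_size_gib() -> int:
    """Size of the direct-mapping window in GiB."""
    return (DIRECT_MAP_END - DIRECT_MAP_START + 1) // GIB


def in_direct_map(va: int) -> bool:
    """True if the virtual address lies inside the direct-mapping window."""
    return DIRECT_MAP_START <= va <= DIRECT_MAP_END


def offset_into_direct_map(va: int) -> int:
    """Byte offset of a virtual address from the start of the window."""
    return va - DIRECT_MAP_START


# 'badaddr' value copied from the oops above.
badaddr = 0xFFFFFFF73FF3AFA0

print("window size (GiB):", direct_map_size_gib())
print("badaddr inside window:", in_direct_map(badaddr))
print("offset from window start:", hex(offset_into_direct_map(badaddr)))
print("bytes past window end:", hex(badaddr - DIRECT_MAP_END - 1))
```

If the layout assumption holds, the address that __memset tried to write lies past the end of the 124G window, which fits with phys_to_virt() having been handed a physical address beyond what the direct mapping can cover.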
I re-applied the 3 patches, re-enabled HIGHMEM, merged in v6.1.90. But it failed with:
Hi @xingxg2022, we are still having issues with this. We have two hosts, and one of them is seeing this issue.
It would be really nice if we could get this second box running again, as it's our main build server for the rv64 arch for Alpine Linux.
Please keep our commits below:
1) Revert "riscv: Fix wrong usage of lm_alias() when splitting a huge linear mapping"
2) Revert "riscv: Fix set_direct_map_default_noflush() to reset _PAGE_EXEC"
3) Revert "riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings"
The above reverts fix the issue.
The panic log shows that your kernel code still contains these three patches:
"riscv: Fix wrong usage of lm_alias() when splitting a huge linear mapping"
"riscv: Fix set_direct_map_default_noflush() to reset _PAGE_EXEC"
"riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings"
Built from commit 83ab3eda46e651464f2715455ae66711882be116
We are not able to ssh to the machine. lsmod hangs.