hexdump0815 / imagebuilder

velvet os - simple script framework to build ubuntu 22.04 lts jammy (in older versions also 20.04 lts focal) and debian 12 bookworm (in older versions also 11 bullseye) bootable usb / sd card images for some arm and intel devices - lots of prebuilt images as well
GNU General Public License v3.0

allwinner_h616: problem: x96q: H313: Boot Crash - Unable to handle kernel paging request at virtual address ffff8bffeffd0680 #69

Open benpoulson opened 2 years ago

benpoulson commented 2 years ago

Hey,

I have the 2 GB RAM / 16 GB storage model of the X96Q, and so far it fails on every build you've published for the H616 chipset (220618-02 and 211204-03, on focal, bullseye and jammy).

Sometimes it reaches the desktop, sometimes it fails before the UART console even presents a login prompt, but under any amount of load it always fails with the following error:

[ 5.740570] Unable to handle kernel paging request at virtual address ffff8bffeffd0680

I'm not sure whether you only tested on the 4 GB model (I don't have one to compare with yet), but I suspect this issue is specific to my 2 GB model.

To work around the issue, I tried changing the memory settings in the DTBs of both u-boot and the kernel to values I found in the official sun50iw9p1 DTBs and recompiling them, but with no luck. I'm not skilled enough to track down the root cause.

Hopefully you can shine some light on this issue!

Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done. 
[    5.740570] Unable to handle kernel paging request at virtual address ffff8bffeffd0680 

[    5.749916] Mem abort info: 
[    5.752783]   ESR = 0x96000004 
[    5.755885]   EC = 0x25: DABT (current EL), IL = 32 bits 
[    5.761268]   SET = 0, FnV = 0 
[    5.764371]   EA = 0, S1PTW = 0 
[    5.767555]   FSC = 0x04: level 0 translation fault 
[    5.772487] Data abort info: 
[    5.775403]   ISV = 0, ISS = 0x00000004 
[    5.779282]   CM = 0, WnR = 0 
[    5.782280] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041409000 
[    5.789049] [ffff8bffeffd0680] pgd=0000000000000000, p4d=0000000000000000 
[    5.795914] Internal error: Oops: 96000004 [#1] PREEMPT SMP 
[    5.801539] Modules linked in: dw_hdmi_cec sun8i_drm_hdmi dw_hdmi cec dwmac_sun8i stmmac_platform stmmac pcs_xpcs phylink sun4i_drm sun4i_frontend sun4i_tcon sun8i_tcon_top panfrost gpu_sched sun8i_mixer drm_shmem_helper drm_cma_helper drm_kms_helper drm 
[    5.824289] CPU: 1 PID: 248 Comm: btrfs Not tainted 5.18.1-stb-616+ #1 
[    5.830878] Hardware name: Shenzhen Amediatech Technology Co., Ltd X96Q TV Box (DT) 
[    5.838593] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) 
[    5.845619] pc : update_io_ticks+0x74/0xbc 
[    5.849767] lr : __blk_account_io_start+0x44/0x90 
[    5.854520] sp : ffff800009d5b710 
[    5.857864] x29: ffff800009d5b710 x28: ffff800009d5bd48 x27: 0000000000000000 
[    5.865070] x26: ffff000005838b60 x25: 0000000000000001 x24: 0000000000000001 
[    5.872274] x23: ffff0000098f0000 x22: 0000000000000001 x21: ffff800009d5b9d0 
[    5.879478] x20: ffff000001fd4b00 x19: ffff000003355880 x18: 0000000000000002 
[    5.886682] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 
[    5.893886] x14: 0000000000000001 x13: 0000000000075cb0 x12: 0000000000000000 
[    5.901090] x11: ffff800008589a54 x10: ffff0000058389e0 x9 : 0000000000020000 
[    5.908294] x8 : 0000000000000000 x7 : ffff000005838018 x6 : 0000000000000000 
[    5.915498] x5 : 0000000000000001 x4 : 00000000fffee0a1 x3 : ffff000005838000 
[    5.922701] x2 : 0000000000000001 x1 : ffff8000768d1000 x0 : 00000bff796ff680 
[    5.929906] Call trace: 
[    5.932376]  update_io_ticks+0x74/0xbc 
[    5.936164]  blk_mq_submit_bio+0x1cc/0x5b0 
[    5.940303]  __submit_bio+0xf0/0x160 
[    5.943914]  submit_bio_noacct_nocheck+0x1e4/0x230 
[    5.948752]  submit_bio_noacct+0x168/0x324 
[    5.956737]  submit_bio+0x44/0xf0 
[    5.963866]  mpage_readahead+0x134/0x170 
[    5.971581]  blkdev_readahead+0x18/0x24 
[    5.979173]  read_pages+0x80/0x240 
[    5.986302]  page_cache_ra_unbounded+0x154/0x1b4 
[    5.994624]  force_page_cache_ra+0xcc/0x100 
[    6.002475]  page_cache_sync_ra+0x44/0x100 
[    6.010204]  filemap_get_pages+0xac/0x620 
[    6.017812]  filemap_read+0xb0/0x320 
[    6.024936]  blkdev_read_iter+0xc4/0x200 
[    6.032372]  new_sync_read+0xd4/0x150 
[    6.039508]  vfs_read+0x190/0x1dc 
[    6.046251]  ksys_read+0x68/0xf4 
[    6.052863]  __arm64_sys_read+0x20/0x2c 
[    6.060048]  invoke_syscall+0x48/0x114 
[    6.067111]  el0_svc_common.constprop.0+0x44/0xec 
[    6.075107]  do_el0_svc+0x24/0x84 
[    6.081668]  el0_svc+0x2c/0x84 
[    6.087928]  el0t_64_sync_handler+0x1a4/0x1b0 
[    6.095459]  el0t_64_sync+0x18c/0x190 
[    6.102254] Code: d538d081 710000df 91020000 9a851042 (f8616807)  
[    6.111476] ---[ end trace 0000000000000000 ]--- 
[    6.119229] note: btrfs[248] exited with preempt_count 1 
[    6.127652] ------------[ cut here ]------------ 
[    6.135330] WARNING: CPU: 1 PID: 248 at kernel/exit.c:742 do_exit+0x718/0x89c 
[    6.145514] Modules linked in: dw_hdmi_cec sun8i_drm_hdmi dw_hdmi cec dwmac_sun8i stmmac_platform stmmac pcs_xpcs phylink sun4i_drm sun4i_frontend sun4i_tcon sun8i_tcon_top panfrost gpu_sched sun8i_mixer drm_shmem_helper drm_cma_helper drm_kms_helper drm 
[    6.174533] CPU: 1 PID: 248 Comm: btrfs Tainted: G      D           5.18.1-stb-616+ #1 
[    6.185739] Hardware name: Shenzhen Amediatech Technology Co., Ltd X96Q TV Box (DT) 
[    6.196688] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) 
[    6.206931] pc : do_exit+0x718/0x89c 
[    6.213755] lr : make_task_dead+0x5c/0xf4 
[    6.221022] sp : ffff800009d5b350 
[    6.227573] x29: ffff800009d5b350 x28: ffff800009d5b473 x27: ffff800009085bc0 
[    6.237987] x26: 0000000000000001 x25: ffff80000858ffac x24: 0000000000000000 
[    6.248421] x23: 0000000000000000 x22: 000000000000000b x21: ffff800009089ff8 
[    6.258850] x20: 000000000000000b x19: ffff000003355880 x18: 0000000000000001 
[    6.269276] x17: 0000000000000004 x16: 0000000000000000 x15: 0000000000000000 
[    6.279698] x14: ffff8000094845b0 x13: 00000000000004c8 x12: 0000000000000198 
[    6.290123] x11: 00000000ffffffff x10: ffff8000094dc5b0 x9 : 00000000fffff000 
[    6.300534] x8 : ffff8000094845b0 x7 : ffff8000094dc5b0 x6 : 0000000000000000 
[    6.310928] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000 
[    6.321299] x2 : 0000000000000000 x1 : ffff000003355880 x0 : ffff800009d5b9d0 
[    6.331519] Call trace: 
[    6.336869]  do_exit+0x718/0x89c 
[    6.342974]  make_task_dead+0x5c/0xf4 
[    6.349475]  die+0x1f4/0x230 
[    6.355154]  die_kernel_fault+0x384/0x394 
[    6.361942]  __do_kernel_fault+0xfc/0x180 
[    6.368724]  do_translation_fault+0x58/0xc0 
[    6.375664]  do_mem_abort+0x44/0x94 
[    6.381888]  el1_abort+0x40/0x6c 
[    6.387852]  el1h_64_sync_handler+0xb0/0xd0 
[    6.394780]  el1h_64_sync+0x64/0x68 
[    6.400991]  update_io_ticks+0x74/0xbc 
[    6.407463]  blk_mq_submit_bio+0x1cc/0x5b0 
[    6.414298]  __submit_bio+0xf0/0x160 
[    6.420601]  submit_bio_noacct_nocheck+0x1e4/0x230 
[    6.428139]  submit_bio_noacct+0x168/0x324 
[    6.434967]  submit_bio+0x44/0xf0 
[    6.440996]  mpage_readahead+0x134/0x170 
[    6.447608]  blkdev_readahead+0x18/0x24 
[    6.454133]  read_pages+0x80/0x240 
[    6.460221]  page_cache_ra_unbounded+0x154/0x1b4 
[    6.467536]  force_page_cache_ra+0xcc/0x100 
[    6.474415]  page_cache_sync_ra+0x44/0x100 
[    6.481212]  filemap_get_pages+0xac/0x620 
[    6.487928]  filemap_read+0xb0/0x320 
[    6.494201]  blkdev_read_iter+0xc4/0x200 
[    6.500827]  new_sync_read+0xd4/0x150 
[    6.507188]  vfs_read+0x190/0x1dc 
[    6.513192]  ksys_read+0x68/0xf4 
[    6.519097]  __arm64_sys_read+0x20/0x2c 
[    6.525607]  invoke_syscall+0x48/0x114 
[    6.532029]  el0_svc_common.constprop.0+0x44/0xec 
[    6.539427]  do_el0_svc+0x24/0x84 
[    6.545431]  el0_svc+0x2c/0x84 
[    6.551167]  el0t_64_sync_handler+0x1a4/0x1b0 
[    6.558220]  el0t_64_sync+0x18c/0x190 
[    6.564573] ---[ end trace 0000000000000000 ]--- 
[   35.814489] cldo1: disabling 

Here's the full console dump: x86q_error.log
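The "Mem abort info" and "Data abort info" lines in the log are just the decoded fields of the AArch64 exception syndrome register (ESR). As an illustration (not part of imagebuilder; the function names decode_esr and describe_fsc are hypothetical), this Python sketch decodes ESR = 0x96000004 using the standard ESR bit layout: EC = 0x25 is a data abort taken at the current exception level, WnR = 0 means the faulting access was a read, and FSC = 0x04 is a level 0 translation fault. In other words, the kernel tried to read from an address that has no page-table entry at all, which matches pgd=0000000000000000 in the log.

```python
# Decode an AArch64 ESR value into the fields the kernel prints
# under "Mem abort info" / "Data abort info".

def decode_esr(esr: int) -> dict:
    ec = (esr >> 26) & 0x3F      # Exception Class, bits [31:26]
    il = (esr >> 25) & 0x1       # Instruction Length, bit [25]: 1 = 32-bit
    iss = esr & 0x1FFFFFF        # Instruction Specific Syndrome, bits [24:0]
    fields = {"EC": ec, "IL": 32 if il else 16, "ISS": iss}
    if ec in (0x24, 0x25):       # data abort from a lower / the current EL
        fields["ISV"] = (iss >> 24) & 0x1
        fields["CM"] = (iss >> 8) & 0x1
        fields["WnR"] = (iss >> 6) & 0x1   # 0 = the faulting access was a read
        fields["FSC"] = iss & 0x3F         # data fault status code
    return fields


def describe_fsc(fsc: int) -> str:
    # FSC values 0x04..0x07 are translation faults at levels 0..3.
    if (fsc & 0x3C) == 0x04:
        return f"level {fsc & 0x3} translation fault"
    return f"other fault (FSC=0x{fsc:02x})"


if __name__ == "__main__":
    f = decode_esr(0x96000004)
    print(f"EC=0x{f['EC']:02x} IL={f['IL']} bits WnR={f['WnR']} "
          f"{describe_fsc(f['FSC'])}")
    # prints: EC=0x25 IL=32 bits WnR=0 level 0 translation fault
```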

Lastly, I'd like to say thanks for the brilliant work you've put into this project. It's infinitely useful.

benpoulson commented 2 years ago

Setting mem=1024M in the u-boot environment does stop the issue. I'm going to increase it gradually back towards 2048M to see where the issue starts again.
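Stepping the limit up gradually works, but a binary search over mem= values needs fewer reboots. This is a swapped-in technique, not what was done in the thread, and it only holds if the failure is deterministic and monotonic (every value above some threshold fails, every value at or below it works); intermittent crashes would break that assumption. In the sketch below, find_max_stable_mem and the boot_is_stable callback are hypothetical: in practice the callback would mean setting mem=<n>M in the u-boot environment, booting, and running a load test. Here it is simulated.

```python
from typing import Callable


def find_max_stable_mem(lo_mb: int, hi_mb: int, step_mb: int,
                        boot_is_stable: Callable[[int], bool]) -> int:
    """Return the largest mem= value (MB, a multiple of step_mb) in
    [lo_mb, hi_mb] for which boot_is_stable() returns True.

    Assumes lo_mb is already known to be stable and that stability is
    monotonic: if a value fails, every larger value fails too.
    """
    # Work in units of step_mb so every probe lands on a round value.
    lo, hi = lo_mb // step_mb, hi_mb // step_mb
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so the loop always progresses
        if boot_is_stable(mid * step_mb):
            lo = mid                  # mid works: search the upper half
        else:
            hi = mid - 1              # mid fails: search the lower half
    return lo * step_mb


if __name__ == "__main__":
    probes = []

    def simulated(mem_mb: int) -> bool:
        # Stands in for a real reboot + stress test; pretends 1536M is the limit.
        probes.append(mem_mb)
        return mem_mb <= 1536

    print(find_max_stable_mem(1024, 2048, 64, simulated), len(probes))
    # prints: 1536 4
```

With 64M steps between 1024M and 2048M, this finds the simulated limit in 4 boots, where stepping up linearly from 1024M would take 9.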

hexdump0815 commented 2 years ago

@benpoulson - this is one of the things i planned to suggest to you (besides disabling the higher frequency opp points in the dtb). i guess the box simply has only 1gb of ram: android tv boxes are known for fake specs, to the point that the fake specs are shown in android and sometimes even the chip labels are modified to pretend there is a better chip on the board ... the unpredictable quality and unreliable specs of tv boxes are the reason why i nearly gave up on them (nowadays they are even as expensive as regular sbc's, so there is one reason less to use tv boxes instead of those). there were even boxes with a different soc inside than the one printed on the box, and specs can be way off: i have one 4/32gb box which turned out to be 1/8gb, and you can even buy multiple boxes at the same time from the same seller and get some with different specs and boards :)

benpoulson commented 2 years ago

I'm not sure whether you've done anything more than disable hardware acceleration in the Xorg environment, but I can confirm the WIP Debian bookworm install runs FLAWLESSLY on the X96Q and doesn't run into any of the memory issues above (it can address all 2 GB of RAM).

I'm even able to modify the device tree to clock the CPU up to 1.6 GHz without any reliability problems. I've been running a CPU + memory stress test for the last week without a single fault.

hexdump0815 commented 2 years ago

@benpoulson - this is very good news - what exactly did you use for this? an imagebuilder-built bookworm image with the v5.18.1 kernel, the latest bullseye image upgraded to bookworm, or something else?

benpoulson commented 2 years ago

@hexdump0815 - I used the bookworm image built by imagebuilder. Completely standard setup, no other changes (except the device tree CPU clock changes)

allwinner_h616_release_version="5.18.1-stb-616%2B"
allwinner_h616_uboot_version="211126-01"
mesa_release_version="22.1.1"

Zero issues at all. I just finished the stress test and it's still completely stable.

I've not checked the repo in the last few days, but I'm on commit:

commit ff682b751971786043c9186177eb28276011394b (HEAD -> main, origin/main, origin/HEAD)
Author: hexdump <hexdump0815@googlemail.com>
Date:   Tue Aug 30 22:14:28 2022 +0200

hexdump0815 commented 2 years ago

@benpoulson - once more, thanks a lot for this info ... i think there were no other changes. i'm gradually upgrading more and more of my bullseye systems to bookworm, as it simply runs much better, is by now way more up to date and is still usably stable ... btw. imagebuilder writes the git commit hash it was built from into /etc/imagebuilder-info, just in case you were not aware of it yet

best wishes - hexdump