neilbrown / gnubee-tools

Tools for building gnubee firmware - and maybe more.

kernel panic trying to boot 5.7 #25

Open xaiki opened 3 years ago

xaiki commented 3 years ago

I've tried building 5.7, and it panics on boot:

spi     - spi command
 Reset MT7530t image via network using TFTP protocol
set LAN/WAN WLLLLe data by tftp protocol
(Re)start USB...b-system
USB0:   mtk-xhci: init hccr be1c0000 and hcor be1c0020 hc_length 32
Register 300010f NbrPorts 3sion
Starting the controller
USB XHCI 0.96
scanning bus 0 for devices... 2 USB Device(s) found
       scanning bus for storage devices... 0 Storage Device(s) found
ethaddr="00:AA:BB:CC:DD:10"
 No USB Storage found. Upgrade FW failed!
serverip=192.168.1.2
Please choose the operation:
   1: Load system code to SDRAM via TFTP.
   2: Load system code then write to Flash via TFTP.
   3: Boot system code via Flash (default).
   4: Enter boot command line interface.
   5: Load system code then write to Flash via USB Storage.
   6: Load system code then write to Flash via Httpd.
   9: Load U-Boot code then write to Flash via TFTP.

You chose 3
                                                                                                                      0

3: System Boot system code via Flash.
## Checking image at bc050000 ...
   Image Name:   Linux-5.7.2+
   Image Type:   MIPS Linux Kernel Image (uncompressed)
   Data Size:    21383376 Bytes = 20.4 MB
   Load Address: 80001000
   Entry Point:  806a1140
   Verifying Checksum ... OK
OK
No initrd
## Transferring control to Linux (at address 806a1140) ...
## Giving linux memsize in MB, 512

Starting kernel ...

[    0.000000] Linux version 5.7.2+ (xaiki@sucre) (gcc version 9.3.0 (Debian 9.3.0-8), GNU ld (GNU Binutils for Debian) 2.35) #6 SMP Mon Oct 26 09:11:16 -03 2020
[    0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[    0.000000] printk: bootconsole [early0] enabled
[    0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[    0.000000] MIPS: machine is GB-PC2
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] VPE topology {2,2} total 4
[    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000001bffffff]
[    0.000000]   HighMem  [mem 0x000000001c000000-0x0000000023ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001bffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x0000000023ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000023ffffff]
[    0.000000] percpu: Embedded 14 pages/cpu s26832 r8192 d22320 u57344
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 130176
[    0.000000] Kernel command line: console=ttyS0,57600 rootfstype=squashfs,jffs2
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] Writing ErrCtl register=00020002
[    0.000000] Readback ErrCtl register=00020002
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 496808K/524288K available (6815K kernel code, 244K rwdata, 1492K rodata, 13388K init, 238K bss, 27480K reserved, 0K cma-reserved, 65536K highmem)
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] NR_IRQS: 256
[    0.000000] random: get_random_bytes called from start_kernel+0x394/0x594 with crng_init=0
[    0.000000] CPU Clock: 900MHz
[    0.000000] clocksource: GIC: mask: 0xffffffffffffffff max_cycles: 0xcf914c9718, max_idle_ns: 440795231327 ns
[    0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 4247245437 ns
[    0.000010] sched_clock: 32 bits at 450MHz, resolution 2ns, wraps every 4772186110ns
[    0.015504] Calibrating delay loop... 597.60 BogoMIPS (lpj=2988032)
[    0.087786] pid_max: default: 32768 minimum: 301
[    0.097126] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.111535] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.129685] rcu: Hierarchical SRCU implementation.
[    0.142105] smp: Bringing up secondary CPUs ...
[    0.152780] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.152792] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.152806] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.152891] CPU1 revision is: 0001992f (MIPS 1004Kc)
[    0.211339] Synchronize counters for CPU 1: done.
[    0.282034] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.282043] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.282053] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.282107] CPU2 revision is: 0001992f (MIPS 1004Kc)
[    0.331989] Synchronize counters for CPU 2: done.
[    0.393144] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.393153] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.393162] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.393215] CPU3 revision is: 0001992f (MIPS 1004Kc)
[    0.451643] Synchronize counters for CPU 3: done.
[    0.511262] smp: Brought up 1 node, 4 CPUs
[    0.520139] devtmpfs: initialized
[    0.530528] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.550011] futex hash table entries: 1024 (order: 3, 32768 bytes, linear)
[    0.563618] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 80092f28, ra == 80092f18
[    0.584696] Oops[#1]:
[    0.589161] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.2+ #6
[    0.600904] $ 0   : 00000000 00000001 00000000 9bc48000
[    0.611271] $ 4   : 00000000 00000000 00000000 00000000
[    0.621639] $ 8   : fffffff2 9bc51e28 9bd60000 00000400
[    0.632006] $12   : 9bc51b8c 00000043 815a0000 0000000f
[    0.642374] $16   : 80830000 00000001 80830370 80830000
[    0.652743] $20   : 81570000 80886d04 8085d094 00000008
[    0.663110] $24   : 00000001 00000000
[    0.673478] $28   : 9bc50000 9bc51de0 80886d24 80092f18
[    0.683847] Hi    : 00000000
[    0.689549] Lo    : 00000000
[    0.695283] epc   : 80092f28 cmpxchg_futex_value_locked+0x28/0x64
[    0.707354] ra    : 80092f18 cmpxchg_futex_value_locked+0x18/0x64
[    0.719444] Status: 11008403 KERNEL EXL IE
[    0.727739] Cause : 40800008 (ExcCode 02)
[    0.735687] BadVA : 00000000
[    0.741390] PrId  : 0001992f (MIPS 1004Kc)
[    0.749510] Modules linked in:
[    0.755564] Process swapper/0 (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
[    0.771629] Stack : 81570000 80886d04 8085d094 00000008 80886d24 80867c60 ffbc1930 0043e6cf
[    0.788217]         80830000 80838470 00000000 9bc51e24 00000000 00000400 00000400 8085d094
[    0.804806]         00000008 0000000a 80886d04 df56f1a8 ffffffff 80867bac ffffffff 80830000
[    0.821394]         00000000 80001580 00000000 9bc51e4c 9bc51e4c df56f1a8 808238a0 00000000
[    0.837983]         80830000 8085d000 8081ea08 00000001 00000001 00000000 80830000 80780000
[    0.854572]         ...
[    0.859413] Call Trace:
[    0.864258] [<80092f28>] cmpxchg_futex_value_locked+0x28/0x64
[    0.875684] [<80867c60>] futex_init+0xb4/0x128
[    0.884475] [<80001580>] do_one_initcall+0x8c/0x1c4
[    0.894151] [<8085df54>] kernel_init_freeable+0x22c/0x264
[    0.904883] [<806a131c>] kernel_init+0x14/0xfc
[    0.913682] [<800068d8>] ret_from_kernel_thread+0x14/0x1c
[    0.924391] Code: 2408fff2  00001025  0000000f <c0a30000> 14660005  00000000  00e00825  e0a10000  1020fff9
[    0.943737]
[    0.946712] ---[ end trace 6d6af4ce27cfef83 ]---
[    0.955841] Kernel panic - not syncing: Fatal exception
[    0.966239] Rebooting in 1 seconds..
[    3.629481] Reboot failed -- System halted

neilbrown commented 3 years ago

Thanks for trying this out and reporting the results! I had a look at the code in kernel/futex.c, and the only place that futex_init() calls cmpxchg_futex_value_locked() is in futex_detect_cmpxchg(). It is passed a NULL, and the error seems to be a null dereference, but the NULL is clearly intentional and has been there for a long time. I would probably use git bisect to find the patch that caused it to stop working, though that can be a slow, laborious process.
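The oops itself is consistent with the null-dereference reading. Its `Code:` line marks the faulting word as `<c0a30000>`, and the `Cause` register is `40800008`. The following is a minimal sketch, not kernel code, that splits a MIPS32 I-type instruction word into its fields and extracts the ExcCode field from Cause; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the fields of a MIPS32 I-type instruction. */
struct mips_itype {
    unsigned opcode;  /* bits 31..26 */
    unsigned rs;      /* bits 25..21, base register for loads/stores */
    unsigned rt;      /* bits 20..16, target register */
    int16_t  imm;     /* bits 15..0, signed offset */
};

/* Split a 32-bit instruction word into I-type fields. */
struct mips_itype decode_itype(uint32_t word)
{
    struct mips_itype f;
    f.opcode = word >> 26;
    f.rs     = (word >> 21) & 0x1f;
    f.rt     = (word >> 16) & 0x1f;
    f.imm    = (int16_t)(word & 0xffff);
    return f;
}

/* ExcCode lives in bits 6..2 of the CP0 Cause register. */
unsigned cause_exccode(uint32_t cause)
{
    return (cause >> 2) & 0x1f;
}

/* Self-check against the values printed in the oops above. */
void decode_self_check(void)
{
    struct mips_itype f = decode_itype(0xc0a30000u);
    assert(f.opcode == 0x30);  /* 0x30 is LL (load linked) on MIPS32 */
    assert(f.rs == 5 && f.rt == 3 && f.imm == 0);
    assert(cause_exccode(0x40800008u) == 2);  /* 2 is TLB load (TLBL) */
}
```

Read this way, the faulting instruction is `ll $3, 0($5)`. The register dump shows `$5` as `00000000`, which matches `BadVA : 00000000`, so the fault is the load-linked step of the cmpxchg loop reading through the NULL `uaddr`.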

xaiki commented 3 years ago

I've tried your 5.6 branch, but it got me to the same point (I remember successfully running it before), so I looked a bit more into it. The code in cmpxchg_futex_value_locked() that receives the NULL is:

    pagefault_disable();
    ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
    pagefault_enable();

I couldn't find where pagefault_disable() is defined.

The page fault is intentional, so the oops should not happen. The comment says:

    /*
     * This will fail and we want it. Some arch implementations do
     * runtime detection of the futex_atomic_cmpxchg_inatomic()
     * functionality. We want to know that before we call in any
     * of the complex code paths. Also we want to prevent
     * registration of robust lists in that case. NULL is
     * guaranteed to fault and we get -EFAULT on functional
     * implementation, the non-functional ones will return
     * -ENOSYS.
     */

I've hacked that call so that futex_cmpxchg_enabled is always 0, but 5.9.1 hung later in the boot. I'm now retrying with your 5.7.2.
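For readers following along, the detection described in the quoted comment can be sketched in userspace. This is only an illustration under stated assumptions: `mock_cmpxchg()` is a hypothetical stand-in for the arch-specific `futex_atomic_cmpxchg_inatomic()`, and a NULL pointer models the "guaranteed to fault" address. It is not the kernel's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the arch cmpxchg helper. A functional
 * implementation reports -EFAULT for an inaccessible address (modelled
 * here as NULL); a non-functional one reports -ENOSYS.
 */
int mock_cmpxchg(int arch_supports_it, uint32_t *curval,
                 uint32_t *uaddr, uint32_t oldval, uint32_t newval)
{
    if (!arch_supports_it)
        return -ENOSYS;
    if (uaddr == NULL)
        return -EFAULT;      /* fault is reported, not fatal */
    *curval = *uaddr;        /* always hand back the value seen */
    if (*uaddr == oldval)
        *uaddr = newval;     /* swap only when the old value matches */
    return 0;
}

/*
 * Probe with NULL, as the comment describes: -EFAULT means the
 * operation is functional, anything else means it is not.
 */
int detect_cmpxchg(int arch_supports_it)
{
    uint32_t curval = 0;
    return mock_cmpxchg(arch_supports_it, &curval, NULL, 0, 0) == -EFAULT;
}

/* Self-check of the two outcomes the comment distinguishes. */
void detect_self_check(void)
{
    assert(detect_cmpxchg(1) == 1);  /* functional: got -EFAULT */
    assert(detect_cmpxchg(0) == 0);  /* non-functional: got -ENOSYS */
}
```

In this model, the hack above corresponds to `detect_cmpxchg()` always returning 0. The panic in the log is the case the sketch does not cover: the probe neither returns -EFAULT nor -ENOSYS, because the fault on NULL is not caught at all.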

xaiki commented 3 years ago

The same happens with 5.7.2: with my hack it boots, but it hangs before dropping me into the busybox shell.

neilbrown commented 3 years ago

There is a report of a similar problem at http://groups.google.com/group/gnubee/t/b21f65a820e43b62 which was resolved by using a different version of the compiler. I'm using gcc-7.2.0 and binutils 2.29.1.20170915 without problems. I can build and boot 5.10.1 without this crash. What versions are you using?